Faults within CPU circuits, which generate incorrect results and thus silent data corruption, have become endemic at scale. The only generic techniques to detect one-time or intermittent soft errors, such as those caused by particle strikes or voltage spikes, require redundant execution, where each instruction in a program is executed twice and the results are compared.
The only open-source software solution for this task that is available for use today is nZDC, which aims to achieve ``near-zero silent data corruption'' through control- and data-flow redundancy. However, when we tried to apply it to large-scale workloads, we found it suffered from a wide range of false positives, false negatives, compiler bugs and run-time crashes, which made it impossible to benchmark against. This document details the fixes and workarounds we had to put in place to make nZDC work across full benchmark suites. We provide many new insights into the edge cases that make such instruction duplication tricky under complex ISAs such as AArch64 and their similarly complex ABIs. Evaluation across SPECint 2006 and PARSEC shows that our extensions take us from no workloads executing to all bar four, with 2x and 1.6x geomean overhead respectively relative to execution with no fault tolerance.