CC 2025
Sat 1 - Sun 2 March 2025
Sat 1 Mar 2025 17:30 - 18:00 at Acacia A - Binary Analysis and Hardware I. Chair(s): Sara Achour

Faults within CPU circuits, which generate incorrect results and thus silent data corruption, have become endemic at scale. The only generic techniques for detecting one-time or intermittent soft errors, such as those caused by particle strikes or voltage spikes, rely on redundant execution, in which each instruction in a program is executed twice and the two results are compared.
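As a rough illustration of redundant execution, the Python sketch below models a program as a list of register operations and runs each one twice: once into a primary register file and once into a shadow copy. It raises an error if the two copies disagree. The instruction encoding, `run_duplicated`, `SoftErrorDetected`, and the bit-flip `fault` parameter are all hypothetical; this is not how nZDC or any compiler pass implements duplication on real AArch64 code.

```python
from typing import Callable, Dict, List, Optional, Tuple


class SoftErrorDetected(Exception):
    """Raised when the primary and shadow copies of a value disagree."""


# Hypothetical instruction format: (destination register, operation, source registers).
Instr = Tuple[str, Callable[..., int], Tuple[str, ...]]


def run_duplicated(program: List[Instr], inputs: Dict[str, int],
                   fault: Optional[Tuple[int, int]] = None) -> Dict[str, int]:
    """Execute every instruction twice and compare the results.

    `fault` is an optional (instruction index, bit) pair; when given, that bit of
    the primary result is flipped to emulate a transient soft error.
    """
    primary = dict(inputs)  # original register file
    shadow = dict(inputs)   # duplicated ("shadow") register file
    for i, (dst, op, srcs) in enumerate(program):
        primary[dst] = op(*(primary[s] for s in srcs))
        shadow[dst] = op(*(shadow[s] for s in srcs))
        if fault is not None and fault[0] == i:
            primary[dst] ^= 1 << fault[1]  # inject a single bit flip
        # Check the two copies before the value is used further.
        if primary[dst] != shadow[dst]:
            raise SoftErrorDetected(f"mismatch in {dst} at instruction {i}")
    return primary


# Usage: z = (x + y) * 2, first fault-free, then with a bit flip in instruction 0.
prog = [("t", lambda a, b: a + b, ("x", "y")), ("z", lambda a: a * 2, ("t",))]
print(run_duplicated(prog, {"x": 3, "y": 4})["z"])
try:
    run_duplicated(prog, {"x": 3, "y": 4}, fault=(0, 2))
except SoftErrorDetected as err:
    print("detected:", err)
```

Comparing after every instruction is the simplest possible policy. Software schemes typically defer the check to points where values leave the register file, such as stores and branches, to reduce overhead.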

The only open-source software solution for this task available for use today is nZDC, which aims to achieve "near-zero silent data corruption" through control-flow and data-flow redundancy. However, when we tried to apply it to large-scale workloads, we found that it suffered from a wide range of false positives, false negatives, compiler bugs, and run-time crashes, which made it impossible to benchmark against. This document details the fixes and workarounds we had to put in place to make nZDC work across full benchmark suites. We provide many new insights into the edge cases that make such instruction duplication tricky under complex ISAs such as AArch64 and their similarly complex ABIs. Evaluated across SPECint 2006 and PARSEC with our extensions, nZDC goes from no workloads executing to all but four, with a geomean overhead of 2x and 1.6x, respectively, relative to execution with no fault tolerance.
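The geomean overhead figures quoted here are geometric means of per-workload slowdowns. A minimal sketch of that calculation, assuming each workload has one protected and one unprotected run time (the function name `geomean_overhead` and the example timings are hypothetical, not results from the paper):

```python
import math
from typing import Sequence


def geomean_overhead(protected: Sequence[float], baseline: Sequence[float]) -> float:
    """Return the geometric mean of per-workload slowdowns.

    Each slowdown is protected_time / baseline_time for one workload, where the
    baseline is execution with no fault tolerance.
    """
    if len(protected) != len(baseline) or not protected:
        raise ValueError("need equal-length, non-empty lists of run times")
    # Average in log space, then exponentiate: exp(mean(log(ratio))).
    log_ratios = [math.log(p / b) for p, b in zip(protected, baseline)]
    return math.exp(sum(log_ratios) / len(log_ratios))


# Hypothetical timings in seconds for two workloads (slowdowns of 2x and 8x).
print(geomean_overhead([20.0, 80.0], [10.0, 10.0]))  # roughly 4.0
```

For slowdowns of 2x and 8x, the geometric mean is 4x while the arithmetic mean would be 5x; the geometric mean is the conventional way to average normalized ratios because no single outlier dominates it as strongly.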

Sat 1 Mar

Displayed time zone: Pacific Time (US & Canada)

16:00 - 18:00
Binary Analysis and Hardware I, Main Conference at Acacia A
Chair(s): Sara Achour (Stanford University)
16:00 (30m) Talk
A Comparative Study on the Accuracy and the Speed of Static and Dynamic Program Classifiers
Main Conference
Anderson Faustino da Silva (State University of Maringá), Jeronimo Castrillon (TU Dresden, Germany), Fernando Magno Quintão Pereira (Federal University of Minas Gerais)
16:30 (30m) Talk
Biotite: A High-Performance Static Binary Translator using Source-Level Information
Main Conference
Changbin Chen (The University of Tokyo), Shu Sugita (University of Tokyo), Yotaro Nada (The University of Tokyo), Hidetsugu Irie (University of Tokyo), Shuichi Sakai (University of Tokyo), Ryota Shioya (University of Tokyo)
17:00 (30m) Talk
Post-Link Outlining for Code Size Reduction
Main Conference
Shaobai Yuan (Hunan University), Jihong He (Hunan University), Yihui Xie (Hunan University), Feng Wang (Hunan University), Jie Zhao (Hunan University)
17:30 (30m) Talk
A Deep Technical Review of nZDC Fault Tolerance
Main Conference
Minli Liao (University of Cambridge), Sam Ainsworth (University of Edinburgh), Lev Mukhanov (Queen Mary University of London), Timothy M. Jones (University of Cambridge)