How Far We Have Come: Testing Decompilation Correctness of C Decompilers
A C decompiler converts an executable (the output of a C compiler) into source code. The recovered C source code, once recompiled, should produce an executable with the same functionality as the original executable. With over twenty years of development, C decompilers have been widely used in production to support reverse engineering applications, including legacy software migration, security retrofitting, software comprehension, and the first step in launching adversarial software exploitation. As the paramount component and the trust base of numerous cybersecurity tasks, C decompilers have enabled the analysis of malware and ransomware and have advanced cybersecurity professionals' understanding of vulnerabilities in real-world systems. In contrast to this flourishing market, we observe that in academia the final-stage outputs of C decompilers (i.e., recovered C source code) are still not extensively used. Instead, the intermediate results and representations are often preferred when developing applications such as binary security retrofitting. We acknowledge that such conservative approaches in academia result from widespread pessimism about decompilation correctness. However, in conventional software engineering and security research, how much of a problem is, for instance, reusing a piece of simple legacy code by taking the output of modern C decompilers? In this work, we test decompilation correctness to present an up-to-date understanding of modern C decompilers. We detected a total of 1,423 inputs that trigger decompilation errors in four popular decompilers, and with extensive manual effort, we identified 13 bugs in two open-source decompilers.
Our findings show that the overly pessimistic view of decompilation correctness leads researchers to underestimate the potential of modern decompilers; state-of-the-art decompilers clearly prioritize functional correctness, and they are making promising progress in improving their products. However, some tasks that have been studied for years in academia, such as type recovery and optimization, still impede de facto C decompilers from generating accurate and presentable output, to a greater degree than the literature reflects; these issues rarely receive enough attention and can seriously confuse and mislead users.
Wed 22 Jul (displayed time zone: Tijuana, Baja California)
13:30 - 14:30 | BUILD TESTING | Technical Papers at Zoom | Chair(s): Nazareno Aguirre (Dept. of Computer Science FCEFQyN, University of Rio Cuarto). Public Live Stream/Recording. Registered participants should join via the Zoom link distributed in Slack.
13:30 | 20m Talk | Scalable Build Service System with Smart Scheduling Service | Technical Papers
13:50 | 20m Talk | Escaping Dependency Hell: Finding Build Dependency Errors with the Unified Dependency Graph | Technical Papers | Gang Fan (Hong Kong University of Science and Technology), Chengpeng Wang (The Hong Kong University of Science and Technology), Rongxin Wu (Department of Cyber Space Security, Xiamen University), Xiao Xiao (Sourcebrella Inc.), Qingkai Shi (The Hong Kong University of Science and Technology), Charles Zhang (The Hong Kong University of Science and Technology)
14:10 | 20m Talk | How Far We Have Come: Testing Decompilation Correctness of C Decompilers | Technical Papers