How Far We Have Come: Testing Decompilation Correctness of C Decompilers (ISSTA 2020 - Technical Papers)

Who

Zhibo Liu, Shuai Wang

Track

ISSTA 2020 Technical Papers

Time Zone

The program is currently displayed in (GMT-07:00) Tijuana, Baja California.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 22 Jul 2020 14:10 - 14:30 at Zoom - BUILD TESTING Chair(s): Nazareno Aguirre

Abstract

A C decompiler converts an executable (the output from a C compiler) into source code. The recovered C source code, once re-compiled, will produce an executable with the same functionality as the original executable. With over twenty years of development, C decompilers have been widely used in production to support reverse engineering applications, including legacy software migration, security retrofitting, software comprehension, and to act as the first step in launching adversarial software exploitations. As the paramount component and the trust base in numerous cybersecurity tasks, C decompilers have enabled the analysis of malware, ransomware, and promoted cybersecurity professionals’ understanding of vulnerabilities in real-world systems. In contrast to this flourishing market, one observation is that in academia, the final-stage outputs of C decompilers (i.e., recovered C source code) are still not extensively used. Instead, the intermediate results and representations are often more desired for usage when developing applications such as binary security retrofitting. We acknowledge that such conservative approaches in academia are a result of widespread and pessimistic views on the decompilation correctness. However, in conventional software engineering and security research, how much of a problem is, for instance, reusing a piece of simple legacy code by taking the output of modern C decompilers? In this work, we test decompilation correctness to present an up-to-date understanding regarding modern C decompilers. We detected a total of 1,423 inputs that can trigger decompilation errors from four popular decompilers, and with extensive manual effort, we identified 13 bugs in two open-source decompilers. 
Our findings show that the overly pessimistic view of decompilation correctness leads researchers to underestimate the potential of modern decompilers; the state-of-the-art decompilers certainly care about the functional correctness, and they are making promising progress in revamping the products. However, some tasks that have been studied for years in academia, such as type recovery and optimization, still impede de facto C decompilers from generating accurate and presentable outputs more than is reflected in the literature; these issues rarely receive enough attention and can lead to great confusion that misleads users.
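The correctness criterion stated in the abstract (recompiled decompiler output should behave the same as the original executable) can be illustrated with a small differential check. This is only a sketch under stated assumptions, not the authors' test setup, which the abstract does not describe. All function names below are hypothetical: `is_less_original` stands in for the source behind the input executable, `is_less_decompiled` for a faithful decompiler output, and `is_less_faulty` for an output with a type recovery error (signed operands recovered as unsigned).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical original function compiled into the input executable. */
int is_less_original(int32_t a, int32_t b) {
    return a < b;
}

/* Hypothetical faithful decompiler output: different names and
   structure, same behavior. */
int is_less_decompiled(int32_t a1, int32_t a2) {
    int result;
    if (a1 >= a2)
        result = 0;
    else
        result = 1;
    return result;
}

/* Hypothetical faulty decompiler output: the operands were recovered
   as unsigned, so the comparison is wrong when the signs differ. */
int is_less_faulty(int32_t a1, int32_t a2) {
    return (uint32_t)a1 < (uint32_t)a2;
}

typedef int (*cmp_fn)(int32_t, int32_t);

/* Runs both functions on every ordered pair of inputs and returns
   the number of pairs on which their results differ. */
size_t count_mismatches(cmp_fn reference, cmp_fn candidate,
                        const int32_t *inputs, size_t n) {
    size_t mismatches = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            if (reference(inputs[i], inputs[j]) !=
                candidate(inputs[i], inputs[j])) {
                mismatches++;
            }
        }
    }
    return mismatches;
}
```

Calling `count_mismatches` with inputs that mix negative and non-negative values, such as `{-2, -1, 0, 1, 2}`, reports no mismatch for `is_less_decompiled` and at least one for `is_less_faulty`. Inputs that are all non-negative would not expose the signedness error, which is why the choice of test inputs matters in this kind of check.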

DOI

https://doi.org/10.1145/3395363.3397370

Zhibo Liu

Shuai Wang

Hong Kong University of Science and Technology
