ICSE 2025
Sat 26 April - Sun 4 May 2025 Ottawa, Ontario, Canada

Code obfuscation is a well-known method for protecting proprietary software against reverse engineering. While obfuscation is beneficial, it alters the program’s control and data flow, and this systematic alteration significantly affects the quality of the resulting binaries, especially from the viewpoint of fuzz testing. Although these modifications can introduce bugs into the program, there is still no clear understanding of how applying an obfuscation algorithm degrades the effectiveness of fuzzing the obfuscated software. This paper presents ObfFuzz, the first empirical study to reveal and understand the challenges behind fuzzing obfuscated software. Our study begins by fuzzing unobfuscated binaries and comparing the results with those of their obfuscated counterparts. We chose AFL++, the most popular fuzzing tool, because it combines mutation-based test case generation with code coverage feedback, the component most affected by obfuscation transformations. We have evaluated our tool on four real-world programs (pdfinfo, exif, tiffinfo, and md2roff). Our study revealed several findings: 1) Control flow obfuscation makes the code more complex, significantly decreasing code coverage by 60% compared to the original code. 2) Data flow obfuscation leads to inefficient mutations during fuzzing, so exposing crashes takes 70% more time than in the unobfuscated code under the same constraints. 3) Applying obfuscation techniques increases program code size and decreases execution speed, which makes fuzzing the obfuscated binary less efficient. 4) Overall, our observations show that fuzzing an obfuscated binary is inefficient: obfuscation transformations add complexity, degrading the resulting binary and introducing buggy code.
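As a rough illustration of the first finding, the toy sketch below shows one way control flow obfuscation can lower measured coverage: a flattened dispatcher guarded by an opaque predicate adds a branch site that no input can reach. It is not the paper's obfuscator, targets, or measurement setup (the study fuzzes real binaries with AFL++). The functions `check_plain`, `check_obfuscated`, and `coverage`, the site names, and the two-byte "PD" format are all hypothetical.

```python
# Toy model of coverage bookkeeping; every name here is hypothetical.
PLAIN_SITES = {"short", "magic_ok", "magic_bad"}
OBF_SITES = {"bogus", "s0", "s1", "s2", "s3", "s4"}


def check_plain(data: bytes, hits: set) -> bool:
    """Accept inputs that start with the two bytes 'PD'."""
    if len(data) < 2:
        hits.add("short")
        return False
    if data[0] == ord("P"):
        hits.add("magic_ok")
        return data[1] == ord("D")
    hits.add("magic_bad")
    return False


def check_obfuscated(data: bytes, hits: set) -> bool:
    """Same logic, flattened into a dispatcher loop with an opaque predicate."""
    state, result = 0, False
    n = len(data)
    while state != 99:
        # Opaque predicate: n * (n + 1) is always even, so this never runs.
        if (n * (n + 1)) % 2 == 1:
            hits.add("bogus")
            state = 99
        elif state == 0:
            hits.add("s0")
            state = 1 if n < 2 else 2
        elif state == 1:
            hits.add("s1")
            result, state = False, 99
        elif state == 2:
            hits.add("s2")
            state = 3 if data[0] == ord("P") else 4
        elif state == 3:
            hits.add("s3")
            result, state = data[1] == ord("D"), 99
        elif state == 4:
            hits.add("s4")
            result, state = False, 99
    return result


def coverage(fn, sites, inputs):
    """Run fn on every input; return its results and the fraction of sites hit."""
    hits = set()
    results = [fn(d, hits) for d in inputs]
    return results, len(hits & sites) / len(sites)
```

With the inputs `b""`, `b"PD"`, and `b"XX"`, both versions return the same answers, but the plain version reaches all of its sites while the obfuscated one cannot reach `bogus`, so its coverage fraction stays below 1.0 no matter how long a fuzzer runs.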