ASE 2025
Sun 16 - Thu 20 November 2025 Seoul, South Korea

Binary reverse engineering is foundational to various tasks such as malware analysis and vulnerability detection. Traditional binary analysis tools mainly operate at the function level. However, modern software has grown significantly in size, with binaries often containing thousands of functions. Without understanding how these functions are organized into higher-level structures, it becomes difficult to effectively support downstream analysis tasks. Analysts must examine thousands of functions separately, making the process time-consuming and error-prone. Despite these challenges, current research on recovering the higher-level structure of binaries remains limited.

To bridge this gap, we propose BinStruct, a novel binary structure recovery framework that recovers both file and module structures from binaries. BinStruct first identifies the file structure by combining data reference patterns, function calls, and semantic understanding from Large Language Models. Then, inspired by software architecture recovery in source code analysis, BinStruct identifies modules by clustering the recovered files using consensus between structural dependency and semantic similarity. Evaluation on 121 real-world stripped binaries demonstrates that BinStruct outperforms state-of-the-art techniques in both file and module recovery accuracy, while requiring only 7.42 s and 34.46 s on average to recover file and module structures, respectively. Case studies on Libxml2 and PredatorTheStealer demonstrate BinStruct’s effectiveness on security tasks like attack surface analysis and malware investigation.
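
The abstract does not spell out how the consensus between structural dependency and semantic similarity is computed. As a rough illustration of the general idea only, and not BinStruct's actual algorithm, the sketch below merges two recovered files into the same module when they both depend on each other and look semantically alike. Everything in it is an assumption made for the example: the file names, the dependency counts, the two-dimensional embeddings, the `dep_min` and `sim_min` thresholds, and the use of union-find with cosine similarity.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def consensus_modules(files, deps, embeddings, dep_min=1, sim_min=0.8):
    """Group files into modules; a pair is merged only if BOTH signals agree.

    deps maps (caller_file, callee_file) to a cross-file reference count.
    embeddings maps a file to a semantic vector. Both inputs are hypothetical
    stand-ins for whatever the real pipeline would produce.
    """
    parent = {f: f for f in files}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(files, 2):
        dep = deps.get((a, b), 0) + deps.get((b, a), 0)
        sim = cosine(embeddings[a], embeddings[b])
        # Consensus rule (assumed): structural AND semantic evidence required.
        if dep >= dep_min and sim >= sim_min:
            parent[find(a)] = find(b)

    groups = {}
    for f in files:
        groups.setdefault(find(f), []).append(f)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical recovered files with made-up dependency counts and embeddings.
files = ["parser.c", "tree.c", "http.c", "ftp.c"]
deps = {
    ("parser.c", "tree.c"): 12,
    ("http.c", "ftp.c"): 3,
    ("parser.c", "http.c"): 1,  # structurally linked, semantically distant
}
embeddings = {
    "parser.c": [0.9, 0.1],
    "tree.c": [0.8, 0.2],
    "http.c": [0.1, 0.9],
    "ftp.c": [0.2, 0.8],
}
modules = consensus_modules(files, deps, embeddings)
print(modules)
```

In this toy input, `parser.c` and `http.c` share a dependency edge but have dissimilar embeddings, so the consensus rule keeps them in separate modules. Requiring agreement between both signals is one simple way to avoid merging files that are connected only by an incidental call.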