Pretraining on Call Graphs: When Binary Analysis Tasks Profit From Context
This program is tentative and subject to change.
Research on binary function embedding models has grown rapidly in recent years. These models are trained to encode the semantics of binary code so that the resulting embeddings generalize to a variety of reverse engineering tasks, such as binary code search, vulnerability detection, and malware classification. While many models take only the function in question as input, there have been successful attempts to improve function embeddings by leveraging information from the call graph. In this study, we dissect the implications of these embedding refinements. We conduct experiments using a range of graph-based models on the embeddings generated by two state-of-the-art binary function embedding models. In the process, we show that improvements on binary code similarity detection (BCSD) do not necessarily generalize to downstream tasks, whether semantic or syntactic in nature. More generally, we find that training on semantic tasks correlates with worse performance on syntactic tasks. By conducting an explanatory analysis on the dataset, we find that the call graph-based enhancements significantly improve the robustness of embeddings, particularly in scenarios where the initial models struggle. We encourage future research to build upon these findings and further explore the best methods for leveraging inter-function information in binary analysis.
Sun 12 Apr | Displayed time zone: Brasilia, Distrito Federal, Brazil
11:00 - 12:30 | Session 1 - Code Analysis | Research Track / ICPC Program / Early Research Achievements (ERA) at Europa II | Chair(s): Igor Wiese (Federal University of Technology)
11:00 | 10m Talk | Pretraining on Call Graphs: When Binary Analysis Tasks Profit From Context | Research Track
11:10 | 10m Talk | LuaReSym: Recovering Variables Liveness Range in Stripped Lua Bytecode via Multi-Stage Static Analysis | Research Track | Weilong Li (School of Computer Science and Engineering, Sun Yat-sen University), Ruizhi Xiao (School of Computer Science and Engineering, Sun Yat-sen University), Yabo Wang (School of Computer Science and Engineering, Sun Yat-sen University), Jiakun Sun (School of Computer Science and Engineering, Sun Yat-sen University), Yuqing Shao (School of Information Science and Engineering, East China University of Science and Technology), Shuyuan Jin (School of Computer Science and Engineering, Sun Yat-sen University)
11:20 | 10m Talk | Modubin: A Binary Modularization Approach Based on the Locality of Homologous Functions | Research Track | Wenyan Yu (Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences), Lei Cui (Zhongguancun Laboratory), Jiayuan Li (Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences), liyubo (Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences), Hong Li (Institute of Information Engineering at Chinese Academy of Sciences), Kai Cheng (Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences), Hongsong Zhu (Institute of Information Engineering at Chinese Academy of Sciences; University of Chinese Academy of Sciences)
11:30 | 10m Talk | RlDecompiler: Enhancing LLM-based Decompilation via Reinforcement Learning with a Multi-Faceted Reward Function | Research Track | Yuchi Su (University of Electronic Science and Technology of China), Weina Niu (University of Electronic Science and Technology of China), Jiacheng Gong (University of Electronic Science and Technology of China), Ran Yan (University of Electronic Science and Technology of China), Song Li (The State Key Laboratory of Blockchain and Data Security, Zhejiang University), Xin Liu (Lanzhou University), Xiaosong Zhang (University of Electronic Science and Technology of China)
11:40 | 10m Talk | A Multi-Agent Framework for Automated Exploit Generation with Constraint-Guided Comprehension and Reflection | Research Track | Siyi Chen (Alibaba Group), Tianhan Luo (Alibaba Group), Shijian Wu (Alibaba Group), Xiangyu Liu (Alibaba Group), Yilin Zhou (Wuhan University), Qi Li (Alibaba Group), Wenyuan Xu (Aarhus University)
11:50 | 10m Talk | Typify: A Lightweight Usage-driven Static Analyzer for Precise Python Type Inference | Research Track | Ali Aman (University of Windsor), Muhammad Asaduzzaman (University of Windsor), Shaowei Wang (University of Manitoba)
12:00 | 10m Talk | To GOTO or Not to GOTO: Measuring Structural Complexity of (Decompiled) Code | Research Track
12:10 | 5m Talk | Understanding Type Hints in Python Libraries and Frameworks: Early Insights | Early Research Achievements (ERA)
12:15 | 10m Live Q&A | Joint QA and Discussion | ICPC Program