KEENHash: Hashing Programs into Function-aware Embeddings for Large-scale Binary Code Similarity Analysis
Binary code similarity analysis (BCSA) is a crucial research area in many fields such as cybersecurity. Function-level diffing tools are the most widely used in BCSA: they match similar functions one by one to evaluate the similarity between binary programs (binaries). However, such methods have high time complexity, making them unscalable in large-scale scenarios (e.g., 1-to-n or n-to-n searching). Towards effective and efficient program-level BCSA, we propose KEENHash, a novel hashing approach that hashes binaries into program-level representations through large language model (LLM)-generated function embeddings. KEENHash condenses a binary into one compact, fixed-length program embedding using K-Means and Feature Hashing, enabling effective and efficient large-scale program-level BCSA that surpasses previous state-of-the-art methods. The experimental results show that KEENHash is 215 times faster than the state-of-the-art function matching tool while maintaining effectiveness. Furthermore, in a large-scale scenario with 5.3 billion similarity evaluations, KEENHash takes only 395.83 seconds, whereas the function matching tool would take 56 days. We also evaluate KEENHash on program clone search in large-scale BCSA across extensive datasets comprising 202,305 binaries. Compared with 4 state-of-the-art methods, KEENHash outperforms all of them by at least 23.16%, and displays remarkable superiority over them in the large-scale BCSA security scenario of malware detection.
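To make the idea of condensing function embeddings into one fixed-length program embedding more concrete, here is a minimal Python sketch. It is not the KEENHash implementation: the abstract names only K-Means and Feature Hashing, so everything else is an assumption made for illustration. In particular, the centroids are assumed to be already fitted, the function embeddings are toy 2-dimensional vectors rather than LLM outputs, and the helper names (`nearest_centroid`, `program_embedding`, `cosine_similarity`), the signed hashing scheme, and the use of cosine similarity are all hypothetical.

```python
import hashlib
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_centroid(embedding: Vector, centroids: Sequence[Vector]) -> int:
    """Return the index of the centroid closest to `embedding` (squared Euclidean distance)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda i: sq_dist(embedding, centroids[i]))


def _stable_hash(token: str) -> int:
    """64-bit hash that is stable across runs (unlike Python's built-in hash())."""
    return int.from_bytes(hashlib.sha256(token.encode("utf-8")).digest()[:8], "big")


def program_embedding(function_embeddings: Sequence[Vector],
                      centroids: Sequence[Vector],
                      dim: int = 64) -> List[float]:
    """Hash the cluster IDs of a program's functions into a fixed-length vector.

    Assumption: each function is represented by the ID of its nearest
    (pre-fitted) K-Means centroid, and the IDs are folded into `dim`
    buckets with the signed hashing trick.
    """
    vec = [0.0] * dim
    for emb in function_embeddings:
        cluster_id = nearest_centroid(emb, centroids)
        h = _stable_hash(f"cluster:{cluster_id}")
        index = h % dim                            # bucket chosen by the hash
        sign = 1.0 if (h >> 63) == 0 else -1.0     # top bit decides the sign
        vec[index] += sign
    return vec


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two program embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy data (hypothetical): three centroids and two "programs" that contain
# the same functions in a different order.
centroids = [[0.0, 0.0], [10.0, 10.0], [-10.0, 5.0]]
program_a = [[0.1, 0.2], [9.8, 10.1], [0.0, -0.1]]
program_b = [[9.8, 10.1], [0.0, -0.1], [0.1, 0.2]]

emb_a = program_embedding(program_a, centroids)
emb_b = program_embedding(program_b, centroids)
similarity = cosine_similarity(emb_a, emb_b)
```

Because every program maps to a vector of the same length regardless of how many functions it contains, comparing two programs in this sketch costs one vector operation instead of a function-by-function matching pass. That is the kind of trade-off the abstract describes, though the actual KEENHash design may differ in every detail.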
Fri 27 Jun. Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
11:00 - 12:15
11:00 | 25m | Talk | KEENHash: Hashing Programs into Function-aware Embeddings for Large-scale Binary Code Similarity Analysis | Research Papers | Zhijie Liu (ShanghaiTech University, China), Qiyi Tang (Tencent Security Keen Lab), Sen Nie (Tencent Security Keen Lab), Shi Wu (Tencent Security Keen Lab), Liangfeng Zhang (School of Information Science and Technology, ShanghaiTech University), Yutian Tang (University of Glasgow, United Kingdom)
11:25 | 25m | Talk | Porting Software Libraries to OpenHarmony: Transitioning from TypeScript or JavaScript to ArkTS | Research Papers | Bo Zhou (Northeastern University), Jiaqi Shi (Northeastern University), Ying Wang (Northeastern University), Li Li (Beihang University), Li Tsz On (The Hong Kong University of Science and Technology), Hai Yu (Northeastern University, China), Zhiliang Zhu (Northeastern University, China)
11:50 | 25m | Talk | STRUT: Structured Seed Case Guided Unit Test Generation for C Programs using LLMs | Research Papers | Jinwei Liu (Xidian University), Chao Li (Beijing Institute of Control Engineering; Beijing Sunwise Information Technology), Rui Chen (Beijing Institute of Control Engineering; Beijing Sunwise Information Technology), Shaofeng Li (Xidian University), Bin Gu (Beijing Institute of Control Engineering), Mengfei Yang (China Academy of Space Technology)
Cosmos 3C is the third room in the Cosmos 3 wing.
When facing the main Cosmos Hall, access to the Cosmos 3 wing is on the left, close to the stairs. The area is accessed through a large door with the number “3”, which will stay open during the event.