BinaryAI: Binary Software Composition Analysis via Intelligent Binary Source Code Matching
While third-party libraries (TPLs) are extensively reused to enhance productivity during software development, they can also introduce security risks such as vulnerability propagation. Software composition analysis (SCA), proposed to identify reused TPLs and thereby reduce such risks, has become an essential procedure within modern DevSecOps. As one of the mainstream SCA techniques, binary-to-source SCA identifies the third-party source projects contained in binary files via binary source code matching, which is a major challenge in reverse engineering because binary and source code exhibit substantial disparities after compilation. Existing binary-to-source SCA techniques leverage basic syntactic features that suffer from redundancy and a lack of robustness on large-scale TPL datasets, leading to inevitable false positives and compromised recall. To mitigate these limitations, we introduce BinaryAI, a novel binary-to-source SCA technique with two-phase binary source code matching that captures both syntactic and semantic code features. First, BinaryAI employs a transformer-based model to produce function-level embeddings and retrieve similar source functions for each binary function. Then, by applying link-time locality to facilitate function matching, BinaryAI detects the reused TPLs based on the ratio of matched source functions. Our experimental results demonstrate the superior performance of BinaryAI in both binary source code matching and the downstream SCA task. Specifically, our embedding model outperforms the state-of-the-art model CodeCMR, achieving 22.73% recall@1 and 70.45% recall@100, compared with 11.92% and 44.89% for CodeCMR, respectively. Additionally, BinaryAI outperforms all existing binary-to-source SCA tools in TPL detection, increasing precision from 73.36% to 85.84% and recall from 59.81% to 64.98% compared with the well-recognized commercial SCA product Black Duck.
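The abstract's pipeline (embedding-based retrieval, recall@k evaluation, and ratio-based TPL detection) can be illustrated with a minimal sketch. This is not BinaryAI's implementation: the embeddings are hand-written toy vectors rather than transformer outputs, cosine similarity is an assumed similarity measure, the link-time locality step is omitted, and the function names, identifiers, and the 0.6 detection threshold are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k_matches(binary_vec, source_vecs, k):
    """Ids of the k source functions most similar to one binary function."""
    ranked = sorted(source_vecs,
                    key=lambda sid: cosine(binary_vec, source_vecs[sid]),
                    reverse=True)
    return ranked[:k]

def recall_at_k(binary_vecs, source_vecs, truth, k):
    """Fraction of binary functions whose true source is in the top k."""
    hits = sum(1 for bid, vec in binary_vecs.items()
               if truth[bid] in top_k_matches(vec, source_vecs, k))
    return hits / len(binary_vecs)

def detect_tpls(matched_source_ids, tpl_functions, threshold):
    """TPLs whose ratio of matched source functions reaches the threshold."""
    matched = set(matched_source_ids)
    ratios = {tpl: len(matched & funcs) / len(funcs)
              for tpl, funcs in tpl_functions.items() if funcs}
    return {tpl: r for tpl, r in ratios.items() if r >= threshold}

# Toy data (hypothetical): source-function embeddings grouped by TPL.
source_vecs = {
    "zlib/inflate": [1.0, 0.0, 0.0],
    "zlib/deflate": [0.0, 1.0, 0.0],
    "libpng/png_read": [0.0, 0.0, 1.0],
    "libpng/png_write": [0.6, 0.0, 0.8],
}
tpl_functions = {
    "zlib": {"zlib/inflate", "zlib/deflate"},
    "libpng": {"libpng/png_read", "libpng/png_write"},
}
# Toy binary-function embeddings and their ground-truth source functions.
binary_vecs = {
    "sub_401000": [0.9, 0.1, 0.0],
    "sub_401200": [0.1, 0.95, 0.05],
    "sub_401400": [0.5, 0.1, 0.6],
}
truth = {
    "sub_401000": "zlib/inflate",
    "sub_401200": "zlib/deflate",
    "sub_401400": "libpng/png_read",
}

r1 = recall_at_k(binary_vecs, source_vecs, truth, k=1)
r2 = recall_at_k(binary_vecs, source_vecs, truth, k=2)
# Use each binary function's top-1 match as its matched source function.
matched = [top_k_matches(v, source_vecs, 1)[0] for v in binary_vecs.values()]
detected = detect_tpls(matched, tpl_functions, threshold=0.6)
```

On this toy data, recall@1 is 2/3 and recall@2 is 1.0, because the true source of `sub_401400` ranks second. Only `zlib` clears the assumed 0.6 threshold, since just one of the two `libpng` functions is matched.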
Wed 17 Apr (displayed time zone: Lisbon)
16:00 - 17:30 | Program binaries - evolvability | Research Track / Software Engineering in Practice / Demonstrations | Room: Amália Rodrigues | Chair(s): Auri Vincenzi (Federal University of São Carlos)

Time | Duration | Type | Title | Track | Authors
16:00 | 15m | Talk | Cross-Inlining Binary Function Similarity Detection | Research Track | Ang Jia (Xi'an Jiaotong University), Ming Fan (Xi'an Jiaotong University), Xi Xu (Xi'an Jiaotong University), Wuxia Jin (Xi'an Jiaotong University), Haijun Wang (Xi'an Jiaotong University), Ting Liu (Xi'an Jiaotong University)
16:15 | 15m | Talk | BinaryAI: Binary Software Composition Analysis via Intelligent Binary Source Code Matching | Research Track | Ling Jiang (Southern University of Science and Technology), Junwen An (Southern University of Science and Technology), Huihui Huang (Southern University of Science and Technology), Qiyi Tang (Tencent Security Keen Lab), Sen Nie (Tencent Security Keen Lab), Shi Wu (Tencent Security Keen Lab), Yuqun Zhang (Southern University of Science and Technology)
16:30 | 15m | Talk | PPT4J: Patch Presence Test for Java Binaries | Research Track | Zhiyuan Pan (Zhejiang University), Xing Hu (Zhejiang University), Xin Xia (Huawei Technologies), Xian Zhan (Southern University of Science and Technology), David Lo (Singapore Management University), Xiaohu Yang (Zhejiang University)
16:45 | 15m | Talk | Code Impact Beyond Disciplinary Boundaries: Constructing A Multidisciplinary Dependency Graph and Analyzing Cross-Boundary Impact | Software Engineering in Practice | Gengyi Sun (University of Waterloo), Mehran Meidani (University of Waterloo), Sarra Habchi (Ubisoft Montréal), Mathieu Nayrolles (Ubisoft Montreal), Shane McIntosh (University of Waterloo)
17:00 | 7m | Talk | The Devil Is in the Command Line: Associating the Compiler Flags With the Binary and Build Metadata | Software Engineering in Practice | Gunnar Kudrjavets (Amazon Web Services, USA), Aditya Kumar (Google), Jeff Thomas (Meta Platforms, Inc.), Ayushi Rastogi (University of Groningen, The Netherlands)
17:07 | 7m | Talk | Verifying and Displaying Move Smart Contract Source Code for the Sui Blockchain | Demonstrations | Rijnard van Tonder (Mysten Labs, Inc.)