ICSE 2024
Fri 12 - Sun 21 April 2024 Lisbon, Portugal
Wed 17 Apr 2024 16:15 - 16:30 at Amália Rodrigues - Program binaries - evolvability Chair(s): Auri Vincenzi

While third-party libraries (TPLs) are extensively reused to enhance productivity during software development, they can also introduce potential security risks such as vulnerability propagation. Software composition analysis (SCA), proposed to identify reused TPLs for reducing such risks, has become an essential procedure within modern DevSecOps. As one of the mainstream SCA techniques, binary-to-source SCA identifies the third-party source projects contained in binary files via binary source code matching, which is a major challenge in reverse engineering since binary and source code exhibit substantial disparities after compilation. The existing binary-to-source SCA techniques leverage basic syntactic features that suffer from redundancy and lack of robustness in the large-scale TPL dataset, leading to inevitable false positives and compromised recall. To mitigate these limitations, we introduce BinaryAI, a novel binary-to-source SCA technique with two-phase binary source code matching to capture both syntactic and semantic code features. First, BinaryAI employs a transformer-based model to produce function-level embeddings and obtain similar source functions for each binary function accordingly. Then by applying the link-time locality to facilitate function matching, BinaryAI detects the reused TPLs based on the ratio of matched source functions. Our experimental results demonstrate the superior performance of BinaryAI in terms of binary source code matching and the downstream SCA task. Specifically, our embedding model outperforms the state-of-the-art model CodeCMR, achieving 22.73% recall@1 and 70.45% recall@100 in contrast to 11.92% and 44.89% of CodeCMR respectively. Additionally, BinaryAI outperforms all existing binary-to-source SCA tools in TPL detection, increasing the precision from 73.36% to 85.84% and recall from 59.81% to 64.98% compared with the well-recognized commercial SCA product Black Duck.
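
As a rough illustration of the two quantities the abstract reports, the sketch below computes recall@k for embedding-based retrieval and then flags a TPL when the ratio of its matched source functions crosses a threshold. It is a toy under stated assumptions, not BinaryAI's implementation: the three-dimensional vectors stand in for model embeddings, cosine similarity is an assumed similarity measure, the function names and the 0.5 threshold are hypothetical, and the link-time-locality matching phase is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k(query, corpus, k):
    """Return the names of the k source functions most similar to the query."""
    ranked = sorted(corpus, key=lambda name: cosine(query, corpus[name]), reverse=True)
    return ranked[:k]


def recall_at_k(queries, corpus, k):
    """Fraction of binary functions whose true source function is in the top k."""
    hits = sum(1 for vec, truth in queries if truth in top_k(vec, corpus, k))
    return hits / len(queries)


def detect_tpls(matched, tpl_functions, threshold):
    """Report each TPL whose ratio of matched source functions meets the threshold."""
    detected = {}
    for tpl, funcs in tpl_functions.items():
        ratio = len(matched & funcs) / len(funcs)
        if ratio >= threshold:
            detected[tpl] = ratio
    return detected


# Hypothetical source-function embeddings (stand-ins for model output).
corpus = {
    "zlib.inflate": [1.0, 0.0, 0.0],
    "zlib.deflate": [0.0, 1.0, 0.0],
    "png.read": [0.0, 0.0, 1.0],
    "png.write": [0.7, 0.7, 0.0],
}

# Hypothetical binary-function embeddings paired with their true source function.
queries = [
    ([0.9, 0.1, 0.0], "zlib.inflate"),
    ([0.1, 0.9, 0.0], "zlib.deflate"),
    ([0.6, 0.6, 0.2], "png.read"),
]

# Hypothetical TPL-to-function index and set of matched source functions.
tpl_functions = {
    "zlib": {"zlib.inflate", "zlib.deflate"},
    "libpng": {"png.read", "png.write"},
}
matched = {"zlib.inflate", "zlib.deflate"}

recall_1 = recall_at_k(queries, corpus, 1)
recall_4 = recall_at_k(queries, corpus, 4)
detected = detect_tpls(matched, tpl_functions, threshold=0.5)
```

In this toy data, the third query ranks a wrong candidate first, which shows why recall grows with k: a larger candidate list gives the later matching phase more chances to recover the true source function.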

Wed 17 Apr

Displayed time zone: Lisbon

16:00 - 17:30
Program binaries - evolvability
Research Track / Software Engineering in Practice / Demonstrations at Amália Rodrigues
Chair(s): Auri Vincenzi Federal University of São Carlos
16:00
15m
Talk
Cross-Inlining Binary Function Similarity Detection
Research Track
Ang Jia Xi'an Jiaotong University, Ming Fan Xi'an Jiaotong University, Xi Xu Xi'an Jiaotong University, Wuxia Jin Xi'an Jiaotong University, Haijun Wang Xi'an Jiaotong University, Ting Liu Xi'an Jiaotong University
16:15
15m
Talk
BinaryAI: Binary Software Composition Analysis via Intelligent Binary Source Code Matching
Research Track
Ling Jiang Southern University of Science and Technology, Junwen An Southern University of Science and Technology, Huihui Huang Southern University of Science and Technology, Qiyi Tang Tencent Security Keen Lab, Sen Nie Tencent Security Keen Lab, Shi Wu Tencent Security Keen Lab, Yuqun Zhang Southern University of Science and Technology
16:30
15m
Talk
PPT4J: Patch Presence Test for Java Binaries
Research Track
Zhiyuan Pan Zhejiang University, Xing Hu Zhejiang University, Xin Xia Huawei Technologies, Xian Zhan Southern University of Science and Technology, David Lo Singapore Management University, Xiaohu Yang Zhejiang University
16:45
15m
Talk
Code Impact Beyond Disciplinary Boundaries: Constructing A Multidisciplinary Dependency Graph and Analyzing Cross-Boundary Impact
Software Engineering in Practice
Gengyi Sun University of Waterloo, Mehran Meidani University of Waterloo, Sarra Habchi Ubisoft Montréal, Mathieu Nayrolles Ubisoft Montréal, Shane McIntosh University of Waterloo
17:00
7m
Talk
The Devil Is in the Command Line: Associating the Compiler Flags With the Binary and Build Metadata
Software Engineering in Practice
Gunnar Kudrjavets Amazon Web Services, USA, Aditya Kumar Google, Jeff Thomas Meta Platforms, Inc., Ayushi Rastogi University of Groningen, The Netherlands
17:07
7m
Talk
Verifying and Displaying Move Smart Contract Source Code for the Sui Blockchain
Demonstrations
Rijnard van Tonder Mysten Labs, Inc.