ICSE 2023
Sun 14 - Sat 20 May 2023 Melbourne, Australia
Wed 17 May 2023 15:07 - 15:15 at Level G - Plenary Room 1 - Code smells and clones Chair(s): Sigrid Eldh

Binary similarity analysis is critical to many code-reuse-related issues, and function matching is its fundamental task. Most binary similarity analysis works apply a “1-to-1” mechanism, in which one function in a binary file is matched against one function in a source file or binary file. However, we discover that, contrary to this traditional understanding, function mapping is a more complex “1-to-n” problem (one binary function matches multiple source functions or binary functions) or even an “n-to-n” problem (multiple binary functions match multiple binary functions) because of function inlining.
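To make the “1-to-1” versus “1-to-n” distinction concrete, here is a minimal Python sketch. It is not the authors' tooling. It assumes the inlining ground truth is already known, so each binary function is annotated with the set of source functions whose code it contains. The function names (`read_packet`, `check_len`, and so on) and the helpers `match_1_to_1` and `match_1_to_n` are hypothetical. Under the 1-to-1 assumption an inlined source function has no counterpart at all, while the inlining-aware view maps it to every binary function that absorbed it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryFunction:
    """A function recovered from a compiled binary (hypothetical toy model)."""
    name: str
    # Source functions whose code ended up in this binary function: the
    # function itself plus every callee the compiler inlined into it.
    source_functions: frozenset[str]


def match_1_to_1(binary: list[BinaryFunction], src: str) -> list[str]:
    """Traditional assumption: a source function corresponds only to the
    binary function that carries its own name."""
    return [bf.name for bf in binary if bf.name == src]


def match_1_to_n(binary: list[BinaryFunction], src: str) -> list[str]:
    """Inlining-aware assumption: a source function corresponds to every
    binary function that contains its (possibly inlined) body."""
    return [bf.name for bf in binary if src in bf.source_functions]


# Hypothetical optimized build: check_len() was inlined into two callers and
# parse_header() into one, so neither keeps a standalone binary function.
BINARY = [
    BinaryFunction("read_packet", frozenset({"read_packet", "parse_header", "check_len"})),
    BinaryFunction("write_packet", frozenset({"write_packet", "check_len"})),
    BinaryFunction("main", frozenset({"main"})),
]

for src in ["parse_header", "check_len", "main"]:
    print(f"{src}: 1-to-1 -> {match_1_to_1(BINARY, src)}, "
          f"1-to-n -> {match_1_to_n(BINARY, src)}")
```

In this toy build, `parse_header` and `check_len` get no 1-to-1 match, and `check_len` maps to two binary functions once inlining is accounted for. The sketch needs Python 3.9 or later for the built-in generic annotations.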

In this paper, we investigate the effect of function inlining on binary similarity analysis. We carry out three studies to investigate the extent of function inlining, the performance of existing works under function inlining, and the effectiveness of existing inlining-simulation strategies. First, we design a scalable and lightweight identification method to recover function inlining in binaries. We collect and analyze 88 projects (compiled into 288 versions, yielding 32,460,156 binary functions) to construct 4 inlining-oriented datasets for 4 security tasks in the software supply chain: code search, OSS (Open Source Software) reuse detection, vulnerability detection, and patch presence test. The datasets reveal that the proportion of function inlining ranges from 30% to 40% when using O3 and can sometimes reach nearly 70%. Then, we evaluate 4 existing works on our datasets. Results show that most existing works neglect inlining and use the “1-to-1” mechanism. The resulting mismatches cause a 30% performance loss during code search and a 40% loss during vulnerability detection. Moreover, most inlined functions would be ignored during OSS reuse detection and patch presence test, leaving these functions at risk. Finally, we analyze 2 inlining-simulation strategies on our datasets. They miss nearly 40% of the inlined functions, so there is still substantial room for improvement. By precisely recovering when function inlining happens, we discover that inlining is usually cumulative as the optimization level increases. Thus, we recommend conditional inlining and incremental inlining for designing a low-cost, high-coverage inlining-simulation strategy.

Wed 17 May

Displayed time zone: Hobart

13:45 - 15:15
Code smells and clones | Technical Track / Journal-First Papers / SEIP - Software Engineering in Practice at Level G - Plenary Room 1
Chair(s): Sigrid Eldh Ericsson AB, Mälardalen University, Carleton University
13:45
15m
Talk
Comparison and Evaluation of Clone Detection Techniques with Different Code Representations
Technical Track
Yuekun Wang University of Science and Technology of China, Yuhang Ye University of Science and Technology of China, Yueming Wu Nanyang Technological University, Weiwei Zhang University of Science and Technology of China, Yinxing Xue University of Science and Technology of China, Yang Liu Nanyang Technological University
14:00
15m
Talk
Learning Graph-based Code Representations for Source-level Functional Similarity Detection
Technical Track
Jiahao Liu National University of Singapore, Jun Zeng National University of Singapore, Xiang Wang University of Science and Technology of China, Zhenkai Liang National University of Singapore
14:15
15m
Talk
The Smelly Eight: An Empirical Study on the Prevalence of Code Smells in Quantum Computing
Technical Track
Qihong Chen University of California, Irvine, Rúben Câmara LASIGE and Department of Informatics, Faculdade de Ciências, Universidade de Lisboa, José Campos University of Porto, Portugal, André Souto LaSiGE & FCUL, University of Lisbon, Iftekhar Ahmed University of California at Irvine
14:30
15m
Talk
An Empirical Comparison on the Results of Different Clone Detection Setups for C-based Projects
SEIP - Software Engineering in Practice
Yan Zhou Huawei, Jinfu Chen Centre for Software Excellence, Huawei Canada, Yong Shi Huawei Technologies, Boyuan Chen Centre for Software Excellence, Huawei Canada, Zhen Ming (Jack) Jiang York University
14:45
7m
Talk
Developers’ perception matters: machine learning to detect developer-sensitive smells
Journal-First Papers
Daniel Oliveira PUC-Rio, Wesley Assunção Johannes Kepler University Linz, Austria & Pontifical Catholic University of Rio de Janeiro, Brazil, Alessandro Garcia PUC-Rio, Baldoino Fonseca Federal University of Alagoas (UFAL), Márcio Ribeiro Federal University of Alagoas, Brazil
14:52
7m
Talk
Smells in system user interactive tests
Journal-First Papers
Renaud Rwemalika University of Luxembourg, Sarra Habchi Ubisoft, Mike Papadakis University of Luxembourg, Luxembourg, Yves Le Traon University of Luxembourg, Luxembourg, Marie-Claude Brasseur BGL BNP Paribas
15:00
7m
Talk
Bash in the Wild: Language Usage, Code Smells, and Bugs
Journal-First Papers
Yiwen Dong University of Waterloo, Zheyang Li University of Waterloo, Yongqiang Tian University of Waterloo, Chengnian Sun University of Waterloo, Michael W. Godfrey University of Waterloo, Canada, Mei Nagappan University of Waterloo
15:07
7m
Talk
1-to-1 or 1-to-n? Investigating the effect of function inlining on binary similarity analysis
Journal-First Papers
Ang Jia Xi'an Jiaotong University, Ming Fan Xi'an Jiaotong University, Wuxia Jin Xi'an Jiaotong University, Xi Xu Xi'an Jiaotong University, Zhaohui Zhou Xi'an Jiaotong University, Qiyi Tang Tencent Security Keen Lab, Sen Nie Keen Security Lab, Tencent, Shi Wu Tencent Security Keen Lab, Ting Liu Xi'an Jiaotong University