Comparison and Evaluation of Clone Detection Techniques with Different Code Representations (ICSE 2023 - Technical Track)

Who

Yuekun Wang, Yuhang Ye, Yueming Wu, Weiwei Zhang, Yinxing Xue, Yang Liu

Track

ICSE 2023 Technical Track

Time Zone

The program is currently displayed in (GMT+10:00) Hobart.

When

Wed 17 May 2023 13:45 - 14:00 at Level G - Plenary Room 1 - Code smells and clones Chair(s): Sigrid Eldh

Abstract

As one of the bad smells in code, code clones may increase the cost of software maintenance and the risk of vulnerability propagation. In the past two decades, numerous clone detection technologies have been proposed. They can be divided into text-based, token-based, tree-based, and graph-based approaches according to their code representations. Different code representations abstract the code details from different perspectives. However, it is unclear which code representation is more effective in detecting code clones and how to combine different code representations to achieve ideal performance.
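To make the difference between a token-based and a tree-based representation concrete, the following is a minimal sketch in Python. It is not taken from the paper: the function names (`token_bag`, `tree_bag`, `overlap`), the identifier and literal abstraction, and the multiset Jaccard measure are all assumptions chosen for illustration, and Python source is used only because the standard library ships a tokenizer and an AST parser.

```python
import ast
import io
import keyword
import tokenize
from collections import Counter

# Token types that carry no lexical content for this comparison.
SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
        tokenize.DEDENT, tokenize.ENCODING, tokenize.ENDMARKER}


def token_bag(source: str) -> Counter:
    """Token view (hypothetical helper): multiset of lexical tokens,
    with identifiers abstracted to 'ID' and literals to 'LIT'."""
    bag = Counter()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            bag["ID"] += 1
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            bag["LIT"] += 1
        else:
            bag[tok.string] += 1  # keywords and operators are kept verbatim
    return bag


def tree_bag(source: str) -> Counter:
    """Tree view (hypothetical helper): multiset of AST node type names."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


def overlap(a: Counter, b: Counter) -> float:
    """Multiset Jaccard similarity in [0, 1]."""
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 1.0


original = "def add(x, y):\n    return x + y\n"
renamed = "def plus(p, q):\n    return p + q\n"
modified = "def add(x, y):\n    total = x + y\n    return total\n"

print(overlap(token_bag(original), token_bag(renamed)))
print(overlap(tree_bag(original), tree_bag(renamed)))
print(overlap(token_bag(original), token_bag(modified)))
```

Under these assumptions, a clone that differs only in identifier names looks identical in both views, while an added statement lowers the similarity. A graph-based representation would additionally need control-flow or data-flow edges, which this sketch does not attempt.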

In this paper, we present an empirical study to compare the clone detection ability of different code representations. Specifically, we reproduce 12 clone detection algorithms and divide them into different groups according to their code representations. After analyzing the empirical results, we find that token and tree representations can perform better than graph representation when detecting simple code clones. However, when the code complexity of a code pair increases, graph representation becomes more effective. To make our findings more practical, we perform manual analysis on open-source projects to estimate the distribution of different clone types in the open-source community. From the results, we observe that most clone pairs belong to simple code clones. Based on this observation, we discard heavyweight graph-based clone detection algorithms and conduct combination experiments to find a suitable combination of token-based and tree-based approaches for achieving scalable and effective code clone detection. We implement this combination as a tool called TACC and evaluate it against other state-of-the-art code clone detectors. Experimental results indicate that TACC performs better and can detect code clones at large scale.
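One way to picture a combination of token-based and tree-based detection is a two-stage pipeline: a cheap token comparison filters candidate pairs, and a costlier tree comparison verifies the survivors. The sketch below illustrates only that general idea and is not TACC's algorithm, which the abstract does not describe. The function names, both thresholds, and the use of `difflib.SequenceMatcher` over AST node types are assumptions made for illustration.

```python
import ast
import difflib
import io
import keyword
import tokenize
from itertools import combinations


def abstract_tokens(source: str) -> set:
    """Cheap token view: keywords and operators verbatim,
    identifiers as 'ID', literals as 'LIT'."""
    out = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            out.add(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.add("LIT")
        elif tok.type == tokenize.OP:
            out.add(tok.string)
    return out


def node_sequence(source: str) -> list:
    """Tree view: AST node type names in ast.walk (breadth-first) order."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def find_clone_pairs(functions: dict, token_threshold: float = 0.6,
                     tree_threshold: float = 0.8) -> list:
    """Report pairs that pass the token filter and the tree verification.
    Both thresholds are hypothetical values, not taken from the paper."""
    tokens = {name: abstract_tokens(src) for name, src in functions.items()}
    pairs = []
    for a, b in combinations(functions, 2):
        union = tokens[a] | tokens[b]
        # Stage 1: set Jaccard on abstracted tokens discards unlikely pairs.
        if not union or len(tokens[a] & tokens[b]) / len(union) < token_threshold:
            continue
        # Stage 2: order-aware comparison of AST node types.
        ratio = difflib.SequenceMatcher(
            None, node_sequence(functions[a]), node_sequence(functions[b])
        ).ratio()
        if ratio >= tree_threshold:
            pairs.append((a, b))
    return pairs


functions = {
    "add": "def add(x, y):\n    return x + y\n",
    "plus": "def plus(p, q):\n    return p + q\n",
    "loop": "def count(n):\n    for i in range(n):\n        print(i)\n",
}
print(find_clone_pairs(functions))
```

The point of the design is that the tree comparison runs only on pairs the token filter lets through, which is where any scalability benefit of such a combination would come from. A detector meant for large code bases would also replace the all-pairs loop with an index.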

Yuekun Wang

University of Science and Technology of China

Yuhang Ye

University of Science and Technology of China

Yueming Wu

Nanyang Technological University

Weiwei Zhang

University of Science and Technology of China

Yinxing Xue

University of Science and Technology of China

China

Yang Liu

Nanyang Technological University

Singapore

Session Program

Wed 17 May
Displayed time zone: Hobart

13:45 - 15:15	Code smells and clones Technical Track / Journal-First Papers / SEIP - Software Engineering in Practice at Level G - Plenary Room 1 Chair(s): Sigrid Eldh Ericsson AB, Mälardalen University, Carleton University

13:45 15m Talk		Comparison and Evaluation of Clone Detection Techniques with Different Code Representations Technical Track Yuekun Wang University of Science and Technology of China, Yuhang Ye University of Science and Technology of China, Yueming Wu Nanyang Technological University, Weiwei Zhang University of Science and Technology of China, Yinxing Xue University of Science and Technology of China, Yang Liu Nanyang Technological University
14:00 15m Talk		Learning Graph-based Code Representations for Source-level Functional Similarity Detection Technical Track Jiahao Liu National University of Singapore, Jun Zeng National University of Singapore, Xiang Wang University of Science and Technology of China, Zhenkai Liang National University of Singapore
14:15 15m Talk		The Smelly Eight: An Empirical Study on the Prevalence of Code Smells in Quantum Computing Technical Track Qihong Chen University of California, Irvine, Rúben Câmara LASIGE and Department of Informatics are Faculdade Ciências Universidade de Lisboa, José Campos University of Porto, Portugal, André Souto LaSiGE & FCUL, University of Lisbon, Iftekhar Ahmed University of California at Irvine
14:30 15m Talk		An Empirical Comparison on the Results of Different Clone Detection Setups for C-based Projects SEIP - Software Engineering in Practice Yan Zhou Huawei, Jinfu Chen Centre for Software Excellence, Huawei, Canada, Yong Shi Huawei Technologies, Boyuan Chen Centre for Software Excellence, Huawei Canada, Zhen Ming (Jack) Jiang York University
14:45 7m Talk		Developers’ perception matters: machine learning to detect developer-sensitive smells Journal-First Papers Daniel Oliveira PUC-Rio, Wesley Assunção Johannes Kepler University Linz, Austria & Pontifical Catholic University of Rio de Janeiro, Brazil, Alessandro Garcia PUC-Rio, Baldoino Fonseca Federal University of Alagoas (UFAL), Márcio Ribeiro Federal University of Alagoas, Brazil
14:52 7m Talk		Smells in system user interactive tests Journal-First Papers Renaud Rwemalika University of Luxembourg, Sarra Habchi Ubisoft, Mike Papadakis University of Luxembourg, Luxembourg, Yves Le Traon University of Luxembourg, Luxembourg, Marie-Claude Brasseur BGL BNP Paribas
15:00 7m Talk		Bash in the Wild: Language Usage, Code Smells, and Bugs Journal-First Papers Yiwen Dong University of Waterloo, Zheyang Li University of Waterloo, Yongqiang Tian University of Waterloo, Chengnian Sun University of Waterloo, Michael W. Godfrey University of Waterloo, Canada, Mei Nagappan University of Waterloo
15:07 7m Talk		1-to-1 or 1-to-n? Investigating the effect of function inlining on binary similarity analysis Journal-First Papers Ang Jia Xi'an Jiaotong University, Ming Fan Xi'an Jiaotong University, Wuxia Jin Xi'an Jiaotong University, Xi Xu Xi'an Jiaotong University, Zhaohui Zhou Xi'an Jiaotong University, Qiyi Tang Tencent Security Keen Lab, Sen Nie Keen Security Lab, Tencent, Shi Wu Tencent Security Keen Lab, Ting Liu Xi'an Jiaotong University