
Code clone detection is an important research problem that has attracted wide attention in software engineering. Many methods have been proposed for detecting code clones. Among them, text-based and token-based approaches are scalable but do not consider code semantics, so they cannot detect semantic code clones. Methods based on intermediate representations of code can detect semantic clones. However, graph-based methods are impractical because they require the code to be compiled, and existing tree-based approaches do not scale because of the size of the trees they must process.

In this paper, we propose TreeCen, a tree-based code clone detector that scales well while detecting semantic clones effectively. Given the source code of a method, we first extract its abstract syntax tree (AST) through static analysis and, rather than applying traditional heavyweight tree matching, transform it into a simple graph representation (i.e., a tree graph) according to node type. We then treat the tree graph as a social network and apply centrality analysis to each node to preserve the details of the tree. In this way, the original complex tree is converted into a 72-dimensional vector that retains comprehensive structural information about the AST. Finally, these vectors are fed into a machine learning model to train a detector, which is then used to find code clones. We compare TreeCen with existing detectors in terms of effectiveness and scalability. The experimental results show that TreeCen achieves the best performance compared with six other state-of-the-art methods (i.e., SourcererCC, RtvNN, DeepSim, SCDetector, Deckard, and ASTNN), with F1 scores of 0.99 on BigCloneBench and 0.95 on Google Code Jam. In terms of scalability, TreeCen is about 79 times faster than ASTNN, another state-of-the-art tree-based semantic code clone detector.
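The pipeline described above (AST, then a tree graph keyed by node type, then per-node centrality, then a fixed-length vector) can be illustrated with a minimal sketch. This is not TreeCen's implementation, and several choices are assumptions: it parses Python with the standard-library `ast` module instead of Java, uses the names of all `ast` node classes as the vocabulary instead of the paper's 72 node types, computes only degree centrality, and encodes a pair of methods as the element-wise absolute difference of their vectors. The helper names (`tree_graph`, `degree_centrality_vector`, `pair_features`) are hypothetical.

```python
import ast


def node_type_vocabulary():
    """Sorted names of every ast.AST subclass (assumed stand-in for a node-type vocabulary)."""
    seen, stack = set(), [ast.AST]
    while stack:
        for sub in stack.pop().__subclasses__():
            if sub not in seen:
                seen.add(sub)
                stack.append(sub)
    return sorted({cls.__name__ for cls in seen})


VOCAB = node_type_vocabulary()
INDEX = {name: i for i, name in enumerate(VOCAB)}


def tree_graph(source):
    """Collapse an AST into a simple undirected graph whose vertices are node *types*.

    All AST nodes of the same type merge into one vertex; each parent/child
    pair of different types adds one undirected edge (no self-loops, no
    duplicate edges), so the graph never has more vertices than VOCAB.
    """
    adj = {}
    for parent in ast.walk(ast.parse(source)):
        p = type(parent).__name__
        adj.setdefault(p, set())  # every node type present in the tree becomes a vertex
        for child in ast.iter_child_nodes(parent):
            c = type(child).__name__
            if c != p:  # simple graph: skip self-loops
                adj[p].add(c)
                adj.setdefault(c, set()).add(p)
    return adj


def degree_centrality_vector(source):
    """Degree centrality of each node type, laid out over the fixed VOCAB order."""
    adj = tree_graph(source)
    n = len(adj)
    vec = [0.0] * len(VOCAB)
    for node_type, neighbours in adj.items():
        # Normalised degree centrality: deg(v) / (n - 1); types absent from the tree stay 0.0.
        vec[INDEX[node_type]] = len(neighbours) / (n - 1) if n > 1 else 0.0
    return vec


def pair_features(src_a, src_b):
    """Assumed pair encoding: element-wise |a - b| of the two centrality vectors."""
    va = degree_centrality_vector(src_a)
    vb = degree_centrality_vector(src_b)
    return [abs(x - y) for x, y in zip(va, vb)]


if __name__ == "__main__":
    a = "def add(x, y):\n    return x + y\n"
    b = "def plus(p, q):\n    return p + q\n"
    c = "def total(n):\n    t = 0\n    for i in range(n):\n        t += i\n    return t\n"
    # Renaming identifiers leaves node types, and hence the tree graph, unchanged.
    print(sum(pair_features(a, b)))  # 0.0
    print(sum(pair_features(a, c)) > 0)  # True
```

Because each distinct node type becomes a single vertex, the graph stays bounded by the vocabulary size however large the AST is, and comparing two methods reduces to comparing two fixed-length vectors rather than matching trees. In a full detector, labeled pair vectors like these would be used to train a classifier; this sketch does not reproduce whichever model TreeCen uses.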

Thu 13 Oct

Displayed time zone: Eastern Time (US & Canada)

16:00 - 18:00
Technical Session 31 - Code Similarities and Refactoring
Research Papers / Tool Demonstrations / Journal-first Papers at Banquet A
Chair(s): Hua Ming (Oakland University)
16:00
20m
Research paper
Reformulator: Automated Refactoring of the N+1 Problem in Database-Backed Applications
Research Papers
Alexi Turcotte (Northeastern University), Mark W. Aldrich (Tufts University), Frank Tip (Northeastern University)
16:20
20m
Paper
How Software Refactoring Impacts Execution Time
Journal-first Papers
Luca Traini (University of L'Aquila), Daniele Di Pompeo (University of L'Aquila), Michele Tucci (Charles University), Bin Lin (Radboud University), Simone Scalabrino (University of Molise), Gabriele Bavota (Software Institute, USI Università della Svizzera italiana), Michele Lanza (Software Institute - USI, Lugano), Rocco Oliveto (University of Molise), Vittorio Cortellessa (University of L'Aquila)
16:40
20m
Research paper
Learning to Synthesize Relational Invariants
Research Papers
Jingbo Wang (University of Southern California), Chao Wang (USC)
17:00
10m
Demonstration
AntiCopyPaster: Extracting Code Duplicates As Soon As They Are Introduced in the IDE
Tool Demonstrations
Eman Abdullah AlOmar (Stevens Institute of Technology), Anton Ivanov (HSE University), Zarina Kurbatova (JetBrains Research), Yaroslav Golubev (JetBrains Research), Mohamed Wiem Mkaouer (Rochester Institute of Technology), Ali Ouni (ETS Montreal, University of Quebec), Timofey Bryksin (JetBrains Research), Le Nguyen (Rochester Institute of Technology), Amit Kini (Rochester Institute of Technology), Aditya Thakur (Rochester Institute of Technology)
17:10
20m
Research paper
TreeCen: Building Tree Graph for Scalable Semantic Code Clone Detection (Virtual)
Research Papers
Yutao Hu (Huazhong University of Science and Technology), Deqing Zou (Huazhong University of Science and Technology), Junru Peng (Xidian University), Yueming Wu (Nanyang Technological University), Junjie Shan (KTH Royal Institute of Technology), Hai Jin (Huazhong University of Science and Technology)
17:30
10m
Demonstration
Trimmer: Context-Specific Code Reduction (Virtual)
Tool Demonstrations
Aatira Anum Ahmad (Lahore University of Management Sciences), Mubashir Anwar (University of Illinois Urbana-Champaign), Hashim Sharif (University of Illinois at Urbana-Champaign), Ashish Gehani (SRI), Fareed Zaffar (Lahore University of Management Sciences)
17:40
20m
Research paper
Studying and Understanding the Tradeoffs Between Generality and Reduction in Software Debloating (Virtual)
Research Papers
Qi Xin (Wuhan University), Qirun Zhang (Georgia Institute of Technology), Alessandro Orso (Georgia Tech)