TreeCen: Building Tree Graph for Scalable Semantic Code Clone Detection (Virtual)
Code clone detection is an important research problem that has attracted wide attention in software engineering. Many methods have been proposed for detecting code clones. Among them, text-based and token-based approaches are scalable but do not consider code semantics, so they cannot detect semantic code clones. Methods based on intermediate representations of code can detect semantic clones. However, graph-based methods are impractical because they require the code to be compiled, and existing tree-based approaches are limited by the size of the trees, which prevents scalable code clone detection.
In this paper, we propose TreeCen, a scalable tree-based code clone detector that detects semantic clones effectively while remaining scalable. Given the source code of a method, we first extract its abstract syntax tree (AST) by static analysis and transform it into a simple graph representation (i.e., a tree graph) according to node type, rather than using traditional heavyweight tree matching. We then treat the tree graph as a social network and apply centrality analysis to each node to preserve the details of the tree. In this way, the original complex tree is converted into a 72-dimensional vector that still contains comprehensive structural information about the AST. Finally, these vectors are fed into a machine learning model to train a detector, which is then used to find code clones. We conduct comparative evaluations of effectiveness and scalability. The experimental results show that TreeCen achieves the best performance when compared with six other state-of-the-art methods (i.e., SourcererCC, RtvNN, DeepSim, SCDetector, Deckard, and ASTNN), with F1 scores of 0.99 and 0.95 on the BigCloneBench and Google Code Jam datasets, respectively. In terms of scalability, TreeCen is about 79 times faster than another state-of-the-art tree-based semantic code clone detector (i.e., ASTNN).
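The pipeline described in the abstract (AST, then a graph over node types, then per-node centrality, then a fixed-length vector) can be sketched roughly as follows. This is only an illustration under several assumptions, not the authors' implementation: Python's built-in `ast` module stands in for the paper's AST extractor, a single measure (degree centrality) stands in for whatever centrality measures TreeCen uses, and the vector length is simply the number of Python AST node types rather than the 72 dimensions reported above. The names `VOCAB`, `tree_graph`, `degree_centrality`, and `vectorize` are hypothetical.

```python
import ast

# Fixed vocabulary of AST node type names, so every vector has the same layout.
# (Assumption: Python's node types, not the node types used in the paper.)
VOCAB = sorted(
    name for name, obj in vars(ast).items()
    if isinstance(obj, type) and issubclass(obj, ast.AST)
)

def tree_graph(source):
    """Collapse an AST into a graph whose nodes are node types.

    Returns a dict mapping (parent_type, child_type) to the number of
    parent/child pairs of those types in the tree.
    """
    edges = {}
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            key = (type(parent).__name__, type(child).__name__)
            edges[key] = edges.get(key, 0) + 1
    return edges

def degree_centrality(edges):
    """Undirected degree centrality: distinct neighbours / (n - 1)."""
    neighbours = {}
    for src, dst in edges:
        neighbours.setdefault(src, set())
        neighbours.setdefault(dst, set())
        if src != dst:  # ignore self-loops such as BinOp -> BinOp
            neighbours[src].add(dst)
            neighbours[dst].add(src)
    n = len(neighbours)
    if n < 2:
        return {t: 0.0 for t in neighbours}
    return {t: len(adj) / (n - 1) for t, adj in neighbours.items()}

def vectorize(source, vocabulary=VOCAB):
    """Map source code to one centrality value per node type in the vocabulary."""
    centrality = degree_centrality(tree_graph(source))
    return [centrality.get(t, 0.0) for t in vocabulary]

# Two functions that differ only in identifier names.
a = vectorize("def add(a, b):\n    return a + b\n")
b = vectorize("def total(x, y):\n    return x + y\n")
print(a == b)
```

Because the graph is built over node types only, renaming identifiers leaves the vector unchanged. In a full detector, vectors like these for pairs of methods would be passed to a trained classifier; that step is omitted here because the abstract does not specify the model.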
Thu 13 Oct | Displayed time zone: Eastern Time (US & Canada)
16:00 - 18:00 | Technical Session 31 - Code Similarities and Refactoring | Research Papers / Tool Demonstrations / Journal-first Papers | at Banquet A | Chair(s): Hua Ming (Oakland University)
16:00 | 20m | Research paper | Reformulator: Automated Refactoring of the N+1 Problem in Database-Backed Applications | Research Papers | Alexi Turcotte (Northeastern University), Mark W. Aldrich (Tufts University), Frank Tip (Northeastern University)
16:20 | 20m | Paper | How Software Refactoring Impacts Execution Time | Journal-first Papers | Luca Traini (University of L'Aquila), Daniele Di Pompeo (University of L'Aquila), Michele Tucci (Charles University), Bin Lin (Radboud University), Simone Scalabrino (University of Molise), Gabriele Bavota (Software Institute, USI Università della Svizzera italiana), Michele Lanza (Software Institute - USI, Lugano), Rocco Oliveto (University of Molise), Vittorio Cortellessa (University of L'Aquila)
16:40 | 20m | Research paper | Learning to Synthesize Relational Invariants | Research Papers
17:00 | 10m | Demonstration | AntiCopyPaster: Extracting Code Duplicates As Soon As They Are Introduced in the IDE | Tool Demonstrations | Eman Abdullah AlOmar (Stevens Institute of Technology), Anton Ivanov (HSE University), Zarina Kurbatova (JetBrains Research), Yaroslav Golubev (JetBrains Research), Mohamed Wiem Mkaouer (Rochester Institute of Technology), Ali Ouni (ETS Montreal, University of Quebec), Timofey Bryksin (JetBrains Research), Le Nguyen (Rochester Institute of Technology), Amit Kini (Rochester Institute of Technology), Aditya Thakur (Rochester Institute of Technology)
17:10 | 20m | Research paper | TreeCen: Building Tree Graph for Scalable Semantic Code Clone Detection (Virtual) | Research Papers | Yutao Hu (Huazhong University of Science and Technology), Deqing Zou (Huazhong University of Science and Technology), Junru Peng (Xidian University), Yueming Wu (Nanyang Technological University), Junjie Shan (KTH Royal Institute of Technology), Hai Jin (Huazhong University of Science and Technology)
17:30 | 10m | Demonstration | Trimmer: Context-Specific Code Reduction (Virtual) | Tool Demonstrations | Aatira Anum Ahmad (Lahore University of Management Sciences), Mubashir Anwar (University of Illinois Urbana-Champaign), Hashim Sharif (University of Illinois at Urbana-Champaign), Ashish Gehani (SRI), Fareed Zaffar (Lahore University of Management Sciences)
17:40 | 20m | Research paper | Studying and Understanding the Tradeoffs Between Generality and Reduction in Software Debloating (Virtual) | Research Papers