
Code clone detection aims to find functionally similar code fragments and is becoming increasingly important in software engineering. Many code clone detection methods have been proposed, among which tree-based methods are able to handle semantic code clones. However, these methods are difficult to scale to big code because of the complexity of tree structures. In this paper, we design Amain, a scalable tree-based semantic code clone detector built on Markov chain models. Specifically, we propose a novel method to transform the complex original tree into simple Markov chains and compute the similarity of all states in these chains. After obtaining all similarity scores, we feed them into a machine learning classifier to train a code clone detector. To examine the effectiveness of Amain, we evaluate it on two widely used datasets, namely Google Code Jam and BigCloneBench. Experimental results show that Amain is superior to five state-of-the-art code clone detection tools (i.e., SourcererCC, Deckard, RtvNN, ASTNN, and SCDetector). Furthermore, compared to ASTNN, a recent tree-based code clone detector, Amain is more than 160 times faster in predicting semantic code clones.
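The idea of turning a tree into a Markov chain and comparing states can be illustrated with a minimal sketch. It is not the Amain implementation: it assumes that states are AST node types, that transitions run from parent to child, and that cosine similarity is the per-state measure, none of which the abstract specifies. It also uses Python's built-in `ast` module purely for convenience, and the function names `transition_probs`, `cosine`, and `state_similarities` are hypothetical.

```python
import ast
import math
from collections import Counter, defaultdict


def transition_probs(source):
    """Map each parent node type to a distribution over its child node types."""
    counts = defaultdict(Counter)
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            counts[type(parent).__name__][type(child).__name__] += 1
    # Normalise each row of counts so it sums to 1 (a Markov transition row).
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


def cosine(row_a, row_b):
    """Cosine similarity of two sparse rows; 0.0 if either row is empty."""
    dot = sum(p * row_b.get(k, 0.0) for k, p in row_a.items())
    norm_a = math.sqrt(sum(p * p for p in row_a.values()))
    norm_b = math.sqrt(sum(p * p for p in row_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def state_similarities(src_a, src_b):
    """One similarity score per state that occurs in either chain (assumed feature set)."""
    chain_a, chain_b = transition_probs(src_a), transition_probs(src_b)
    states = sorted(set(chain_a) | set(chain_b))
    return {s: cosine(chain_a.get(s, {}), chain_b.get(s, {})) for s in states}


loop_sum = "def f(xs):\n    t = 0\n    for x in xs:\n        t += x\n    return t\n"
builtin_sum = "def g(xs):\n    return sum(xs)\n"
features = state_similarities(loop_sum, builtin_sum)
```

In a pipeline like the one the abstract describes, a dictionary such as `features` would be flattened into a fixed-order vector and passed to a classifier together with a clone or non-clone label; that step is omitted here.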

Wed 12 Oct

Displayed time zone: Eastern Time (US & Canada)

16:00 - 18:00
Technical Session 19 - Formal Methods and Models I
Research Papers / Journal-first Papers / Tool Demonstrations at Ballroom C East
Chair(s): Michalis Famelis Université de Montréal
16:00
20m
Research paper
Automatic Comment Generation via Multi-Pass Deliberation
Research Papers
Fangwen Mu Institute of Software Chinese Academy of Sciences, Xiao Chen Institute of Software Chinese Academy of Sciences, Lin Shi ISCAS, Song Wang York University, Qing Wang Institute of Software at Chinese Academy of Sciences
16:20
10m
Demonstration
Building recommender systems for modelling languages with Droid (Virtual)
Tool Demonstrations
Lissette Almonte Universidad Autónoma de Madrid, Esther Guerra Universidad Autónoma de Madrid, Iván Cantador Universidad Autónoma de Madrid, Juan de Lara Autonomous University of Madrid
16:30
10m
Demonstration
RobSimVer: A Tool for RoboSim Modeling and Analysis (Virtual)
Tool Demonstrations
Dehui Du East China Normal University, Ana Cavalcanti University of York, Jihui Nie East China Normal University
16:40
20m
Research paper
Provably Tightest Linear Approximation for Robustness Verification of Sigmoid-like Neural Networks (Virtual)
Research Papers
Zhaodi Zhang East China Normal University, Yiting Wu East China Normal University, Si Liu ETH Zurich, Jing Liu East China Normal University, Min Zhang East China Normal University
17:00
20m
Research paper
Efficient Synthesis of Method Call Sequences for Test Generation and Bounded Verification (Virtual)
Research Papers
Yunfan Zhang Peking University, Ruidong Zhu Peking University, Yingfei Xiong Peking University, Tao Xie Peking University
17:20
20m
Paper
Demystifying Performance Regressions in String Solvers (Virtual)
Journal-first Papers
Yao Zhang, Xiaofei Xie Singapore Management University, Singapore, Yi Li Nanyang Technological University, Singapore, Yun Lin National University of Singapore, Sen Chen Tianjin University, Yang Liu Nanyang Technological University, Xiaohong Li Tianjin University
17:40
20m
Research paper
Detecting Semantic Code Clones by Building AST-based Markov Chains Model (Virtual)
Research Papers
Yueming Wu Nanyang Technological University, Siyue Feng Huazhong University of Science and Technology, Deqing Zou Huazhong University of Science and Technology, Hai Jin Huazhong University of Science and Technology