Detecting Semantic Code Clones by Building AST-based Markov Chains Model (Virtual)
Code clone detection aims to find functionally similar code fragments, a task that is becoming increasingly important in software engineering. Many code clone detection methods have been proposed, among which tree-based methods are able to handle semantic code clones. However, these methods are difficult to scale to big code due to the complexity of tree structures. In this paper, we design Amain, a scalable tree-based semantic code clone detector built on Markov chain models. Specifically, we propose a novel method to transform the complex original tree into simple Markov chains and compute the similarity of all states in these chains. After obtaining all similarity scores, we feed them into a machine learning classifier to train a code clone detector. To examine the effectiveness of Amain, we evaluate it on two widely used datasets, namely Google Code Jam and BigCloneBench. Experimental results show that Amain is superior to five state-of-the-art code clone detection tools (i.e., SourcererCC, Deckard, RtvNN, ASTNN, and SCDetector). Furthermore, compared to a recent tree-based code clone detector, ASTNN, Amain is more than 160 times faster in predicting semantic code clones.
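The tree-to-chain idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes Python's standard `ast` module as the parser, treats AST node types as the chain states, and uses a per-state Euclidean distance as a stand-in similarity measure. The function names `transition_probs` and `state_distances` are hypothetical.

```python
import ast
import math
from collections import Counter, defaultdict


def transition_probs(source: str) -> dict:
    """Estimate a Markov chain over AST node types.

    Each parent node type is a state; its row holds the probability of
    moving to each child node type (parent -> child edge frequencies).
    """
    counts = defaultdict(Counter)
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            counts[type(parent).__name__][type(child).__name__] += 1
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


def state_distances(chain_a: dict, chain_b: dict) -> dict:
    """Per-state Euclidean distance between two chains' transition rows.

    A state missing from one chain is treated as an all-zero row
    (an assumption made for this sketch).
    """
    distances = {}
    for state in sorted(set(chain_a) | set(chain_b)):
        row_a = chain_a.get(state, {})
        row_b = chain_b.get(state, {})
        keys = set(row_a) | set(row_b)
        distances[state] = math.sqrt(
            sum((row_a.get(k, 0.0) - row_b.get(k, 0.0)) ** 2 for k in keys)
        )
    return distances


# Two fragments that sum a list in different ways (illustrative inputs).
LOOP_FOR = "def f(xs):\n    t = 0\n    for x in xs:\n        t += x\n    return t\n"
LOOP_WHILE = (
    "def g(xs):\n    t = 0\n    i = 0\n"
    "    while i < len(xs):\n        t += xs[i]\n        i += 1\n    return t\n"
)

if __name__ == "__main__":
    features = state_distances(transition_probs(LOOP_FOR), transition_probs(LOOP_WHILE))
    for state, dist in features.items():
        print(f"{state}: {dist:.3f}")
```

In a pipeline like the one the abstract describes, a fixed-length vector of such per-state scores for each code pair would be the input to a machine learning classifier. The choice of states, distance measure, and classifier here is an assumption of the sketch, not a detail taken from the abstract.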
Wed 12 Oct. Displayed time zone: Eastern Time (US & Canada)
16:00 - 18:00 | Technical Session 19 - Formal Methods and Models I | Research Papers / Journal-first Papers / Tool Demonstrations | Ballroom C East | Chair(s): Michalis Famelis, Université de Montréal
16:00 | 20m | Research paper | Automatic Comment Generation via Multi-Pass Deliberation | Research Papers | Fangwen Mu (Institute of Software Chinese Academy of Sciences), Xiao Chen (Institute of Software Chinese Academy of Sciences), Lin Shi (ISCAS), Song Wang (York University), Qing Wang (Institute of Software at Chinese Academy of Sciences)
16:20 | 10m | Demonstration | Building recommender systems for modelling languages with Droid (Virtual) | Tool Demonstrations | Lissette Almonte (Universidad Autónoma de Madrid), Esther Guerra (Universidad Autónoma de Madrid), Iván Cantador (Universidad Autónoma de Madrid), Juan de Lara (Autonomous University of Madrid)
16:30 | 10m | Demonstration | RobSimVer: A Tool for RoboSim Modeling and Analysis (Virtual) | Tool Demonstrations | Dehui Du (East China Normal University), Ana Cavalcanti (University of York), Jihui Nie (East China Normal University)
16:40 | 20m | Research paper | Provably Tightest Linear Approximation for Robustness Verification of Sigmoid-like Neural Networks (Virtual) | Research Papers | Zhaodi Zhang (East China Normal University), Yiting Wu (East China Normal University), Si Liu (ETH Zurich), Jing Liu (East China Normal University), Min Zhang (East China Normal University)
17:00 | 20m | Research paper | Efficient Synthesis of Method Call Sequences for Test Generation and Bounded Verification (Virtual) | Research Papers | Yunfan Zhang (Peking University), Ruidong Zhu (Peking University), Yingfei Xiong (Peking University), Tao Xie (Peking University)
17:20 | 20m | Paper | Demystifying Performance Regressions in String Solvers (Virtual) | Journal-first Papers | Yao Zhang, Xiaofei Xie (Singapore Management University, Singapore), Yi Li (Nanyang Technological University), Yun Lin (National University of Singapore), Sen Chen (Tianjin University), Yang Liu (Nanyang Technological University), Xiaohong Li (Tianjin University)
17:40 | 20m | Research paper | Detecting Semantic Code Clones by Building AST-based Markov Chains Model (Virtual) | Research Papers | Yueming Wu (Nanyang Technological University), Siyue Feng (Huazhong University of Science and Technology), Deqing Zou (Huazhong University of Science and Technology), Hai Jin (Huazhong University of Science and Technology)