Detecting Semantic Code Clones by Building AST-based Markov Chains Model (ASE 2022 - Research Papers)

Who

Yueming Wu, Siyue Feng, Deqing Zou, Hai Jin

Track

ASE 2022 Research Papers

When

Wed 12 Oct 2022 17:40 - 18:00 at Ballroom C East - Technical Session 19 - Formal Methods and Models I Chair(s): Michalis Famelis

Abstract

Code clone detection aims to find functionally similar code fragments, a task that is becoming increasingly important in software engineering. Many code clone detection methods have been proposed, among which tree-based methods are able to handle semantic code clones. However, these methods are difficult to scale to big code due to the complexity of tree structures. In this paper, we design Amain, a scalable tree-based semantic code clone detector built on Markov chain models. Specifically, we propose a novel method to transform the complex original tree into simple Markov chains and compute the similarity of all states in these chains. After obtaining all similarity scores, we feed them into a machine learning classifier to train a code clone detector. To examine the effectiveness of Amain, we evaluate it on two widely used datasets, namely Google Code Jam and BigCloneBench. Experimental results show that Amain is superior to five state-of-the-art code clone detection tools (i.e., SourcererCC, Deckard, RtvNN, ASTNN, and SCDetector). Furthermore, compared to ASTNN, a recent tree-based code clone detector, Amain is more than 160 times faster in predicting semantic code clones.
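The abstract does not spell out how the tree is turned into a Markov chain or which similarity measure is used, so the following is only a minimal sketch of the general idea, not the Amain implementation. It assumes that states are AST node type names, that a transition is a parent-to-child edge, and that per-state similarity is the cosine similarity of the two transition distributions. It uses Python's standard `ast` module for convenience; the function names `transition_probs`, `cosine`, and `state_similarities` are hypothetical.

```python
from __future__ import annotations

import ast
import math
from collections import Counter, defaultdict


def transition_probs(source: str) -> dict[str, dict[str, float]]:
    """Map each parent node type to a distribution over its child node types."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            counts[type(parent).__name__][type(child).__name__] += 1
    # Normalize each row of counts so that it sums to 1 (a Markov transition row).
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


def cosine(p: dict[str, float], q: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either one is empty."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(
        sum(v * v for v in q.values())
    )
    return dot / norm if norm else 0.0


def state_similarities(code_a: str, code_b: str) -> dict[str, float]:
    """One similarity score per state (node type) seen in either fragment."""
    chain_a, chain_b = transition_probs(code_a), transition_probs(code_b)
    return {
        state: cosine(chain_a.get(state, {}), chain_b.get(state, {}))
        for state in sorted(chain_a.keys() | chain_b.keys())
    }


# Two fragments that compute the same sum in different ways.
loop_sum = "def f(xs):\n    t = 0\n    for x in xs:\n        t += x\n    return t\n"
builtin_sum = "def g(values):\n    return sum(values)\n"
features = state_similarities(loop_sum, builtin_sum)
```

In this sketch, `features` plays the role of the similarity scores mentioned in the abstract: after being laid out over a fixed list of states, such a vector could be passed to any off-the-shelf classifier trained on labeled clone and non-clone pairs. Because each fragment is reduced to a small table of transition probabilities, the comparison no longer depends on the shape of the original tree, which is one plausible reading of why this kind of approach scales.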

Yueming Wu, Nanyang Technological University
Siyue Feng, Huazhong University of Science and Technology
Deqing Zou, Huazhong University of Science and Technology
Hai Jin, Huazhong University of Science and Technology, China

Session Program

Wed 12 Oct
Displayed time zone: Eastern Time (US & Canada)

16:00 - 18:00	Technical Session 19 - Formal Methods and Models I (Research Papers / Journal-first Papers / Tool Demonstrations) at Ballroom C East. Chair(s): Michalis Famelis (Université de Montréal)

16:00 20m Research paper		Automatic Comment Generation via Multi-Pass Deliberation (Research Papers): Fangwen Mu (Institute of Software Chinese Academy of Sciences), Xiao Chen (Institute of Software Chinese Academy of Sciences), Lin Shi (ISCAS), Song Wang (York University), Qing Wang (Institute of Software at Chinese Academy of Sciences)
16:20 10m Demonstration		Building recommender systems for modelling languages with Droid (Virtual, Tool Demonstrations): Lissette Almonte (Universidad Autónoma de Madrid), Esther Guerra (Universidad Autónoma de Madrid), Iván Cantador (Universidad Autónoma de Madrid), Juan de Lara (Autonomous University of Madrid)
16:30 10m Demonstration		RobSimVer: A Tool for RoboSim Modeling and Analysis (Virtual, Tool Demonstrations): Dehui Du (East China Normal University), Ana Cavalcanti (University of York), Jihui Nie (East China Normal University)
16:40 20m Research paper		Provably Tightest Linear Approximation for Robustness Verification of Sigmoid-like Neural Networks (Virtual, Research Papers): Zhaodi Zhang (East China Normal University), Yiting Wu (East China Normal University), Si Liu (ETH Zurich), Jing Liu (East China Normal University), Min Zhang (East China Normal University)
17:00 20m Research paper		Efficient Synthesis of Method Call Sequences for Test Generation and Bounded Verification (Virtual, Research Papers): Yunfan Zhang (Peking University), Ruidong Zhu (Peking University), Yingfei Xiong (Peking University), Tao Xie (Peking University)
17:20 20m Paper		Demystifying Performance Regressions in String Solvers (Virtual, Journal-first Papers): Yao Zhang, Xiaofei Xie (Singapore Management University, Singapore), Yi Li (Nanyang Technological University), Yun Lin (National University of Singapore), Sen Chen (Tianjin University), Yang Liu (Nanyang Technological University), Xiaohong Li (Tianjin University)
17:40 20m Research paper		Detecting Semantic Code Clones by Building AST-based Markov Chains Model (Virtual, Research Papers): Yueming Wu (Nanyang Technological University), Siyue Feng (Huazhong University of Science and Technology), Deqing Zou (Huazhong University of Science and Technology), Hai Jin (Huazhong University of Science and Technology)