Clone Detection for Smart Contracts: How Far Are We? (FSE 2025 - Research Papers)

Who

Zuobin Wang, Zhiyuan Wan, Yujing Chen, Yun Zhang, David Lo, Difan Xie, Xiaohu Yang

Track

FSE 2025 Research Papers

Time Zone

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

When

Mon 23 Jun 2025 11:10 - 11:30 at Aurora A - Clones Chair(s): Julia Lawall

Abstract

In smart contract development, practitioners frequently reuse code to reduce development effort and avoid reinventing the wheel. This reused code, whether identical or similar to its original source, is referred to as a code clone. Unintentional code cloning can propagate flaws and vulnerabilities, potentially undermining the reliability and maintainability of software systems. Previous studies have identified a significant prevalence of code clones in Solidity smart contracts on the Ethereum blockchain. To mitigate the risks posed by code clones, clone detection has emerged as an active field of research and practice in software engineering. Recent studies have extended existing techniques or proposed novel techniques tailored to the unique syntactic and semantic features of Solidity.

Nonetheless, the evaluations of existing techniques, whether conducted by their original authors or independent researchers, involve codebases in various programming languages and utilize different versions of the corresponding tools. The resulting inconsistency makes direct comparisons of the evaluation results impractical, and hinders the ability to derive meaningful conclusions across the evaluations. There remains a lack of clarity regarding the effectiveness of these techniques in detecting smart contract clones, and whether it is feasible to combine different techniques to achieve scalable yet accurate detection of code clones in smart contracts.

To address this gap, we conduct a comprehensive empirical study that evaluates the effectiveness and scalability of five representative clone detection techniques on 33,073 verified Solidity smart contracts, along with a benchmark we curate, in which we manually label 72,010 pairs of Solidity smart contracts with clone tags. Moreover, we explore the potential of combining different techniques to achieve optimal performance of code clone detection for smart contracts, and propose SourceREClone, a framework designed for the refined integration of different techniques, which achieves a 36.9% improvement in F1 score compared to a straightforward combination of the state of the art. Based on our findings, we discuss implications, provide recommendations for practitioners, and outline directions for future research.

DOI

https://doi.org/10.1145/3715776

Zuobin Wang

Zhejiang University

Zhiyuan Wan

Zhejiang University

China

Yujing Chen

Zhejiang University

Yun Zhang

Hangzhou City University

China

David Lo

Singapore Management University

Singapore

Difan Xie

Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security

Xiaohu Yang

Zhejiang University

China

Session Program

Mon 23 Jun
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

10:30 - 12:30	Clones [Research Papers / Journal First] at Aurora A. Chair(s): Julia Lawall (Inria)

10:30 20m Talk		An empirical study of business process models and model clones on GitHub [Journal First] Mahdi Saeedi Nikoo (Eindhoven University of Technology), Sangeeth Kochanthara (Netherlands' Space Observatory - ASTRON), Önder Babur (Eindhoven University of Technology), Mark van den Brand (Eindhoven University of Technology)
10:50 20m Talk		The Struggles of LLMs in Cross-lingual Code Clone Detection [Research Papers] Micheline Bénédicte MOUMOULA (University of Luxembourg), Abdoul Kader Kaboré (University of Luxembourg), Jacques Klein (University of Luxembourg), Tegawendé F. Bissyandé (University of Luxembourg)
11:10 20m Talk		Clone Detection for Smart Contracts: How Far Are We? [Research Papers] Zuobin Wang (Zhejiang University), Zhiyuan Wan (Zhejiang University), Yujing Chen (Zhejiang University), Yun Zhang (Hangzhou City University), David Lo (Singapore Management University), Difan Xie (Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security), Xiaohu Yang (Zhejiang University)
11:30 20m Talk		Measuring Model Alignment for Code Clone Detection Using Causal Interpretation [Journal First] Shamsa Abid (National University of Computer and Emerging Sciences), Xuemeng Cai (Singapore Management University), Lingxiao Jiang (Singapore Management University)
11:50 20m Talk		An Empirical Study of Code Clones from Commercial AI Code Generators [Research Papers] Weibin Wu (Sun Yat-sen University), Haoxuan Hu (Sun Yat-sen University, China), Zhaoji Fan (Sun Yat-sen University), Yitong Qiao (Sun Yat-sen University, China), Yizhan Huang (The Chinese University of Hong Kong), Yichen LI (The Chinese University of Hong Kong), Zibin Zheng (Sun Yat-sen University), Michael Lyu (Chinese University of Hong Kong)
12:10 20m Talk		VexIR2Vec: An Architecture-Neutral Embedding Framework for Binary Similarity [Journal First] S. VenkataKeerthy (IIT Hyderabad), Soumya Banerjee (IIT Hyderabad), Sayan Dey (IIT Hyderabad), Yashas Andaluri (IIT Hyderabad), Raghul PS (IIT Hyderabad), Subrahmanyam Kalyanasundaram (IIT Hyderabad), Fernando Magno Quintão Pereira (Federal University of Minas Gerais), Ramakrishna Upadrasta (IIT Hyderabad)

Information for Participants

Mon 23 Jun 2025 10:30 - 12:30 at Aurora A - Clones Chair(s): Julia Lawall

Info for room Aurora A:

Aurora A is the first room in the Aurora wing.

When facing the main Cosmos Hall, access to the Aurora wing is on the right, close to the side entrance of the hotel.