Executable but Not Reproducible? An Empirical Study of Code Clone Detection Tools
Code clone detection has been a central topic in software engineering research for more than two decades and underpins a wide range of studies on software maintenance, refactoring, and quality assurance. Despite hundreds of proposed clone detection tools spanning syntactic, semantic, cross-language, and learning-based techniques, little is known about whether these tools remain executable in modern software environments or whether their reported experimental results can be independently reproduced over time. In this study, we conduct a large-scale empirical investigation of the reproducibility of published code clone detection tools, focusing on tool executability and result-level reproducibility. Using a multi-stage screening process inspired by PRISMA-style study selection and complemented by backward snowballing, we narrowed an initial corpus of 7,380 clone-related publications to 854 primary studies published between 2001 and 2026. From these studies, we curated a representative subset of 84 distinct clone detection tools for hands-on reproduction, balanced across detection paradigms, clone types, programming language support, and publication eras. Reproduction followed a standardized, protocol-driven framework in which independent investigators attempted to obtain, configure, and execute each tool using only the documentation and artifacts provided by the original studies. Reproducibility outcomes were recorded systematically using a unified data collection schema and a graded Tool Executability Status taxonomy. Our results reveal clear and systematic differences across generations of clone detection techniques. Classical text-, token-, and AST-based tools are generally executable with minimal effort, whereas many semantic, cross-language, and learning-based tools exhibit substantial reproducibility challenges, including undocumented dependencies, environment drift, and result divergence.
Notably, successful execution does not necessarily imply reproduction of reported results. These findings provide empirical evidence that reproducibility challenges persist across the clone detection landscape and may intensify as techniques evolve toward more complex and dependency-heavy models, motivating stronger artifact and reproducibility practices.
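The abstract separates two outcomes that are easy to conflate: whether a tool executes, and whether it reproduces the reported results. Below is a minimal Python sketch of a per-tool record that keeps the two apart. It assumes a four-level executability grade and a three-valued result outcome. These levels, the field names, the helper functions, and the tool names (`tool-a`, and so on) are hypothetical illustrations. They are not the paper's actual Tool Executability Status taxonomy or data collection schema, which the abstract does not enumerate.

```python
from dataclasses import dataclass
from enum import IntEnum


class ExecStatus(IntEnum):
    # Hypothetical graded levels; higher means "closer to running as documented".
    NOT_OBTAINABLE = 0      # artifact or source code could not be retrieved
    BUILD_FAILED = 1        # obtained, but could not be built or configured
    RUNS_WITH_FIXES = 2     # executes only after undocumented workarounds
    RUNS_AS_DOCUMENTED = 3  # executes using only the provided documentation


class ResultStatus(IntEnum):
    # Hypothetical result-level outcome, recorded separately from executability.
    NOT_ATTEMPTED = 0  # tool never ran, so results could not be compared
    DIVERGENT = 1      # tool ran, but output differs from the reported results
    REPRODUCED = 2     # tool ran and matched the reported results


@dataclass(frozen=True)
class ReproductionRecord:
    tool: str
    paradigm: str  # e.g. "token", "AST", "semantic", "learning-based"
    exec_status: ExecStatus
    result_status: ResultStatus

    def __post_init__(self) -> None:
        # A tool that never executed cannot have a result-level outcome.
        if not self.executable() and self.result_status != ResultStatus.NOT_ATTEMPTED:
            raise ValueError(f"{self.tool}: result recorded for a tool that did not execute")

    def executable(self) -> bool:
        # Counts a tool as executable if it ran at all, with or without fixes.
        return self.exec_status >= ExecStatus.RUNS_WITH_FIXES


def summarize(records: list[ReproductionRecord]) -> dict[str, dict[str, int]]:
    """Count, per paradigm, how many tools ran and how many matched reported results."""
    summary: dict[str, dict[str, int]] = {}
    for r in records:
        row = summary.setdefault(r.paradigm, {"tools": 0, "executable": 0, "reproduced": 0})
        row["tools"] += 1
        row["executable"] += r.executable()
        row["reproduced"] += r.result_status == ResultStatus.REPRODUCED
    return summary


def executable_but_not_reproducible(records: list[ReproductionRecord]) -> list[str]:
    """Names of tools that ran but did not reproduce their reported results."""
    return [r.tool for r in records
            if r.executable() and r.result_status != ResultStatus.REPRODUCED]


if __name__ == "__main__":
    # Made-up illustrative records, not data from the study.
    records = [
        ReproductionRecord("tool-a", "token", ExecStatus.RUNS_AS_DOCUMENTED, ResultStatus.REPRODUCED),
        ReproductionRecord("tool-b", "learning-based", ExecStatus.RUNS_WITH_FIXES, ResultStatus.DIVERGENT),
        ReproductionRecord("tool-c", "learning-based", ExecStatus.BUILD_FAILED, ResultStatus.NOT_ATTEMPTED),
    ]
    print(summarize(records))
    # {'token': {'tools': 1, 'executable': 1, 'reproduced': 1}, 'learning-based': {'tools': 2, 'executable': 1, 'reproduced': 0}}
    print(executable_but_not_reproducible(records))
    # ['tool-b']
```

Executability and result status are kept in separate fields so that a summary can count tools that run yet diverge from their published numbers. This is the "executable but not reproducible" case. A single combined status would hide it.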
Tue 14 Apr (displayed time zone: Brasilia, Distrito Federal, Brazil)
11:00 - 12:30
11:00 | 5m Talk | Fixing Less by Preventing More: Semantic Checklists for Robust Code Translation | Journal Ahead Workshop (JAWs) | Penghao Jiang (University of New South Wales), Ruijun Feng (University of New South Wales), Xiao Cheng (Macquarie University), Jiaojiao Jiang (University of New South Wales), Yulei Sui (University of New South Wales)
11:05 | 5m Talk | Executable but Not Reproducible? An Empirical Study of Code Clone Detection Tools | Journal Ahead Workshop (JAWs) | Palash Ranjan Roy (University of Saskatchewan), Banani Roy (University of Saskatchewan), Kevin Schneider (University of Saskatchewan), Chanchal K. Roy (University of Saskatchewan)
11:10 | 5m Talk | ARMS: A Vision for Actor Reputation Metric Systems in the Open-Source Software Supply Chain | Journal Ahead Workshop (JAWs) | Kelechi G. Kalu (Purdue University), Sofia Okorafor (Purdue University), Betül Durak (Microsoft Research), Kim Laine (Microsoft Research, Redmond), Radames Cruz Moreno (Microsoft Research), Santiago Torres-Arias (Purdue University), James C. Davis (Purdue University) | Pre-print
11:15 | 5m Talk | Practitioners’ Experiences and Expectations about Software Sustainability in Industry: A Semi-Structured Interview Study | Journal Ahead Workshop (JAWs) | Jennifer Gross (Uppsala University), Aaliyah Chang (Queen's University), Mariam Guizani (Queen's University, Canada), Sofia Ouhbi (Uppsala University), Tobias Wrigstad (Uppsala University)
11:20 | 5m Talk | Compartmentalization-Aware Automated Program Repair | Journal Ahead Workshop (JAWs) | Jia Hu (The University of Manchester), Youcheng Sun (MBZUAI), Pierre Olivier (The University of Manchester)
11:25 | 5m Talk | Code Comprehension Beyond Best Practices: Exploring the Developer Cognitive Spectrum | Journal Ahead Workshop (JAWs) | Faith Culas (University of Auckland), Reid Holmes (University of British Columbia), Thomas Fritz (University of Zurich), Priyanka Dhopade (University of Auckland), Kelly Blincoe (University of Auckland)
11:30 | 5m Talk | Search-Based Evolutionary Data Pruning for Class-Level Code Summarization | Journal Ahead Workshop (JAWs) | Joseph Call (William & Mary), Daniele Bifolco (University of Sannio), Massimiliano Di Penta (University of Sannio, Italy), Antonio Mastropaolo (William and Mary, USA)
11:35 | 5m Talk | Are We Building on Unreliable Ground? A Case Study on Unresolvable Pairs in Learning-based Code Repair | Journal Ahead Workshop (JAWs) | Shihao Weng (Nanjing University), Yang Feng (Nanjing University), xinguohua (Tianjin University), Zhenlun Zhang (Nanjing University), Yining Yin (Nanjing University), Jia Liu (Nanjing University)
11:40 | 5m Talk | Operationalizing Research Software for Supply Chain Security | Journal Ahead Workshop (JAWs) | Kelechi G. Kalu (Purdue University), Soham Rattan (Purdue University), Taylor R. Schorlemmer (Purdue University), George K. Thiruvathukal (Loyola University Chicago), Jeff Carver (University of Alabama), James C. Davis (Purdue University) | Pre-print
11:45 | 5m Talk | Static and Semantic Program Slicing for Quantum Programs | Journal Ahead Workshop (JAWs) | Hakam W. Alomari (Miami University)
11:50 | 5m Talk | ScannerTrap: Benchmarking the Robustness of Web Vulnerability Scanners in Complex Application Environments | Journal Ahead Workshop (JAWs) | Weizhe Wang (Tianjin University), Yao Zhang (Tianjin University), Hao Liu (QAX Technology Group Inc), Shuai Hu (State Grid Xinjiang Electric Power Research Institute), Guangquan Xu (School of Cybersecurity, Tianjin University), Bin Wu (Tianjin University)
11:55 | 25m Panel | Panel Discussion: Repair, Evolution, Comprehension, and Security | Journal Ahead Workshop (JAWs)
12:20 | 10m Awards | Selection of award presentations | Journal Ahead Workshop (JAWs)