With the involvement of multiple programming languages in modern software development, cross-lingual code clone detection has gained traction within the software engineering community. Numerous studies have explored this topic, proposing various promising approaches. Inspired by the significant advances in machine learning in recent years, particularly Large Language Models (LLMs), which have demonstrated their ability to tackle various tasks, this paper revisits cross-lingual code clone detection.
We evaluate the performance of five LLMs and eight prompts for the identification of cross-lingual code clones. Additionally, we compare these results against two baseline methods. Finally, we evaluate a pre-trained embedding model to assess the effectiveness of the generated representations for classifying clone and non-clone pairs. The studies involving LLMs and embedding models are evaluated using two widely used cross-lingual datasets, XLCoST and CodeNet.
Our results show that LLMs can achieve high F1 scores, up to 0.99, for straightforward programming examples. However, they not only perform less well on programs associated with complex programming challenges but also do not necessarily understand the meaning of “code clone” in a cross-lingual setting. We show that embedding models used to represent code fragments from different programming languages in the same representation space enable the training of a basic classifier that outperforms all LLMs by ~1 and ~20 percentage points on the XLCoST and CodeNet datasets, respectively. This finding suggests that, despite the apparent capabilities of LLMs, embeddings provided by embedding models offer suitable representations to achieve state-of-the-art performance in cross-lingual code clone detection.
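To make the embedding-plus-classifier idea concrete, here is a minimal sketch in plain Python. It is not the classifier or feature set used in the paper, which the abstract does not specify. It assumes that each code fragment has already been mapped to a vector by some embedding model (the vectors below are hand-written toy values, not real model output), uses the cosine similarity of a pair as the only feature, and fits a one-dimensional logistic regression by gradient descent. All function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def train(pairs, labels, epochs=500, lr=0.5):
    """Fit weight w and bias b of a logistic regression on the cosine feature."""
    feats = [cosine(u, v) for u, v in pairs]
    w, b = 0.0, 0.0
    n = len(feats)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(feats, labels):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted clone probability
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict(w, b, u, v):
    """Return 1 (clone) if the decision score is positive, else 0 (non-clone)."""
    return 1 if w * cosine(u, v) + b > 0 else 0


def f1_score(y_true, y_pred):
    """F1 for the positive (clone) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy stand-ins for embeddings of (fragment in language A, fragment in language B).
# Assumption: a real setup would obtain these vectors from an embedding model.
PAIRS = [
    ([1.0, 0.0, 0.0], [0.9, 0.1, 0.0]),    # clone
    ([0.0, 1.0, 0.0], [0.1, 0.95, 0.05]),  # clone
    ([0.0, 0.0, 1.0], [0.0, 0.1, 1.0]),    # clone
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),    # non-clone
    ([0.0, 0.0, 1.0], [1.0, 0.1, 0.0]),    # non-clone
    ([0.0, 1.0, 0.0], [0.2, 0.0, 1.0]),    # non-clone
]
LABELS = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    w, b = train(PAIRS, LABELS)
    preds = [predict(w, b, u, v) for u, v in PAIRS]
    print("predictions:", preds)
    print("F1 on the toy pairs:", f1_score(LABELS, preds))
```

The toy data is trivially separable, so the score it prints says nothing about real benchmarks such as XLCoST or CodeNet. The sketch only shows how a shared representation space reduces clone detection to a small supervised classification problem.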
Mon 23 Jun
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
10:30 - 12:30
10:30 | 20m | Talk | An empirical study of business process models and model clones on GitHub | Journal First | Mahdi Saeedi Nikoo, Eindhoven University of Technology; Sangeeth Kochanthara, Eindhoven University of Technology (TU/e); Önder Babur, Eindhoven University of Technology; Mark van den Brand, Eindhoven University of Technology
10:50 | 20m | Talk | The Struggles of LLMs in Cross-lingual Code Clone Detection | Research Papers | Micheline Bénédicte MOUMOULA, University of Luxembourg; Abdoul Kader Kaboré, University of Luxembourg; Jacques Klein, University of Luxembourg; Tegawendé F. Bissyandé, University of Luxembourg
11:10 | 20m | Talk | Clone Detection for Smart Contracts: How Far Are We? | Research Papers | Zuobin Wang, Zhejiang University; Zhiyuan Wan, Zhejiang University; Yujing Chen, Zhejiang University; Yun Zhang, Hangzhou City University; David Lo, Singapore Management University; Difan Xie, Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security; Xiaohu Yang, Zhejiang University
11:30 | 20m | Talk | Measuring Model Alignment for Code Clone Detection Using Causal Interpretation | Journal First | Shamsa Abid, National University of Computer and Emerging Sciences; Xuemeng Cai, Singapore Management University; Lingxiao Jiang, Singapore Management University
11:50 | 20m | Talk | An Empirical Study of Code Clones from Commercial AI Code Generators | Research Papers | Weibin Wu, Sun Yat-sen University; Haoxuan Hu, Sun Yat-sen University, China; Zhaoji Fan, Sun Yat-sen University; Yitong Qiao, Sun Yat-sen University, China; Yizhan Huang, The Chinese University of Hong Kong; Yichen LI, The Chinese University of Hong Kong; Zibin Zheng, Sun Yat-sen University; Michael Lyu, Chinese University of Hong Kong
12:10 | 20m | Talk | VexIR2Vec: An Architecture-Neutral Embedding Framework for Binary Similarity | Journal First | S. VenkataKeerthy, IIT Hyderabad; Soumya Banerjee, IIT Hyderabad; Sayan Dey, IIT Hyderabad; Yashas Andaluri, IIT Hyderabad; Raghul PS, IIT Hyderabad; Subrahmanyam Kalyanasundaram, IIT Hyderabad; Fernando Magno Quintão Pereira, Federal University of Minas Gerais; Ramakrishna Upadrasta, IIT Hyderabad
Aurora A is the first room in the Aurora wing.
When facing the main Cosmos Hall, access to the Aurora wing is on the right, close to the side entrance of the hotel.