Investigating the Efficacy of Large Language Models for Code Clone Detection (ICPC ERA Paper)
Large Language Models (LLMs) have demonstrated remarkable success in various natural language processing and software engineering tasks, such as code generation. LLMs are mainly used in a prompt-based zero-/few-shot paradigm, in which the prompt guides the model in accomplishing the task.
\textbf{Goal:} GPT-based models are among the most widely studied LLMs for tasks such as code comment generation and test generation. These are `generative' tasks. However, there is limited research on using LLMs in the prompt-based paradigm for `non-generative' tasks such as classification. In this preliminary exploratory study, we investigated the applicability of LLMs to Code Clone Detection (CCD), a non-generative task.
\textbf{Method:} Using mono-lingual and cross-lingual CCD datasets that we built from CodeNet, we first evaluated two different prompts with ChatGPT for detecting Type-4 (semantic) code clones in Java-Java and Java-Ruby pairs in the zero-shot setting. We then analyzed the strengths and weaknesses of ChatGPT in CCD.
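The zero-shot setup described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the prompt wording, the function names `build_zero_shot_prompt` and `parse_clone_answer`, and the yes/no answer format are all hypothetical, because the abstract does not reproduce the two prompts that were evaluated. The call to ChatGPT itself is omitted; the sketch only shows how a Java-Ruby pair could be turned into a prompt containing no labeled examples, and how a free-text reply could be mapped back to a clone label.

```python
import re
from typing import Optional

# Hypothetical prompt wording: the abstract does not reproduce the two prompts
# that were actually evaluated, so this template is only an illustration.
PROMPT_TEMPLATE = (
    "Do the following two code snippets solve the same problem, i.e., are they "
    "Type-4 (semantic) clones? Answer only 'Yes' or 'No'.\n\n"
    "Code 1 ({lang_a}):\n{code_a}\n\n"
    "Code 2 ({lang_b}):\n{code_b}\n"
)


def build_zero_shot_prompt(code_a: str, lang_a: str, code_b: str, lang_b: str) -> str:
    """Fill the template for one pair; zero-shot means no labeled example pairs are included."""
    # str.format only interprets braces in the template, so braces inside the
    # substituted source code (e.g. Java method bodies) are left untouched.
    return PROMPT_TEMPLATE.format(lang_a=lang_a, code_a=code_a, lang_b=lang_b, code_b=code_b)


def parse_clone_answer(response: str) -> Optional[bool]:
    """Map a free-text reply to True (clone) / False (not a clone), or None if neither word appears."""
    # Assumption: the first standalone "yes" or "no" in the reply is the verdict.
    match = re.search(r"\b(yes|no)\b", response, flags=re.IGNORECASE)
    if match is None:
        return None
    return match.group(1).lower() == "yes"


# Usage: a cross-lingual (Java-Ruby) pair; the model call itself is omitted.
java_src = "int add(int a, int b) { return a + b; }"
ruby_src = "def add(a, b)\n  a + b\nend"
prompt = build_zero_shot_prompt(java_src, "Java", ruby_src, "Ruby")
print(parse_clone_answer("Yes, both return the sum of two numbers."))  # True
```

Returning `None` for replies that contain neither "yes" nor "no" keeps unparseable answers visible rather than silently counting them as non-clones; how such replies were actually handled is not stated in the abstract.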
\textbf{Results:} ChatGPT surpasses the baselines in cross-language CCD and achieves performance comparable to fully fine-tuned models in mono-lingual CCD. Moreover, both the prompt and the difficulty level of the problems affect ChatGPT's performance. Finally, we provide insights and future directions based on our initial analysis\footnote{Our code and data are open-sourced at \url{https://anonymous.4open.science/r/largeLanguageModels-4A1F}}.
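To examine findings such as the effect of problem difficulty, clone predictions can be scored separately for each difficulty group. The sketch below assumes binary precision, recall, and F1 with "clone" as the positive class, a common choice for CCD; the abstract does not name the metric, define the difficulty levels, or say how unparseable replies were treated, so the record layout, the difficulty labels, the helper names, and the rule that a missing prediction counts as "not clone" are all hypothetical. The toy records are made-up inputs that only demonstrate the call, not data from the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical record layout: (gold label, parsed prediction or None, difficulty label).
Record = Tuple[bool, Optional[bool], str]


def precision_recall_f1(pairs: Iterable[Tuple[bool, Optional[bool]]]) -> Tuple[float, float, float]:
    """Binary precision/recall/F1 with 'clone' (True) as the positive class."""
    tp = fp = fn = 0
    for gold, pred in pairs:
        predicted_clone = bool(pred)  # assumption: an unparseable reply (None) counts as "not clone"
        if predicted_clone and gold:
            tp += 1
        elif predicted_clone:
            fp += 1
        elif gold:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def f1_by_difficulty(records: Iterable[Record]) -> Dict[str, float]:
    """Group records by (hypothetical) difficulty label and report F1 for each group."""
    groups: Dict[str, List[Tuple[bool, Optional[bool]]]] = defaultdict(list)
    for gold, pred, difficulty in records:
        groups[difficulty].append((gold, pred))
    return {level: precision_recall_f1(pairs)[2] for level, pairs in groups.items()}


# Toy, made-up records purely to show the call; these are not results from the study.
toy_records: List[Record] = [
    (True, True, "easy"),
    (False, False, "easy"),
    (True, False, "hard"),
    (False, True, "hard"),
]
print(f1_by_difficulty(toy_records))  # {'easy': 1.0, 'hard': 0.0}
```

Reporting F1 per group rather than one pooled score is what makes a difficulty effect visible; a pooled score over these toy records would hide that all errors fall in the "hard" group.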
Mon 15 Apr (displayed time zone: Lisbon)
14:00 - 15:30 | Code + Documentation Generation | Research Track / Early Research Achievements (ERA) / Replications and Negative Results (RENE) | at Sophia de Mello Breyner Andresen | Chair(s): Massimiliano Di Penta (University of Sannio, Italy)
14:00 | 10m talk | MESIA: Understanding and Leveraging Supplementary Nature of Method-level Comments for Automatic Comment Generation | ICPC Full Paper, Research Track | Xinglu Pan (Peking University), Chenxiao Liu (Peking University), Yanzhen Zou (Peking University), Tao Xie (Peking University), Bing Xie (Peking University) | Pre-print
14:10 | 10m talk | Compositional API Recommendation for Library-Oriented Code Generation | ICPC Full Paper, Research Track | Zexiong Ma (Peking University), Shengnan An (Xi’an Jiaotong University), Bing Xie (Peking University), Zeqi Lin (Microsoft Research, China) | Pre-print
14:20 | 10m talk | On the Generalizability of Deep Learning-based Code Completion Across Programming Language Versions | ICPC Full Paper, Research Track | Matteo Ciniselli (Università della Svizzera Italiana), Alberto Martin-Lopez (Software Institute - USI, Lugano), Gabriele Bavota (Software Institute @ Università della Svizzera Italiana)
14:30 | 10m talk | ESGen: Commit Message Generation Based on Edit Sequence of Code Change | ICPC Full Paper, Virtual Talk, Research Track | Xiangping Chen (Sun Yat-sen University), Yangzi Li (Sun Yat-sen University), Zhicao Tang (Sun Yat-sen University), Yuan Huang (School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China), Haojie Zhou (School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China), Mingdong Tang (Guangdong University of Foreign Studies), Zibin Zheng (Sun Yat-sen University)
14:40 | 10m talk | Improving AST-Level Code Completion with Graph Retrieval and Multi-Field Attention | ICPC Full Paper, Virtual Talk, Research Track | Yu Xia (Central South University), Tian Liang (Central South University), Wei-Huan Min (Central South University), Li Kuang (School of Computer Science and Engineering, Central South University)
14:50 | 10m talk | Exploring and Improving Code Completion for Test Code | ICPC Full Paper, Research Track | Tingwei Zhu (Nanjing University), Zhongxin Liu (Zhejiang University), Tongtong Xu (Huawei), Ze Tang (Software Institute, Nanjing University), Tian Zhang (Nanjing University), Minxue Pan (Nanjing University), Xin Xia (Huawei Technologies)
15:00 | 10m talk | Understanding the Impact of Branch Edit Features for the Automatic Prediction of Merge Conflict Resolutions | ICPC RENE Paper, Replications and Negative Results (RENE) | Waad riadh aldndni (Virginia Tech), Francisco Servant (ITIS Software, University of Malaga), Na Meng (Virginia Tech)
15:10 | 4m talk | Investigating the Efficacy of Large Language Models for Code Clone Detection | ICPC ERA Paper, Early Research Achievements (ERA) | Mohamad Khajezade (University of British Columbia Okanagan), Jie JW Wu (University of British Columbia, UBC), Fatemeh Hendijani Fard (University of British Columbia), Gema Rodríguez-Pérez (University of British Columbia, UBC), Mohamed S Shehata (University of British Columbia)
15:14 | 16m talk | Code + Documentation Generation: Panel with Speakers | ICPC Discussion