Automated Program Repair (APR) is a powerful technique for mitigating the impact of bugs during software development. The recent remarkable success of Large Language Models (LLMs) has set a new state of the art in APR. However, their reliance on massive training corpora raises the concern of whether these impressive capabilities genuinely generalize to unseen tasks or largely reflect memorization of pretraining data. To investigate this question, this paper introduces MemInducer, a memorization-inducing prompting strategy for probing the extent of memorization in LLM-based APR. Specifically, MemInducer prompts LLMs to recall responses from their training corpus; memorization is then assessed by measuring the similarity between the LLM-generated responses and the corresponding ground truth. Experimental results reveal that memorization is indeed present in existing APR benchmarks: over 78% of the bugs the LLMs fix receive patches that are entirely identical to the ground truth.
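The abstract does not spell out how similarity to the ground truth is computed; as a rough illustration only (the function names and the whitespace normalization are assumptions, not the paper's actual implementation), a verbatim-match check along these lines could be written in Python:

```python
import difflib

def normalize(code: str) -> str:
    # Collapse all whitespace so formatting differences
    # do not mask otherwise verbatim recall.
    return " ".join(code.split())

def memorization_score(generated: str, ground_truth: str) -> float:
    # Similarity in [0, 1]; 1.0 means the generated patch is,
    # after normalization, identical to the ground-truth patch.
    return difflib.SequenceMatcher(
        None, normalize(generated), normalize(ground_truth)
    ).ratio()

def is_verbatim_match(generated: str, ground_truth: str) -> bool:
    # A score of exactly 1.0 flags an exact (memorized-looking) match.
    return memorization_score(generated, ground_truth) == 1.0
```

Under a measure of this kind, the reported 78% figure would correspond to generated patches whose text matches the ground-truth patch exactly.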
Tue 29 Apr (times in Eastern Time, US & Canada)
09:00 - 10:30 | APR Session 1 (APR) at 210. Chair(s): Tegawendé F. Bissyandé (University of Luxembourg), Chao Peng (ByteDance)
09:00 (50m) | Keynote | Baishakhi Ray's Keynote (APR). Baishakhi Ray (Columbia University)
09:50 (20m) | Talk | Can GPT-O1 Kill All Bugs? An Evaluation of GPT-Family LLMs on QuixBugs (APR). Haichuan Hu (Alibaba Cloud), Tongke Zhang (Nanjing University), Guolin Xu (Chongqing University), Congqing He (Universiti Sains Malaysia), Quanjun Zhang (Nanjing University)
10:10 (20m) | Talk | Memorization in LLM-Based Program Repair (APR). Jiaolong Kong (Singapore Management University), Mingfei Cheng (Singapore Management University), Xiaofei Xie (Singapore Management University)