Automated Program Repair (APR) is a powerful technique for mitigating the impact of bugs during software development. The recent remarkable success of Large Language Models (LLMs) has set a new state of the art in APR. However, their reliance on massive training corpora raises the concern of whether these impressive capabilities genuinely generalize to unseen tasks or largely reflect memorization of pretraining data. To investigate this question, this paper introduces MemInducer, a memorization-inducing prompting strategy for probing the extent of memorization in LLM-based APR. Specifically, MemInducer prompts LLMs to recall responses from their training corpus; memorization is then assessed by measuring the similarity between the LLM-generated responses and the corresponding ground truth. Experimental results reveal that memorization is indeed present in existing APR benchmarks: over 78% of the bugs the LLMs fix receive patches that are entirely identical to the ground truth.
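The abstract does not spell out how similarity to the ground truth is computed; as a rough illustration only (the function names and the whitespace normalization are assumptions, not the paper's actual implementation), a verbatim-match check along these lines could be written in Python:

```python
import difflib

def normalize(code: str) -> str:
    # Collapse all whitespace so formatting differences
    # do not mask otherwise verbatim recall.
    return " ".join(code.split())

def memorization_score(generated: str, ground_truth: str) -> float:
    # Similarity in [0, 1]; 1.0 means the generated patch is,
    # after normalization, identical to the ground-truth patch.
    return difflib.SequenceMatcher(
        None, normalize(generated), normalize(ground_truth)
    ).ratio()

def is_verbatim_match(generated: str, ground_truth: str) -> bool:
    # A score of exactly 1.0 flags an exact (memorized-looking) match.
    return memorization_score(generated, ground_truth) == 1.0
```

Under a measure of this kind, the reported 78% figure would correspond to generated patches whose text matches the ground-truth patch exactly.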
Tue 29 Apr (times in Eastern Time, US & Canada)
09:00 - 10:30 | APR Session 1 (APR) at 210. Chair(s): Tegawendé F. Bissyandé (University of Luxembourg), Chao Peng (ByteDance)
09:00 (50m) | Keynote | Baishakhi Ray's Keynote (APR). Baishakhi Ray (Columbia University)
09:50 (20m) | Talk | Can GPT-O1 Kill All Bugs? An Evaluation of GPT-Family LLMs on QuixBugs (APR). Haichuan Hu (Alibaba Cloud), Tongke Zhang (Nanjing University), Guolin Xu (Chongqing University), Congqing He (Universiti Sains Malaysia), Quanjun Zhang (Nanjing University)
10:10 (20m) | Talk | Memorization in LLM-Based Program Repair (APR). Jiaolong Kong (Singapore Management University), Mingfei Cheng (Singapore Management University), Xiaofei Xie (Singapore Management University)