Demystifying Memorization in LLM-based Program Repair via a General Hypothesis Testing Framework
Large Language Models (LLMs) have achieved remarkable success in various applications, particularly in code-related tasks such as code generation and program repair, setting new performance benchmarks. However, the extensive use of large training corpora raises concerns about whether these achievements stem from genuine understanding or mere memorization of the training data, a question often overlooked in current research. This paper studies the memorization issue in LLM-based program repair by investigating whether the correct patches generated by LLMs are the result of memorization. The key challenge lies in the absence of ground truth for confirming memorization, which has led to a variety of ad hoc detection methods. To address this challenge, we first propose a framework that formalizes memorization detection as a general hypothesis testing problem, in which existing approaches can be unified by defining a low-probability event under the null hypothesis that the data is not memorized. The occurrence of such an event leads to rejection of the null hypothesis, indicating potential memorization.
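As a rough illustration of this hypothesis-testing framing (a minimal sketch, not the authors' implementation), a detector can be modeled as an event predicate paired with an assumed bound on that event's probability under the null hypothesis; the names `MemorizationTest` and `alpha` below are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class MemorizationTest:
    """A memorization detector in the hypothesis-testing framing.

    event: predicate that is True when the chosen low-probability event
           occurs for a (model output, ground truth) pair.
    alpha: assumed upper bound on P(event) under the null hypothesis
           H0: "the sample was not memorized".
    """
    event: Callable[[Any, Any], bool]
    alpha: float

    def reject_null(self, model_output: Any, ground_truth: Any) -> bool:
        """Return True when H0 is rejected, i.e. memorization is suspected.

        The test is only meaningful if the event really is low-probability
        under H0 (i.e., its probability does not exceed alpha).
        """
        return self.event(model_output, ground_truth)
```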
Based on this framework, we design two specific methods (i.e., low-probability events) to detect potential memorization: 1) basic ground-truth matching, and 2) reassessment after substantial code mutation. We investigate the memorization issue in LLM-based program repair on two datasets: Defects4J, a widely used benchmark that is likely included in the training data, and GitBug-Java, a newer dataset that is unlikely to be part of the training data. Our findings reveal that a significant portion of correct patches exactly match the ground truths in Defects4J (e.g., 78.83% on GPT-3.5 and 87.42% on CodeLlama-7b). Moreover, even after substantial modifications to the buggy code, under which the original repairs should no longer be generated, a considerable percentage of bugs (e.g., 81.82% on GPT-3.5 and 88.24% on CodeLlama-7b) are still fixed exactly as in the original bug fixes, indicating a high likelihood of memorization. Furthermore, we evaluate existing memorization detection methods and demonstrate their ineffectiveness in this context (e.g., most AUROC values are below 0.5); theoretical analysis under our hypothesis testing framework shows that the events they define may not satisfy the low-probability requirement. The study highlights the critical need for more robust and rigorous evaluations in LLM-based software engineering research, ensuring a clear distinction between true problem-solving capabilities and mere memorization.
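The two detection events described above could be instantiated roughly as follows. This is a hedged sketch rather than the paper's code: `normalize`, `mutate_buggy_code`, and `generate_patch` are hypothetical stand-ins for a code normalizer, a semantics-altering mutator, and an LLM repair call.

```python
import re
from typing import Callable

def normalize(code: str) -> str:
    """Collapse whitespace so trivially reformatted patches compare equal."""
    return re.sub(r"\s+", " ", code).strip()

def exact_match_event(patch: str, ground_truth: str) -> bool:
    """Event 1: the generated patch exactly matches the developer fix."""
    return normalize(patch) == normalize(ground_truth)

def mutation_reassessment_event(
    buggy_code: str,
    ground_truth: str,
    generate_patch: Callable[[str], str],
    mutate_buggy_code: Callable[[str], str],
) -> bool:
    """Event 2: after substantially mutating the buggy code, so that the
    original fix should no longer be the natural repair, the model still
    reproduces the original developer fix verbatim."""
    mutated = mutate_buggy_code(buggy_code)
    patch_for_mutant = generate_patch(mutated)
    return normalize(patch_for_mutant) == normalize(ground_truth)
```

In the framing sketched earlier, each of these predicates would serve as the `event` of a `MemorizationTest`, with `alpha` justified by how unlikely an exact reproduction of the developer patch is for a model that never saw the fix.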
Mon 23 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
16:00 - 18:00 | Repairs (Research Papers / Journal First) at Andromeda. Chair(s): Michael Pradel (University of Stuttgart)
16:00 (20m) Talk: HornBro: Homotopy-like Method for Automated Quantum Program Repair (Research Papers). Siwei Tan, Liqiang Lu, Debin Xiang, Tianyao Chu, Congliang Lang, Jintao Chen, Xing Hu, Jianwei Yin (Zhejiang University)
16:20 (20m) Talk: RePurr: Automated Repair of Block-Based Learners' Programs (Research Papers)
16:40 (20m) Talk: Demystifying Memorization in LLM-based Program Repair via a General Hypothesis Testing Framework (Research Papers). Jiaolong Kong (Singapore Management University), Xiaofei Xie (Singapore Management University), Shangqing Liu (Nanyang Technological University)
17:00 (20m) Talk: IRepair: An Intent-Aware Approach to Repair Data-Driven Errors in Large Language Models (Research Papers). Sayem Mohammad Imtiaz (Iowa State University), Astha Singh (Dept. of Computer Science, Iowa State University), Fraol Batole (Tulane University), Hridesh Rajan (Tulane University)
17:20 (20m) Talk: Repairs and Breaks Prediction for Deep Neural Networks (Journal First). Yuta Ishimoto (Kyushu University), Masanari Kondo (Kyushu University), Lei Ma (The University of Tokyo & University of Alberta), Naoyasu Ubayashi (Waseda University), Yasutaka Kamei (Kyushu University)
17:40 (20m) Talk: Element-Based Automated DNN Repair with Fine-Tuned Masked Language Model (Research Papers). Xu Wang (Beihang University; Zhongguancun Laboratory; Ministry of Education), Mingming Zhang (Beihang University), Xiangxin Meng (Beihang University), Jian Zhang (Nanyang Technological University), Yang Liu (Nanyang Technological University), Chunming Hu (Beihang University)
Andromeda is located close to the restaurant and the bar, at the end of the corridor on the side of the bar.
From the registration desk, go towards the restaurant, turn left towards the bar, and walk to the end of the corridor.