Fri 2 May 2025 14:15 - 14:30 at 212 - AI for Analysis 5 Chair(s): Tien N. Nguyen

Large language models (LLMs) have achieved decent results on automated program repair (APR). However, the next-token-prediction training objective of decoder-only LLMs (e.g., GPT-4) is misaligned with the masked-span-prediction objective of current infilling-style methods, which impedes LLMs from fully leveraging their pre-trained knowledge for program repair. In addition, while some LLMs can locate and repair bugs in certain functions using related artifacts (e.g., test cases), existing methods still depend on statement-level fault localization to provide a list of buggy hunks for repair. This restriction prevents LLMs from exploring potential patches beyond the given locations.

In this paper, we investigate a new approach to adapting LLMs to program repair. Our core insight is that an LLM's APR capability can be greatly improved simply by aligning its output with its training objective and allowing it to refine the whole program without first identifying faulty statements. Based on this insight, we design D4C, a straightforward prompting framework for APR. D4C correctly repairs 180 bugs in Defects4J, with each patch sampled only 10 times. This surpasses the SOTA APR methods with perfect fault localization by 10% and reduces the number of sampled patches by 90%. Our findings reveal that (1) objective alignment is crucial for fully exploiting an LLM's pre-trained capability, and (2) replacing the traditional localize-buggy-hunks-then-repair workflow with direct debugging is more effective for LLM-based APR methods. Thus, we believe this paper introduces a new mindset for harnessing LLMs in APR.
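As the abstract describes it, the key shift is to hand the model the whole buggy function together with related artifacts (e.g., failing test output) and ask for a complete rewritten function, rather than masking buggy statements for infilling. The sketch below illustrates that prompt shape; the function and variable names are illustrative assumptions, not from the paper or its artifact.

```python
# Hypothetical sketch of direct-debugging prompt construction in the
# style the abstract describes: whole function + test artifacts in,
# complete corrected function out (a natural next-token-prediction task),
# with no statement-level fault localization step.

def build_debug_prompt(buggy_function: str, test_output: str) -> str:
    """Assemble a direct-debugging prompt from the full function and artifacts."""
    return (
        "The following function fails its tests.\n\n"
        f"```java\n{buggy_function}\n```\n\n"
        f"Failing test output:\n{test_output}\n\n"
        "Rewrite the complete function so that all tests pass:"
    )

if __name__ == "__main__":
    prompt = build_debug_prompt(
        "int add(int a, int b) { return a - b; }",
        "expected 3 but was -1 for add(1, 2)",
    )
    print(prompt)
```

In an actual pipeline, this prompt would be sent to the model several times (the paper samples each patch only 10 times) and the returned functions validated against the test suite.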

Fri 2 May

Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30
AI for Analysis 5 (Research Track / New Ideas and Emerging Results (NIER)) at 212
Chair(s): Tien N. Nguyen University of Texas at Dallas
14:00
15m
Talk
3DGen: AI-Assisted Generation of Provably Correct Binary Format Parsers
Research Track
Sarah Fakhoury Microsoft Research, Markus Kuppe Microsoft Research, Shuvendu K. Lahiri Microsoft Research, Tahina Ramananandro Microsoft Research, Nikhil Swamy Microsoft Research
Pre-print
14:15
15m
Talk
Aligning the Objective of LLM-based Program Repair
Research Track
Junjielong Xu The Chinese University of Hong Kong, Shenzhen, Ying Fu Chongqing University, Shin Hwei Tan Concordia University, Pinjia He Chinese University of Hong Kong, Shenzhen
Pre-print
14:30
15m
Talk
Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models
Research Track
Aidan Z.H. Yang Carnegie Mellon University, Sophia Kolak Carnegie Mellon University, Vincent J. Hellendoorn Carnegie Mellon University, Ruben Martins Carnegie Mellon University, Claire Le Goues Carnegie Mellon University
Artifact-Available
14:45
15m
Talk
The Fact Selection Problem in LLM-Based Program Repair
Research Track
Nikhil Parasaram Uber Amsterdam, Huijie Yan University College London, Boyu Yang University College London, Zineb Flahy University College London, Abriele Qudsi University College London, Damian Ziaber University College London, Earl T. Barr University College London, Sergey Mechtaev Peking University
15:00
15m
Talk
Towards Understanding the Characteristics of Code Generation Errors Made by Large Language Models
Research Track
Zhijie Wang University of Alberta, Zijie Zhou University of Illinois Urbana-Champaign, Da Song University of Alberta, Yuheng Huang University of Alberta, Canada, Shengmai Chen Purdue University, Lei Ma The University of Tokyo & University of Alberta, Tianyi Zhang Purdue University
Pre-print
15:15
15m
Talk
Beyond Syntax: How Do LLMs Understand Code?
New Ideas and Emerging Results (NIER)
Marc North Durham University, Amir Atapour-Abarghouei Durham University, Nelly Bencomo Durham University