Large language models (LLMs) have achieved decent results on automated program repair (APR). However, the next token prediction training objective of decoder-only LLMs (e.g., GPT-4) is misaligned with the masked span prediction objective of current infilling-style methods, which impedes LLMs from fully leveraging pre-trained knowledge for program repair. In addition, while some LLMs can locate and repair bugs in certain functions using the related artifacts (e.g., test cases), existing methods still depend on statement-level fault localization methods to provide a list of buggy hunks for repair. This restriction hinders LLMs from exploring potential patches beyond the given locations.
In this paper, we investigate a new approach to adapt LLMs to program repair. Our core insight is that LLM’s APR capability can be greatly improved by simply aligning the output to their training objective and allowing them to refine the whole program without first identifying faulty statements. Based on this insight, we designed D4C, a straightforward prompting framework for APR. D4C can repair 180 bugs correctly in Defects4J, with each patch being sampled only 10 times. This surpasses the SOTA APR methods with perfect fault localization by 10% and reduces the patch sampling number by 90%. Our findings reveal that (1) objective alignment is crucial for fully exploiting LLM’s pre-trained capability, and (2) replacing the traditional localize-buggy-hunks-then-repair workflow with direct debugging is more effective for LLM-based APR methods. Thus, we believe this paper introduces a new mindset for harnessing LLMs in APR.
Fri 2 MayDisplayed time zone: Eastern Time (US & Canada) change
14:00 - 15:30 | AI for Analysis 5Research Track / New Ideas and Emerging Results (NIER) at 212 Chair(s): Tien N. Nguyen University of Texas at Dallas | ||
14:00 15mTalk | 3DGen: AI-Assisted Generation of Provably Correct Binary Format Parsers Research Track Sarah Fakhoury Microsoft Research, Markus Kuppe Microsoft Research, Shuvendu K. Lahiri Microsoft Research, Tahina Ramananandro Microsoft Research, Nikhil Swamy Microsoft Research Pre-print | ||
14:15 15mTalk | Aligning the Objective of LLM-based Program Repair Research Track Junjielong Xu The Chinese University of Hong Kong, Shenzhen, Ying Fu Chongqing University, Shin Hwei Tan Concordia University, Pinjia He Chinese University of Hong Kong, Shenzhen Pre-print | ||
14:30 15mTalk | Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models Research Track Aidan Z.H. Yang Carnegie Mellon University, Sophia Kolak Carnegie Mellon University, Vincent J. Hellendoorn Carnegie Mellon University, Ruben Martins Carnegie Mellon University, Claire Le Goues Carnegie Mellon University | ||
14:45 15mTalk | The Fact Selection Problem in LLM-Based Program Repair Research Track Nikhil Parasaram Uber Amsterdam, Huijie Yan University College London, Boyu Yang University College London, Zineb Flahy University College London, Abriele Qudsi University College London, Damian Ziaber University College London, Earl T. Barr University College London, Sergey Mechtaev Peking University | ||
15:00 15mTalk | Towards Understanding the Characteristics of Code Generation Errors Made by Large Language Models Research Track Zhijie Wang University of Alberta, Zijie Zhou University of Illinois Urbana-Champaign, Da Song University of Alberta, Yuheng Huang University of Alberta, Canada, Shengmai Chen Purdue University, Lei Ma The University of Tokyo & University of Alberta, Tianyi Zhang Purdue University Pre-print | ||
15:15 15mTalk | Beyond Syntax: How Do LLMs Understand Code? New Ideas and Emerging Results (NIER) Marc North Durham University, Amir Atapour-Abarghouei Durham University, Nelly Bencomo Durham University |