
This program is tentative and subject to change.

Fri 2 May 2025 14:30 - 14:45 at 212 - AI for Analysis 5

Language models have improved by orders of magnitude with the recent emergence of Transformer-based Large Language Models (LLMs). LLMs have demonstrated their ability to generate “natural” code that is highly similar to code written by professional developers. One intermediate value an LLM can emit is entropy, which measures the naturalness of a token of code. We hypothesize that entropy can improve the performance of Automated Program Repair (APR) tasks. While much progress has been made in APR, fault localization techniques suffer from a lack of diversity in ranking scores, patch generation tools tend to be inefficient because all tests must run before a patch can be judged likely correct, and patch ranking often suffers from the test-suite overfitting problem. However, using an LLM directly for APR raises concerns about training-data leakage. In this work, we introduce a novel way of combining the entropy of LLMs with prior APR tools to improve all stages of APR. By using only the prefix and suffix context of a line or block of code to measure naturalness, we can use LLMs to localize faults and rank patches while eliminating the dependency on test suites. We show that entropy is highly complementary to prior fault localization tools: our proposed method achieves a 108% top-1 score improvement over SBFL. When using entropy for patch ranking and classification, our method ranks correct patches more effectively than state-of-the-art machine learning tools, with a 49% improvement in top-1. Our work suggests that LLMs can effectively complement prior APR tasks while minimizing both the test-suite overfitting problem and the LLM data-leakage problem.
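The core idea in the abstract — scoring each line of code by how "surprised" an LLM is by its tokens, then ranking the most unnatural lines as the most suspicious — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the per-token probabilities below are hand-picked stand-ins for what a real LLM conditioned on prefix/suffix context would assign, and the function names are hypothetical.

```python
import math

def line_entropy(token_probs):
    """Mean surprisal (negative log2-probability) over a line's tokens.

    token_probs: probabilities a (hypothetical) LLM assigns to each token
    of the line, conditioned on its surrounding prefix/suffix context.
    Higher values mean the line looks less natural to the model.
    """
    return sum(-math.log2(p) for p in token_probs) / len(token_probs)

def rank_suspicious_lines(lines_with_probs):
    """Rank lines most-unnatural-first, as a fault-localization signal."""
    scored = [(line, line_entropy(probs)) for line, probs in lines_with_probs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Illustrative token probabilities (NOT from a real model):
program = [
    ("int sum = 0;",            [0.90, 0.80, 0.85, 0.90]),
    ("for (i = 0; i < n; i++)", [0.70, 0.80, 0.90, 0.75]),
    ("sum -= a[i];",            [0.40, 0.05, 0.30, 0.20]),  # buggy '-=' is unnatural
]
ranking = rank_suspicious_lines(program)
print(ranking[0][0])  # → sum -= a[i];  (the buggy line surfaces at top-1)
```

Note that no test suite is consulted: the ranking relies only on the model's probability estimates, which is what lets this signal sidestep test-suite overfitting.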


Fri 2 May

Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30
14:00
15m
Talk
3DGen: AI-Assisted Generation of Provably Correct Binary Format Parsers
Research Track
Sarah Fakhoury Microsoft Research, Markus Kuppe Microsoft Research, Shuvendu K. Lahiri Microsoft Research, Tahina Ramananandro Microsoft Research, Nikhil Swamy Microsoft Research
14:15
15m
Talk
Aligning the Objective of LLM-based Program Repair
Research Track
Junjielong Xu The Chinese University of Hong Kong, Shenzhen, Ying Fu Chongqing University, Shin Hwei Tan Concordia University, Pinjia He Chinese University of Hong Kong, Shenzhen
Pre-print
14:30
15m
Talk
Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models
Research Track
Aidan Z.H. Yang Carnegie Mellon University, Sophia Kolak Carnegie Mellon University, Vincent J. Hellendoorn Carnegie Mellon University, Ruben Martins Carnegie Mellon University, Claire Le Goues Carnegie Mellon University
14:45
15m
Talk
The Fact Selection Problem in LLM-Based Program Repair
Research Track
Nikhil Parasaram Uber Amsterdam, Huijie Yan University College London, Boyu Yang University College London, Zineb Flahy University College London, Abriele Qudsi University College London, Damian Ziaber University College London, Earl T. Barr University College London, Sergey Mechtaev Peking University
15:00
15m
Talk
Towards Understanding the Characteristics of Code Generation Errors Made by Large Language Models
Research Track
Zhijie Wang University of Alberta, Zijie Zhou University of Illinois Urbana-Champaign, Da Song University of Alberta, Yuheng Huang University of Alberta, Canada, Shengmai Chen Purdue University, Lei Ma The University of Tokyo & University of Alberta, Tianyi Zhang Purdue University
Pre-print
15:15
15m
Talk
Beyond Syntax: How Do LLMs Understand Code?
New Ideas and Emerging Results (NIER)
Marc North Durham University, Amir Atapour-Abarghouei Durham University, Nelly Bencomo Durham University