Extracting Fix Ingredients using Language Models (FORGE 2025 - Research Papers)

Who

Julian Prenner, Romain Robbes

Track

FORGE 2025 Research Papers

When

Mon 28 Apr 2025 11:00 - 11:12 at 207 - Session4: Human-AI Collaboration & Legal Aspects of using FM Chair(s): Zhenhao Li

Abstract

Deep learning and language models are increasingly dominating automated program repair research. While previous generate-and-validate approaches were able to find and use fix ingredients at the file or even project level, neural language models are limited to the code that fits their input window. In this work, we investigate how important identifier ingredients are in neural program repair and present ScanFix, an approach that leverages an additional scanner model to extract identifiers from a bug’s file and, potentially, project-level context. We find that lack of knowledge of far-away identifiers is an important cause of failed repairs. Augmenting repair model input with scanner-extracted identifiers yields relative improvements of up to 31%. However, ScanFix is outperformed by a model with a large input window (> 5k tokens). When passing ingredients from the ground-truth fix, improvements are even higher. This shows that, with refined extraction techniques, ingredient scanning, similar to fix candidate ranking, could become an important “subtask” of future automated repair systems. At the same time, it also demonstrates that this idea is subject to Sutton’s bitter lesson and may be rendered unnecessary by new code models with ever-increasing context windows.
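To make the idea of augmenting a repair model's input with far-away identifiers concrete, here is a minimal sketch. It is not the ScanFix implementation: the abstract describes a learned scanner model, whereas this sketch substitutes a plain regular expression as a stand-in. The function names (`extract_far_identifiers`, `augment_input`), the `limit` parameter, and the `// identifiers:` prefix format are all hypothetical choices made for illustration.

```python
import re

# Assumption: a regex stands in for the learned scanner model described in
# the abstract. It matches identifier-like tokens, keywords included.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def extract_far_identifiers(file_text, window_text, limit=20):
    """Return identifiers that occur in the file but not in the input window.

    Order of first appearance is preserved and at most `limit` names are
    returned (hypothetical cap, chosen only to keep the prompt short).
    """
    seen_in_window = set(IDENTIFIER.findall(window_text))
    result = []
    for name in IDENTIFIER.findall(file_text):
        if name not in seen_in_window and name not in result:
            result.append(name)
            if len(result) == limit:
                break
    return result


def augment_input(window_text, identifiers):
    """Prepend the extracted identifiers to the repair model input.

    The comment-style prefix is an illustrative format, not the one used
    in the paper.
    """
    if not identifiers:
        return window_text
    return "// identifiers: " + " ".join(identifiers) + "\n" + window_text


# Usage: the buggy function fits in the window, but the field it should
# reference is declared elsewhere in the file.
file_text = "int maxRetries = 3;\nint run() { return retries; }"
window_text = "int run() { return retries; }"
far = extract_far_identifiers(file_text, window_text)
prompt = augment_input(window_text, far)
```

In this toy case, the only identifier outside the window is `maxRetries`, which is exactly the ingredient a fix would need. A real scanner would have to rank and filter candidates rather than return every unseen name.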

Julian Prenner

Free University of Bozen-Bolzano

Romain Robbes

CNRS, LaBRI, University of Bordeaux

France