Impact of Code Language Models on Automated Program Repair (ICSE 2023 - Technical Track)

Who

Nan Jiang, Kevin Liu, Thibaud Lutellier, Lin Tan

Track

ICSE 2023 Technical Track

Time Zone

Times are shown in the conference time zone, (GMT+10:00) Hobart. The GMT offset reflects the offset at the time of the conference.

When

Thu 18 May 2023 13:45 - 14:00 at Meeting Room 102, in the session "Program repair with and for AI". Chair(s): Julia Rubin

Abstract

Automated program repair (APR) aims to help developers improve software reliability by generating patches for buggy programs. Although many code language models (CLMs) have been developed and are effective in software tasks such as code completion, there has been little comprehensive, in-depth work to evaluate CLMs’ fixing capabilities and to fine-tune CLMs for the APR task.

First, this work is the first to evaluate eight CLMs on four APR benchmarks; surprisingly, the best CLM, as is, fixes 49% more bugs than the state-of-the-art APR techniques. Second, we created one of the four APR benchmarks in this work to avoid data leaking and enable a fair evaluation. Third, it is the first work to fine-tune CLMs with APR training data, showing that fine-tuning brings a 31%–1,267% improvement to CLMs and enables them to fix 56%–130% more bugs than existing APR techniques. Fourth, this work studies the impact of buggy lines, showing that CLMs, as is, cannot make good use of the buggy lines to fix bugs, whereas fine-tuned CLMs could potentially over-rely on them. Lastly, this work analyzes the size, time, and memory efficiency of different CLMs.

This work shows promising directions for the APR domain, such as fine-tuning CLMs with APR-specific designs. This paper also raises awareness of fair and comprehensive evaluations of CLMs and calls for clearer reporting of open-source repositories used in the pre-training data to address the data leaking problem.

Link to Preprint

https://arxiv.org/abs/2302.05020

Nan Jiang

Purdue University

Kevin Liu

Lynbrook High School

Thibaud Lutellier

University of Alberta

Lin Tan