FixEval: Execution-based Evaluation of Program Fixes for Programming Problems (APR 2023)

Who

Md Mahim Anjum Haque, Wasi Uddin Ahmad, Ismini Lourentzou, Chris Brown

Track

APR 2023

Time Zone

The program is currently displayed in (GMT+10:00) Hobart.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Tue 16 May 2023 15:15 - 15:35 at Meeting Room 104 - Afternoon session 1

Abstract

The increasing complexity of software has led to a drastic rise in the time and cost of identifying and fixing bugs. Various approaches have been explored in the literature to generate fixes for buggy code automatically. However, few tools and datasets are available to evaluate model-generated fixes effectively, because the combinatorial space of possible fixes for a particular bug is large. In this work, we introduce FixEval, a benchmark comprising buggy code submissions to competitive programming problems and their respective fixes. FixEval includes a rich test suite to evaluate the correctness of model-generated program fixes and to assess further information regarding time and memory constraints and acceptance based on a verdict. We take two Transformer language models pretrained on programming languages as our baselines and compare them using match-based and execution-based evaluation metrics. Our experiments show that match-based metrics do not reflect the quality of model-generated program fixes accurately, whereas execution-based methods evaluate programs against all the cases and scenarios designed explicitly for that solution. We therefore believe FixEval provides a step towards real-world automatic bug fixing and model-generated code evaluation.
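To illustrate the contrast the abstract draws between match-based and execution-based metrics, here is a minimal sketch. It is not FixEval's actual harness: the function names `exact_match` and `run_verdict`, the toy problem (print the sum of two integers), the test cases, and the verdict strings are all assumptions made for illustration, and memory limits are not modeled.

```python
import subprocess
import sys


def exact_match(candidate: str, reference: str) -> bool:
    """Match-based metric (assumed form): token-level equality after
    splitting on whitespace."""
    return candidate.split() == reference.split()


def run_verdict(source: str, tests: list[tuple[str, str]],
                time_limit: float = 5.0) -> str:
    """Execution-based metric (assumed form): run the program on every
    test case and return a judge-style verdict."""
    for stdin_text, expected in tests:
        try:
            proc = subprocess.run(
                [sys.executable, "-c", source],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=time_limit,
            )
        except subprocess.TimeoutExpired:
            return "Time Limit Exceeded"
        if proc.returncode != 0:
            return "Runtime Error"
        # Compare outputs token by token so trailing whitespace is ignored.
        if proc.stdout.split() != expected.split():
            return "Wrong Answer"
    return "Accepted"


# Hypothetical problem: read two integers and print their sum.
reference_fix = "a, b = map(int, input().split())\nprint(a + b)"
candidate_fix = "x, y = map(int, input().split())\nprint(x + y)"
tests = [("1 2\n", "3"), ("10 -4\n", "6")]

print("exact match:", exact_match(candidate_fix, reference_fix))
print("verdict:", run_verdict(candidate_fix, tests))
```

In this toy case the candidate differs from the reference only in variable names, so the match-based check rejects it while the execution-based check accepts it, which is the kind of disagreement the abstract describes.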

Md Mahim Anjum Haque, Virginia Tech, United States

Wasi Uddin Ahmad, University of California, Los Angeles

Ismini Lourentzou, Virginia Tech

Chris Brown, Virginia Tech, United States

Session Program

Tue 16 May
Displayed time zone: Hobart

13:45 - 15:15	Afternoon session 1 (APR) at Meeting Room 104

13:45	15m	Talk	Program Repair Competition
	Ridwan Salihin Shariffdeen (National University of Singapore), Martin Mirchev (National University of Singapore), Abhik Roychoudhury (National University of Singapore)
14:00	75m	Panel	Panel Discussion: Future of APR: Challenges and directions
	Manish Motwani (Georgia Institute of Technology), Xuan Bach D. Le (The University of Melbourne), Abhik Roychoudhury (National University of Singapore), Yingfei Xiong (Peking University), Lingming Zhang (University of Illinois at Urbana-Champaign)
15:15	20m	Talk	FixEval: Execution-based Evaluation of Program Fixes for Programming Problems
	Md Mahim Anjum Haque (Virginia Tech), Wasi Uddin Ahmad (University of California, Los Angeles), Ismini Lourentzou (Virginia Tech), Chris Brown (Virginia Tech)
15:35	15m	Talk	Beyond Code Generation: The Need for Type-Aware Language Models
	Francisco Ribeiro (HASLab/INESC TEC & Universidade do Minho), José Nuno Macedo (University of Minho), Kanae Tsushima (National Institute of Informatics, Japan)