FixEval: Execution-based Evaluation of Program Fixes for Programming Problems
The increasing complexity of software has led to a drastic rise in the time and cost of identifying and fixing bugs. Various approaches have been explored in the literature to generate fixes for buggy code automatically. However, few tools and datasets are available to evaluate model-generated fixes effectively, due to the large combinatorial space of possible fixes for a particular bug. In this work, we introduce FixEval, a benchmark comprising buggy code submissions to competitive programming problems and their respective fixes. FixEval includes a rich test suite to evaluate the correctness of model-generated program fixes and to assess further information such as time and memory constraints and acceptance based on a verdict. We consider two Transformer language models pretrained on programming languages as our baselines and compare them using match-based and execution-based evaluation metrics. Our experiments show that match-based metrics do not reflect the quality of model-generated program fixes accurately. In contrast, execution-based methods evaluate programs against all test cases and scenarios designed explicitly for that solution. Therefore, we believe FixEval provides a step towards real-world automatic bug fixing and model-generated code evaluation.
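To illustrate the distinction the abstract draws, here is a minimal sketch of a match-based check next to an execution-based one. It is not the FixEval harness: the function names, the verdict labels (`AC`, `WA`, `TLE`, `RE`), the sample programs, and the test cases are all hypothetical, and memory limits are not modeled.

```python
import subprocess
import sys
from typing import List, Tuple


def exact_match(candidate: str, reference: str) -> bool:
    """Match-based check: whitespace-insensitive token equality with the reference fix."""
    return candidate.split() == reference.split()


def verdict(source: str, tests: List[Tuple[str, str]], timeout_s: float = 5.0) -> str:
    """Execution-based check: run the program on every test case.

    Returns a hypothetical verdict label: AC (accepted), WA (wrong answer),
    TLE (time limit exceeded), or RE (runtime error).
    """
    for stdin_text, expected in tests:
        try:
            result = subprocess.run(
                [sys.executable, "-c", source],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return "TLE"
        if result.returncode != 0:
            return "RE"
        if result.stdout.strip() != expected.strip():
            return "WA"
    return "AC"


# Hypothetical problem: read two integers and print their sum.
REFERENCE = "a, b = map(int, input().split())\nprint(a + b)"
CANDIDATE = "x, y = map(int, input().split())\nprint(x + y)"  # correct, but renamed variables
BUGGY = "a, b = map(int, input().split())\nprint(a - b)"
TESTS = [("2 3\n", "5"), ("10 -4\n", "6")]

if __name__ == "__main__":
    print("exact match:", exact_match(CANDIDATE, REFERENCE))
    print("candidate verdict:", verdict(CANDIDATE, TESTS))
    print("buggy verdict:", verdict(BUGGY, TESTS))
```

In this sketch the candidate fix differs from the reference only in variable names, so the match-based check rejects it while the execution-based check accepts it. That is the kind of disagreement between the two metric families the abstract refers to.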
Tue 16 May (displayed time zone: Hobart)
13:45 - 15:15
13:45 | 15m | Talk | Program Repair Competition | APR | Ridwan Salihin Shariffdeen (National University of Singapore), Martin Mirchev (National University of Singapore), Abhik Roychoudhury (National University of Singapore)
14:00 | 75m | Panel | Panel Discussion: Future of APR: Challenges and directions | APR | Manish Motwani (Georgia Institute of Technology), Xuan Bach D. Le (The University of Melbourne), Abhik Roychoudhury (National University of Singapore), Yingfei Xiong (Peking University), Lingming Zhang (University of Illinois at Urbana-Champaign)
15:15 | 20m | Talk | FixEval: Execution-based Evaluation of Program Fixes for Programming Problems | APR | Md Mahim Anjum Haque (Virginia Tech), Wasi Uddin Ahmad (University of California, Los Angeles), Ismini Lourentzou (Virginia Tech), Chris Brown (Virginia Tech)
15:35 | 15m | Talk | Beyond Code Generation: The Need for Type-Aware Language Models | APR | Francisco Ribeiro (HASLab/INESC TEC & Universidade do Minho), José Nuno Macedo (University of Minho), Kanae Tsushima (National Institute of Informatics, Japan)