Automated program repair (APR) aims to help developers improve software reliability by generating patches for buggy programs. Although many code language models (CLMs) have been developed and are effective in software tasks such as code completion, there has been little comprehensive, in-depth work to evaluate CLMs’ fixing capabilities or to fine-tune CLMs for the APR task.
First, this work is the first to evaluate eight CLMs on four APR benchmarks; surprisingly, the best CLM, as is, fixes 49% more bugs than the state-of-the-art APR techniques. Second, we created one of the four APR benchmarks in this paper to avoid data leaking and ensure a fair evaluation. Third, it is the first work to fine-tune CLMs with APR training data, showing that fine-tuning brings a 31%–1,267% improvement to CLMs and enables them to fix 56%–130% more bugs than existing APR techniques. Fourth, this work studies the impact of buggy lines, showing that CLMs, as is, cannot make good use of the buggy lines to fix bugs, yet fine-tuned CLMs could potentially over-rely on buggy lines. Lastly, this work analyzes the size, time, and memory efficiency of different CLMs.
This work shows promising directions for the APR domain, such as fine-tuning CLMs with APR-specific designs. This paper also raises awareness of fair and comprehensive evaluations of CLMs and calls for clearer reporting of open-source repositories used in the pre-training data to address the data leaking problem.
Thu 18 May (displayed time zone: Hobart)
13:45 - 15:15 | Program repair with and for AI | Technical Track / Journal-First Papers / DEMO - Demonstrations | Meeting Room 102 | Chair(s): Julia Rubin (University of British Columbia, Canada)
13:45 | 15m | Talk | Impact of Code Language Models on Automated Program Repair | Technical Track | Nan Jiang (Purdue University), Kevin Liu (Lynbrook High School), Thibaud Lutellier (University of Alberta), Lin Tan (Purdue University)
14:00 | 15m | Talk | Tare: Type-Aware Neural Program Repair | Technical Track | Qihao Zhu (Peking University), Zeyu Sun (Zhongguancun Laboratory), Wenjie Zhang (Peking University), Yingfei Xiong (Peking University), Lu Zhang (Peking University)
14:15 | 15m | Talk | Template-based Neural Program Repair | Technical Track | Xiangxin Meng (Beihang University, Beijing, China), Xu Wang (Beihang University), Hongyu Zhang (The University of Newcastle), Hailong Sun (School of Computer Science and Engineering, Beihang University, Beijing, China), Xudong Liu (Beihang University), Chunming Hu (Beihang University)
14:30 | 15m | Talk | Automated Repair of Programs from Large Language Models | Technical Track | Zhiyu Fan (National University of Singapore, Singapore), Xiang Gao (Beihang University, China), Martin Mirchev (National University of Singapore), Abhik Roychoudhury (National University of Singapore), Shin Hwei Tan (Southern University of Science and Technology)
14:45 | 15m | Talk | Automated Program Repair in the Era of Large Pre-trained Language Models | Technical Track | Chunqiu Steven Xia (University of Illinois at Urbana-Champaign), Yuxiang Wei (University of Illinois at Urbana-Champaign), Lingming Zhang (University of Illinois at Urbana-Champaign)
15:00 | 7m | Talk | AIREPAIR: A Repair Platform for Neural Networks | DEMO - Demonstrations | Xidan Song (Department of Computer Science, University of Manchester, UK), Youcheng Sun (The University of Manchester), Mustafa A. Mustafa (Department of Computer Science, University of Manchester, UK; imec-COSIC, KU Leuven, Belgium), Lucas C. Cordeiro (University of Manchester)
15:07 | 7m | Talk | Arachne: Search Based Repair of Deep Neural Networks | Journal-First Papers