Automated Program Repair in the Era of Large Pre-trained Language Models
Automated Program Repair (APR) aims to help developers automatically patch software bugs. However, current state-of-the-art traditional and learning-based APR techniques face a similar problem of limited patch variety and fail to fix complicated bugs. This is mainly due to their reliance on bug-fixing datasets, either to craft fix templates (traditional) or to directly predict potential patches (learning-based). Large Pre-Trained Language Models (PLMs), typically trained on billions of text/code tokens, can potentially help avoid this issue. Very recently, researchers have directly leveraged PLMs for APR without relying on any bug-fixing datasets. However, such existing work either failed to include state-of-the-art PLMs or was not evaluated on realistic datasets. Thus, the true power of modern PLMs on the important APR problem is yet to be revealed.
In this work, we perform the first extensive study on directly applying PLMs for APR. We select 9 recent state-of-the-art PLMs, including both generative and infilling models, ranging from 125M to 20B parameters in size. We design 3 repair settings to evaluate the different ways PLMs can be used to generate patches: 1) generate the entire patched function, 2) fill in a chunk of code given the prefix and suffix, and 3) output a single-line fix. We apply the PLMs under these repair settings on 5 datasets across 3 different languages and compare the PLMs in terms of the number of bugs fixed, generation speed, and compilation rate. We also compare the PLMs against recent state-of-the-art APR tools. Our study demonstrates that directly applying state-of-the-art PLMs can already substantially outperform all existing APR techniques on all our datasets. Among the studied PLMs, a scaling effect exists for APR: larger models tend to achieve better performance. We also show, for the first time, that the suffix code after the buggy line (adopted in infilling-style APR) is important for generating not only more fixes but also patches with a higher compilation rate. Besides patch generation, the PLMs consider correct patches to be more natural than incorrect ones, and can even be leveraged for effective patch ranking or patch correctness checking. Lastly, we show that PLM-based APR can be further substantially boosted by: 1) increasing the sample size, and 2) incorporating fix template information.
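The three repair settings and the naturalness-based patch ranking described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the prompt markers (e.g. <INFILL>), function names, and the idea of scoring naturalness as mean token log-probability are assumptions for the sketch.

```python
def build_prompt(setting, buggy_func, buggy_line_idx):
    """Build a PLM query for one of the three repair settings.

    `buggy_func` is the source of the buggy function; `buggy_line_idx`
    is the 0-based index of the buggy line within it.
    """
    lines = buggy_func.splitlines()
    if setting == "complete_function":
        # 1) Generative setting: ask the model to regenerate the
        #    entire patched function from scratch.
        return "// Provide a fix for the buggy function\n" + buggy_func
    elif setting == "infill_chunk":
        # 2) Infilling setting: remove the buggy line and let the model
        #    fill the hole, conditioned on both prefix AND suffix code.
        prefix = "\n".join(lines[:buggy_line_idx])
        suffix = "\n".join(lines[buggy_line_idx + 1:])
        return prefix + "\n<INFILL>\n" + suffix
    elif setting == "single_line":
        # 3) Single-line setting: give only the prefix and sample one
        #    replacement line (no suffix context).
        return "\n".join(lines[:buggy_line_idx]) + "\n"
    raise ValueError("unknown setting: " + setting)


def rank_by_naturalness(patches_with_logprobs):
    """Rank candidate patches by mean token log-probability: patches the
    model considers more 'natural' (higher mean log-prob) come first."""
    def mean_logprob(item):
        _, logprobs = item
        return sum(logprobs) / len(logprobs)
    return [p for p, _ in sorted(patches_with_logprobs,
                                 key=mean_logprob, reverse=True)]
```

For example, under the infilling setting the buggy line is cut out and the model sees both the code before and after the hole, which is what allows suffix information to improve both the fix rate and the compilation rate of generated patches.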
Thu 18 May (displayed time zone: Hobart)
13:45 - 15:15 | Program Repair with and for AI
Technical Track / Journal-First Papers / Demonstrations, Meeting Room 102
Chair(s): Julia Rubin (University of British Columbia, Canada)

13:45 (15m) Talk | Impact of Code Language Models on Automated Program Repair (Technical Track)
Nan Jiang (Purdue University), Kevin Liu (Lynbrook High School), Thibaud Lutellier (University of Alberta), Lin Tan (Purdue University)

14:00 (15m) Talk | Tare: Type-Aware Neural Program Repair (Technical Track)
Qihao Zhu (Peking University), Zeyu Sun (Zhongguancun Laboratory), Wenjie Zhang (Peking University), Yingfei Xiong (Peking University), Lu Zhang (Peking University)

14:15 (15m) Talk | Template-based Neural Program Repair (Technical Track)
Xiangxin Meng (Beihang University), Xu Wang (Beihang University), Hongyu Zhang (The University of Newcastle), Hailong Sun (Beihang University), Xudong Liu (Beihang University), Chunming Hu (Beihang University)

14:30 (15m) Talk | Automated Repair of Programs from Large Language Models (Technical Track)
Zhiyu Fan (National University of Singapore), Xiang Gao (Beihang University), Martin Mirchev (National University of Singapore), Abhik Roychoudhury (National University of Singapore), Shin Hwei Tan (Southern University of Science and Technology)

14:45 (15m) Talk | Automated Program Repair in the Era of Large Pre-trained Language Models (Technical Track)
Chunqiu Steven Xia (University of Illinois at Urbana-Champaign), Yuxiang Wei (University of Illinois at Urbana-Champaign), Lingming Zhang (University of Illinois at Urbana-Champaign)

15:00 (7m) Talk | AIREPAIR: A Repair Platform for Neural Networks (Demonstrations)
Xidan Song (University of Manchester), Youcheng Sun (The University of Manchester), Mustafa A. Mustafa (University of Manchester; imec-COSIC, KU Leuven), Lucas C. Cordeiro (University of Manchester)

15:07 (7m) Talk | Arachne: Search Based Repair of Deep Neural Networks (Journal-First Papers)