CoditT5: Pretraining for Source Code and Natural Language Editing
Pretrained language models have been shown to be effective in many software-related generation tasks; however, they are not well-suited for editing tasks as they are not designed to reason about edits. To address this, we propose a novel pretraining objective which explicitly models edits and use it to build CoditT5, a large language model for software-related editing tasks that is pretrained on large amounts of source code and natural language comments. We fine-tune it on various downstream editing tasks, including comment updating, bug fixing, and automated code review. By outperforming standard generation-based models, we demonstrate the generalizability of our approach and its suitability for editing tasks. We also show how a standard generation model and our edit-based model can complement one another through simple reranking strategies, with which we achieve state-of-the-art performance for the three downstream editing tasks.
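The abstract notes that a standard generation model and the edit-based model can complement one another through simple reranking. Below is a minimal sketch of one such strategy, assuming both models are available as HuggingFace seq2seq checkpoints; the checkpoint names are placeholders, the edit model's decoded output is assumed to already be (or to have been converted to) the final target sequence rather than an edit plan, and the exact scoring rule in the paper may differ.

```python
# Hedged sketch: generate candidates with an edit-based model, then rerank
# them by a generation-based model's likelihood. Checkpoint names below are
# placeholders (assumptions), not official CoditT5 releases.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

EDIT_CKPT = "path/to/coditt5-finetuned"  # hypothetical edit-based checkpoint
GEN_CKPT = "Salesforce/codet5-base"      # generation-based baseline

tok_edit = AutoTokenizer.from_pretrained(EDIT_CKPT)
model_edit = AutoModelForSeq2SeqLM.from_pretrained(EDIT_CKPT)
tok_gen = AutoTokenizer.from_pretrained(GEN_CKPT)
model_gen = AutoModelForSeq2SeqLM.from_pretrained(GEN_CKPT)


def sequence_log_prob(model, tok, source: str, target: str) -> float:
    """Approximate log-likelihood of `target` given `source` under a seq2seq model."""
    inputs = tok(source, return_tensors="pt")
    labels = tok(target, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(**inputs, labels=labels)
    # `loss` is the mean token cross-entropy; negate and scale to a sequence-level score.
    return -out.loss.item() * labels.size(1)


def rerank(source: str, num_beams: int = 10) -> str:
    """Beam-search candidates from the edit model, reranked by the generation model."""
    inputs = tok_edit(source, return_tensors="pt")
    beams = model_edit.generate(
        **inputs,
        num_beams=num_beams,
        num_return_sequences=num_beams,
        max_length=128,
    )
    candidates = [tok_edit.decode(b, skip_special_tokens=True) for b in beams]
    return max(
        candidates,
        key=lambda c: sequence_log_prob(model_gen, tok_gen, source, c),
    )
```

The symmetric strategy (reranking the generation model's candidates with the edit model's likelihood) follows the same pattern with the two models swapped.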
Tue 11 Oct (displayed time zone: Eastern Time, US & Canada)
10:30 - 12:30 | Technical Session 2 - Debugging and Troubleshooting (Research Papers / Industry Showcase / Late Breaking Results) at Banquet A | Chair(s): Andrew Begel (Carnegie Mellon University, Software and Societal Systems Department)
10:30 20m | Research paper | Call Me Maybe: Using NLP to Automatically Generate Unit Test Cases Respecting Temporal Constraints | Research Papers | Arianna Blasi (Meta; prev. Università della Svizzera italiana), Alessandra Gorla (IMDEA Software Institute), Michael D. Ernst (University of Washington), Mauro Pezzè (USI Lugano; Schaffhausen Institute of Technology)
10:50 20m | Research paper | CoditT5: Pretraining for Source Code and Natural Language Editing | Research Papers | Jiyang Zhang (University of Texas at Austin), Sheena Panthaplackel (UT Austin), Pengyu Nie (University of Texas at Austin), Junyi Jessy Li (University of Texas at Austin, USA), Milos Gligoric (University of Texas at Austin) | Pre-print
11:10 20m | Industry talk | Automated Identification of Security-Relevant Configuration Settings Using NLP | Industry Showcase | Patrick Stöckle (Technical University of Munich, TUM), Theresa Wasserer (Technical University of Munich), Bernd Grobauer (Siemens AG), Alexander Pretschner (TU Munich) | Pre-print
11:30 20m | Research paper | Is this Change the Answer to that Problem? Correlating Descriptions of Bug and Code Changes for Evaluating Patch Correctness | Research Papers | Haoye Tian (University of Luxembourg), Xunzhu Tang (University of Luxembourg), Andrew Habib (SnT, University of Luxembourg), Shangwen Wang (National University of Defense Technology), Kui Liu (Huawei Software Engineering Application Technology Lab), Xin Xia (Huawei Software Engineering Application Technology Lab), Jacques Klein (University of Luxembourg), Tegawendé F. Bissyandé (SnT, University of Luxembourg) | Pre-print
11:50 10m | Paper | A real-world case study for automated ticket team assignment using natural language processing and explainable models (Virtual) | Late Breaking Results