Autonomous program improvement typically involves automatically producing bug fixes and feature additions. Such program improvement can be accomplished by a combination of large language model (LLM) and program analysis capabilities, in the form of an LLM agent. Since program repair or program improvement typically requires a specification of intended behavior - specification inference can be useful for producing high quality program patches. In this work, we examine efficient and low-cost workflows for iterative specification inference within an LLM agent. Given a GitHub issue to be resolved in a software project, our goal is to conduct iterative code search accompanied by specification inference - thereby inferring intent from both the project structure and behavior. The intent thus captured is examined by a reviewer agent with the goal of vetting the patches as well as providing a measure of confidence in the vetted patches. Our approach SpecRover is built on the open-source LLM agent AutoCodeRover. In an evaluation on the full SWE-Bench consisting of 2294 GitHub issues, it shows more than 50% improvement in efficacy over AutoCodeRover. Compared to the open-source agents available, our work shows modest cost ($0.65 per issue) in resolving an average GitHub issue in SWE-Bench lite. The production of explanation by SpecRover allows for a better “signal” to be given to the developer, on when the suggested patches can be accepted with confidence. SpecRover also seeks to demonstrate the continued importance of specification inference in automated program repair, even as program repair technologies enter the LLM era.
Wed 30 AprDisplayed time zone: Eastern Time (US & Canada) change
16:00 - 17:30 | AI for Program Comprehension 1Research Track at 213 Chair(s): Yintong Huo Singapore Management University, Singapore | ||
16:00 15mTalk | ADAMAS: Adaptive Domain-Aware Performance Anomaly Detection in Cloud Service Systems Research Track Wenwei Gu The Chinese University of Hong Kong, Jiazhen Gu Chinese University of Hong Kong, Jinyang Liu Chinese University of Hong Kong, Zhuangbin Chen Sun Yat-sen University, Jianping Zhang The Chinese University of Hong Kong, Jinxi Kuang The Chinese University of Hong Kong, Cong Feng Huawei Cloud Computing Technology, Yongqiang Yang Huawei Cloud Computing Technology, Michael Lyu The Chinese University of Hong Kong | ||
16:15 15mTalk | LibreLog: Accurate and Efficient Unsupervised Log Parsing Using Open-Source Large Language Models Research Track Zeyang Ma Concordia University, Dong Jae Kim DePaul University, Tse-Hsun (Peter) Chen Concordia University | ||
16:30 15mTalk | Model Editing for LLMs4Code: How Far are We? Research Track Xiaopeng Li National University of Defense Technology, Shangwen Wang National University of Defense Technology, Shasha Li National University of Defense Technology, Jun Ma National University of Defense Technology, Jie Yu National University of Defense Technology, Xiaodong Liu National University of Defense Technology, Jing Wang National University of Defense Technology, Bin Ji National University of Defense Technology, Weimin Zhang National University of Defense Technology Pre-print | ||
16:45 15mTalk | Software Model Evolution with Large Language Models: Experiments on Simulated, Public, and Industrial Datasets Research Track Christof Tinnes Saarland University, Alisa Carla Welter Saarland University, Sven Apel Saarland University Pre-print | ||
17:00 15mTalk | SpecRover: Code Intent Extraction via LLMs Research Track Haifeng Ruan National University of Singapore, Yuntong Zhang National University of Singapore, Abhik Roychoudhury National University of Singapore | ||
17:15 15mTalk | Unleashing the True Potential of Semantic-based Log Parsing with Pre-trained Language Models Research Track |