Evaluating Agent-based Program Repair at Google
This program is tentative and subject to change.
Agent-based program repair offers the promise of automatically resolving complex bugs end-to-end by combining the planning, tool usage, and code-generating abilities of modern LLMs. Recent work has explored the use of agent-based repair approaches on the popular open-source SWE-Bench, a collection of bugs (and patches) from popular GitHub-hosted Python projects. In addition, various agentic approaches such as SWE-Agent have been proposed to solve bugs in this benchmark. This paper explores the opportunity of using a similar agentic approach to address bugs in an enterprise-scale context. We perform a systematic comparison of bugs in SWE-Bench and those found in Google’s issue tracking system and show that they have different distributions in terms of language diversity, size and spread of changes, and ease of localization.
Next, we implement Passerine, an agent similar in spirit to SWE-Agent that can work within Google’s environment and produce patches for bugs in Google’s code repository. To evaluate Passerine, we curate an evaluation set of 182 bugs from Google’s issue tracking system, spanning human-reported (82) and machine-reported (100) bugs. We show that with 20 trajectory samples Passerine can produce a plausible patch for 70% of machine-reported and 14.6% of human-reported bugs in our evaluation set. After manual examination, we found that 42% of machine-reported bugs and 13.4% of human-reported bugs have at least one patch that is semantically equivalent to the ground-truth patch. This establishes a lower bound on performance that suggests agent-based automated program repair (APR) holds promise for large-scale enterprise bug repair.
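The headline numbers above aggregate over 20 sampled agent trajectories per bug: a bug counts as resolved if at least one sample yields a plausible patch. A minimal sketch of that any-of-k aggregation is below; the function name and the toy data are illustrative assumptions, not from the paper's evaluation harness.

```python
# Illustrative sketch (not the paper's harness): estimate the fraction
# of bugs with at least one plausible patch among k sampled trajectories.

def plausible_rate(results, k=20):
    """results maps bug_id -> list of booleans, one per trajectory sample,
    where True means that sample's patch was judged plausible."""
    solved = sum(1 for samples in results.values() if any(samples[:k]))
    return solved / len(results)

# Toy example: 3 bugs, 20 trajectory samples each (made-up data).
toy = {
    "bug-1": [False] * 19 + [True],   # solved only on the last sample
    "bug-2": [True] + [False] * 19,   # solved on the first sample
    "bug-3": [False] * 20,            # never solved
}
print(plausible_rate(toy))  # 2 of 3 bugs have at least one plausible patch
```

Sampling many trajectories and taking any-of-k is why the plausible-patch rate (70% machine-reported) sits well above the semantically-equivalent rate (42%): plausibility is a much weaker filter than equivalence to the ground-truth fix.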
Fri 2 May. Displayed time zone: Eastern Time (US & Canada).
11:00 - 12:30 | AI for Analysis 4 | Research Track / New Ideas and Emerging Results (NIER) / SE In Practice (SEIP) | Room 212

11:00 | 15m Talk | RepairAgent: An Autonomous, LLM-Based Agent for Program Repair | Research Track | Islem Bouzenia (University of Stuttgart), Prem Devanbu (University of California at Davis), Michael Pradel (University of Stuttgart)

11:15 | 15m Talk | Evaluating Agent-based Program Repair at Google | SE In Practice (SEIP) | Patrick Rondon (Google), Renyao Wei (Google), José Pablo Cambronero (Google), Jürgen Cito (TU Wien), Aaron Sun (Google), Siddhant Sanyam (Google), Michele Tufano (Google), Satish Chandra (Google)

11:30 | 15m Talk | Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset | SE In Practice (SEIP) | Mohammad Saiful Islam, Mohamed Sami Rakha, William Pourmajidi, Janakan Sivaloganathan (Toronto Metropolitan University, Toronto, Canada), John Steinbacher (IBM), Andriy Miranskyy (Toronto Metropolitan University, formerly Ryerson University) | Pre-print

11:45 | 15m Talk | Crash Report Prioritization for Large-Scale Scheduled Launches | SE In Practice (SEIP) | Nimmi Rashinika Weeraddana (University of Waterloo), Sarra Habchi (Ubisoft Montréal), Shane McIntosh (University of Waterloo)

12:00 | 15m Talk | LogLM: From Task-based to Instruction-based Automated Log Analysis | SE In Practice (SEIP) | Yilun Liu (Huawei), Yuhe Ji (Huawei), Shimin Tao (University of Science and Technology of China; Huawei), Minggui He (Huawei), Weibin Meng (Huawei), Shenglin Zhang (Nankai University), Yongqian Sun (Nankai University), Yuming Xie (Huawei), Boxing Chen (Huawei Canada), Hao Yang (Huawei) | Pre-print

12:15 | 7m Talk | Using ML filters to help automated vulnerability repairs: when it helps and when it doesn’t | New Ideas and Emerging Results (NIER) | Maria Camporese (University of Trento), Fabio Massacci (University of Trento; Vrije Universiteit Amsterdam)