Sat 3 May 2025 10:20 - 10:30 at 214 - Opening / Keynote 1 / Paper Session 1 Chair(s): Zijian Wang

AI-driven program repair uses AI models to repair buggy software by producing patches. Rapid advances in frontier models are likely to affect performance on the program repair task. Yet frequent, standardized evaluations that reveal the strengths and weaknesses of these models are lacking. To that end, we propose RepairBench, a novel leaderboard for AI-driven program repair. The key characteristics of RepairBench are: 1) it is execution-based: all patches are compiled and executed against a test suite; 2) it assesses frontier models in a frequent and standardized way. RepairBench leverages two high-quality benchmarks, Defects4J and GitBug-Java, to evaluate frontier models only against real-world program repair tasks. At the time of writing, RepairBench shows that claude-3-5-sonnet-20241022 is the best model for program repair, and that qwen-2.5-coder-32b-instruct is the cheapest model that still maintains good performance. We publicly release the evaluation framework of RepairBench as well as all patches generated in the course of the evaluation.
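
To make the "execution-based" criterion concrete, here is a minimal sketch of how the outcomes of compiled and executed patches could be scored. It is an illustration under stated assumptions, not the RepairBench implementation: the `PatchOutcome` record, the `is_plausible` and `plausible_rate` helpers, and the sample bug identifiers are all hypothetical, and the sketch assumes one patch per bug, with a patch counting as successful only if it compiles and passes every test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchOutcome:
    """Hypothetical record of running one generated patch (not RepairBench's schema)."""
    bug_id: str
    compiled: bool
    tests_passed: int
    tests_total: int


def is_plausible(outcome: PatchOutcome) -> bool:
    """A patch counts only if it compiles and passes the whole test suite."""
    return (
        outcome.compiled
        and outcome.tests_total > 0
        and outcome.tests_passed == outcome.tests_total
    )


def plausible_rate(outcomes: list[PatchOutcome]) -> float:
    """Fraction of patches that compile and pass all tests (assumes one patch per bug)."""
    if not outcomes:
        return 0.0
    return sum(is_plausible(o) for o in outcomes) / len(outcomes)


# Made-up sample data, purely for illustration.
outcomes = [
    PatchOutcome("bug-a", compiled=True, tests_passed=12, tests_total=12),
    PatchOutcome("bug-b", compiled=True, tests_passed=11, tests_total=12),
    PatchOutcome("bug-c", compiled=False, tests_passed=0, tests_total=12),
    PatchOutcome("bug-d", compiled=True, tests_passed=5, tests_total=5),
]

print(f"{plausible_rate(outcomes):.2f}")  # prints 0.50
```

In this sketch a patch that fails to compile, or that fails even a single test, is counted as unsuccessful. This is what separates execution-based scoring from comparing the generated patch textually against a reference fix.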

Sat 3 May

Displayed time zone: Eastern Time (US & Canada)

09:00 - 10:30
Opening / Keynote 1 / Paper Session 1
LLM4Code at 214
Chair(s): Zijian Wang AWS AI Labs
09:00
10m
Day opening
Opening
LLM4Code
Lingming Zhang University of Illinois at Urbana-Champaign, Prem Devanbu University of California at Davis, Zijian Wang AWS AI Labs
09:10
60m
Keynote
Keynote 1: Building the Hybrid Human-AI Developer: From Code Completion to Agents (zoom talk)
LLM4Code
10:10
10m
Talk
Are Large Language Models Memorizing Bug Benchmarks?
LLM4Code
Daniel Ramos Carnegie Mellon University, Claudia Mamede Carnegie Mellon University, Kush Jain Carnegie Mellon University, Paulo Canelas Carnegie Mellon University, Catarina Gamboa Carnegie Mellon University, Claire Le Goues Carnegie Mellon University
10:20
10m
Talk
RepairBench: Leaderboard of Frontier Models for Program Repair
LLM4Code
André Silva KTH Royal Institute of Technology, Martin Monperrus KTH Royal Institute of Technology