ICSE 2026
Sun 12 - Sat 18 April 2026 Rio de Janeiro, Brazil
Fri 17 Apr 2026 14:00 - 14:15 at Asia I - AI for Software Engineering 23 Chair(s): Wesley K.G. Assunção

Large language models (LLMs) have demonstrated their potential in performing complex software engineering (SE) tasks. Rigorous evaluation of LLMs and LLM-based tools requires massive, up-to-date data derived from real-world SE processes. Existing continuous integration & delivery (CI/CD) datasets, such as BugSwarm, provide evolving data mined from software build processes, including the complete context of build failures. To harness this CI/CD data, we propose CI-Bench, a unified benchmarking framework designed to evaluate LLM-based program repair tools on software failures from CI/CD processes. CI-Bench retrieves data from the BugSwarm dataset, parses the build logs, and constructs appropriate prompts before invoking LLM-based program repair tools. Additionally, CI-Bench includes an executor that facilitates dynamic evaluation in the same environment as the original build process. With CI-Bench, we evaluate three state-of-the-art LLM-based program repair tools, Agentless, SWE-Agent, and AutoCodeRover, on a code repair task involving 100 real-world CI/CD failures using GPT-4o, Claude-3.5-Sonnet, and DeepSeek-V3 as foundation models. The evaluation shows that Agentless, SWE-Agent, and AutoCodeRover achieve success rates of up to 32%, 36%, and 13% in generating correct patches, respectively. CI-Bench is available at https://github.com/bugswarm/ci-bench and a demo of the tool can be found at https://youtu.be/BM0K-P38MOg.
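The pipeline the abstract describes (parse a build log, construct a prompt, score the resulting patches) can be illustrated with a minimal Python sketch. This is not CI-Bench's actual code or API: the `CIFailure` record, the keyword heuristic for picking log lines, the prompt wording, and every function name below are assumptions made purely for illustration.

```python
import re
from dataclasses import dataclass


# Hypothetical record; the field names are illustrative, not the BugSwarm schema.
@dataclass
class CIFailure:
    repo: str
    build_log: str


# Assumed heuristic: keep log lines that mention an error, a failure, or an exception.
ERROR_LINE = re.compile(r"error|fail|exception", re.IGNORECASE)


def extract_error_lines(build_log: str, max_lines: int = 20) -> list[str]:
    """Return up to max_lines log lines that look like failure messages."""
    hits = [line.strip() for line in build_log.splitlines() if ERROR_LINE.search(line)]
    return hits[:max_lines]


def build_prompt(failure: CIFailure) -> str:
    """Assemble a repair prompt from the failing repository and its log excerpt."""
    errors = extract_error_lines(failure.build_log)
    body = "\n".join(errors) if errors else "(no error lines found)"
    return (
        f"The CI build of {failure.repo} failed.\n"
        f"Relevant log lines:\n{body}\n"
        "Propose a patch that makes the build pass."
    )


def success_rate(outcomes: list[bool]) -> float:
    """Percentage of failures whose patched build passed (0.0 for an empty list)."""
    return 100.0 * sum(outcomes) / len(outcomes) if outcomes else 0.0
```

Under these assumptions, a tool that produces a passing patch for 32 of 100 failures would score `success_rate([True] * 32 + [False] * 68) == 32.0`. In a dynamic evaluation like the one the abstract describes, each boolean outcome would come from re-running the build in the original environment rather than from inspecting the patch text.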

Fri 17 Apr

Displayed time zone: Brasilia, Distrito Federal, Brazil

14:00 - 15:30
AI for Software Engineering 23: Research Track / Demonstrations / Journal-first Papers at Asia I
Chair(s): Wesley K.G. Assunção North Carolina State University
14:00
15m
Talk
CI-Bench: A Framework for Evaluating Large Language Model Tools on CI Failures
Demonstrations
Raian Latif Nabil University of California, Davis, Hao-Nan Zhu University of California, Davis, Cindy Rubio-González University of California at Davis
14:15
15m
Talk
Assessing the Latent Automated Program Repair Capabilities of Large Language Models using Round-Trip Translation
Journal-first Papers
Fernando Vallecillos Ruiz Simula Research Laboratory, Anastasiia Grishina Simula Research Laboratory, Max Hort Simula Research Laboratory, Leon Moonen Simula Research Laboratory
14:30
15m
Talk
XRFix: Exploring Performance Bug Repair of Extended Reality Applications with Large Language Models
Research Track
Jingwen Wu Department of Computer Science, Hong Kong Baptist University, Hanyang Guo School of Software Engineering, Sun Yat-sen University, Hong-Ning Dai Department of Computer Science, Hong Kong Baptist University, Xiapu Luo Hong Kong Polytechnic University
14:45
15m
Talk
Synthetic Repo-level Bug Dataset for Training Automated Program Repair Models (Distinguished Paper Award)
Research Track
Minh V. T. Pham FPT Software AI Center, Huy N. Phan FPT Software AI Center, Hoang Nhat Phan Nanyang Technological University, Cuong Chi Le The University of Texas at Dallas, Tien N. Nguyen University of Texas at Dallas, Nghi D. Q. Bui Google Research
15:00
15m
Talk
PredicateFix: Repairing Static Analysis Alerts with Bridging Predicates
Research Track
Yuan-An Xiao Peking University, Weixuan Wang Peking University, Dong Liu Center Research Institute, ZTE Corporation, China, Junwei Zhou Center Research Institute, ZTE Corporation, China, Shengyu Cheng ZTE Corporation, Yingfei Xiong Peking University
15:15
15m
Talk
Input Reduction Enhanced LLM-based Program Repair
Research Track
Boyang Yang Yanshan University, Luyao Ren Peking University, Xin Yin Zhejiang University, Jiadong Ren Yanshan University, Haoye Tian Aalto University, Shunfu Jin Yanshan University