ASE 2025
Sun 16 - Thu 20 November 2025 Seoul, South Korea

This program is tentative and subject to change.

Mon 17 Nov 2025 11:30 - 11:40 at Grand Hall 1 - Program Repair 1

Automated program repair (APR) techniques generate patches for fixing software bugs automatically. The aim of APR is to significantly reduce the manual effort required by developers to fix software bugs. However, previous studies have shown that APR techniques suffer from the overfitting problem. Overfitting happens when a patch passes the test suite without revealing any error, yet the patch does not actually fix the underlying bug, or it introduces a new defect that the test suite does not cover. Such test-suite-passing patches are termed "plausible" patches. Therefore, the patches generated by APR tools need to be validated by human programmers, which can be very costly and hinders the adoption of APR tools in practice. Our work aims to minimize the number of plausible patches that programmers have to review, thereby reducing the time and effort required to find a correct patch.

To alleviate these issues, we present a lightweight patch post-processing technique, named XTESTCLUSTER, that aims to reduce the number of generated patches a developer has to assess. Our technique clusters plausible repair patches that exhibit the same behavior (according to a given set of test suites) and presents the developer with fewer patches, one representative per cluster, thus ensuring that the patches shown exhibit different behavior. Our technique can be used not only when a single tool generates multiple plausible patches for a given bug, but also when several APR tools are run (potentially in parallel) to increase the chance of finding a correct patch. In this way, developers only need to examine one representative patch per cluster, rather than all, possibly hundreds, of the patches produced by APR tools.
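To make the clustering step concrete, the following is a minimal sketch (in Python, not the authors' implementation) of grouping plausible patches by the pass/fail signature they produce on a shared set of test cases and keeping one representative per cluster; the names Patch, TestCase, and run_tests are placeholders assumed for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical names for this sketch: a Patch is just an identifier, and
# run_tests(patch, tests) returns one pass/fail outcome per test case
# (assumed deterministic for the patched program).
Patch = str
TestCase = str
RunTests = Callable[[Patch, List[TestCase]], Tuple[bool, ...]]


def cluster_by_test_behavior(patches: List[Patch],
                             tests: List[TestCase],
                             run_tests: RunTests) -> Dict[Tuple[bool, ...], List[Patch]]:
    """Group patches that show identical pass/fail behavior on the given tests."""
    clusters: Dict[Tuple[bool, ...], List[Patch]] = defaultdict(list)
    for patch in patches:
        signature = run_tests(patch, tests)   # e.g. (True, False, True, ...)
        clusters[signature].append(patch)
    return dict(clusters)


def representatives(clusters: Dict[Tuple[bool, ...], List[Patch]]) -> List[Patch]:
    """Pick one patch per behavioral cluster for the reviewer to inspect."""
    return [members[0] for members in clusters.values()]
```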

Our approach presents two main novelties. First, it leverages the behavioral diversity of the generated patches, a diversity that is not exposed by the developer-written test cases used to synthesize the patches. In particular, our clustering approach XTESTCLUSTER exploits automatically generated test cases that exercise diverse behavior, in addition to the existing test suite. Second, our approach has the advantage of requiring neither code instrumentation (aside from patch application), nor an oracle, nor a pre-existing dataset from which to learn fix patterns.
Moreover, XTESTCLUSTER is complementary to previous work on patch overfitting assessment, as different prioritization strategies can be applied to each cluster.
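As a hedged illustration of why the generated tests matter (the functions and inputs below are invented for this sketch, not taken from the paper): two patches that both pass a developer-written test can still behave differently on an automatically generated input, and would therefore land in different clusters.

```python
# Hypothetical example: two plausible patches for a routine that should clamp
# negative values to zero. Both pass the developer-written test, so the original
# test suite cannot tell them apart.
def patch_a(x: int) -> int:
    return max(x, 0)          # correct behavior

def patch_b(x: int) -> int:
    return abs(x)             # overfits: happens to pass the developer test below

def developer_test() -> None:
    assert patch_a(5) == 5
    assert patch_b(5) == 5    # both patches are "plausible" w.r.t. this test

# An automatically generated input (in the real setting, produced by tools such as
# EvoSuite or Randoop for Java programs) exposes the behavioral difference:
if __name__ == "__main__":
    developer_test()
    generated_input = -3
    print(patch_a(generated_input))   # 0
    print(patch_b(generated_input))   # 3 -> different output, hence different clusters
```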

The output of XTESTCLUSTER provides developers and code reviewers with: 1) a way to reduce the number of patches to analyze, since they can focus on a sample of patches from each cluster, and 2) enriched information for each patch, including the newly generated test cases, their outcomes, and the inputs that expose behavioral differences across alternative patches for the same bug. Such information supports reviewers in selecting the most appropriate patch to merge into the codebase.

We evaluate our approach on 902 patches (248 correct and 654 overfitted) generated by 21 different APR tools for bugs from the Defects4J dataset. After removing duplicate patches, we used two automated test-case generation tools, EvoSuite and Randoop, to generate test cases for our patch set. Finally, we clustered the patches based on their test-case results. To our knowledge, XTESTCLUSTER is the first approach to jointly analyze patches generated by multiple program repair approaches to fix a particular bug.
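A rough sketch of how the deduplication and test-generation steps might be orchestrated; the jar names, classpaths, target classes, and time budget are assumptions for illustration, and the exact command-line options depend on the EvoSuite and Randoop versions used. The clustering itself then follows the earlier sketch.

```python
import subprocess
from pathlib import Path
from typing import Dict, List

# All jar names, classpaths, target classes, and the time budget below are
# assumptions for illustration; exact flags can differ across tool versions.
def generate_evosuite_tests(target_class: str, classpath: str) -> None:
    """Generate JUnit tests for one class of the patched program with EvoSuite."""
    subprocess.run(
        ["java", "-jar", "evosuite.jar",
         "-class", target_class, "-projectCP", classpath],
        check=True,
    )


def generate_randoop_tests(target_class: str, classpath: str) -> None:
    """Generate JUnit tests for the same class with Randoop."""
    subprocess.run(
        ["java", "-classpath", f"randoop-all.jar:{classpath}",
         "randoop.main.Main", "gentests",
         f"--testclass={target_class}", "--time-limit=60"],
        check=True,
    )


def deduplicate(patch_files: List[Path]) -> List[Path]:
    """Keep a single copy of textually identical patches before test generation."""
    seen: Dict[str, Path] = {}
    for patch in patch_files:
        seen.setdefault(patch.read_text(), patch)
    return list(seen.values())
```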

Our results show that XTESTCLUSTER creates at least two clusters for almost half of the bugs that have two or more different patches. By clustering patches, XTESTCLUSTER reduces the number of patches to review and analyze by a median of 50%. This reduction could help code reviewers (developers using automated repair tools, or researchers evaluating patches) to reduce the time spent on patch evaluation.
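To read the reported reduction: if a bug has N plausible patches grouped into k behavioral clusters, a reviewer inspects k representatives instead of N. The per-bug numbers in the snippet below are hypothetical, not the study's data.

```python
from statistics import median

def review_reduction(num_patches: int, num_clusters: int) -> float:
    """Fraction of patches a reviewer no longer needs to inspect when only
    one representative per behavioral cluster is examined."""
    return 1 - num_clusters / num_patches

# Hypothetical (patches, clusters) counts per bug -- not the study's data.
bugs = [(10, 5), (4, 2), (6, 3)]
print(median(review_reduction(p, c) for p, c in bugs))  # 0.5, i.e. a 50% median reduction
```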

We also analyze the assessments made by two state-of-the-art patch assessment approaches, ODS and Cache, on the patches clustered by XTESTCLUSTER. The results show that XTESTCLUSTER can be used in a complementary fashion with those approaches and can help detect false positives and false negatives.


Mon 17 Nov

Displayed time zone: Seoul

11:00 - 12:30: Program Repair 1 at Grand Hall 1
11:00
10m
Talk
Defects4C: Benchmarking Large Language Model Repair Capability with C/C++ Bugs
Research Papers
Jian Wang (Nanyang Technological University), Xiaofei Xie (Singapore Management University), Qiang Hu (Tianjin University), Shangqing Liu (Nanjing University), Jiongchi Yu (Singapore Management University), Jiaolong Kong (Singapore Management University), Yi Li (Nanyang Technological University)
11:10
10m
Talk
MORepair: Teaching LLMs to Repair Code via Multi-Objective Fine-Tuning
Journal-First Track
Boyang Yang (Yanshan University; Beijing JudaoYouda Network Technology), Haoye Tian (Aalto University), Jiadong Ren (Yanshan University), Hongyu Zhang (Chongqing University), Jacques Klein (University of Luxembourg), Tegawendé F. Bissyandé (University of Luxembourg), Claire Le Goues (Carnegie Mellon University), Shunfu Jin (Yanshan University)
11:20
10m
Talk
When Fine-Tuning LLMs Meets Data Privacy: An Empirical Study of Federated Learning in LLM-Based Program Repair
Journal-First Track
Wenqiang LUO (City University of Hong Kong), Jacky Keung (City University of Hong Kong), Boyang Yang (Yanshan University; Beijing JudaoYouda Network Technology), He Ye (University College London (UCL)), Claire Le Goues (Carnegie Mellon University), Tegawendé F. Bissyandé (University of Luxembourg), Haoye Tian (Aalto University), Xuan-Bach D. Le (University of Melbourne)
11:30
10m
Talk
Test-based Patch Clustering for Automatically-Generated Patches Assessment
Journal-First Track
Matias Martinez (Universitat Politècnica de Catalunya (UPC)), Maria Kechagia (National and Kapodistrian University of Athens), Anjana Perera (Oracle Labs, Australia), Justyna Petke (University College London), Federica Sarro (University College London), Aldeida Aleti (Monash University)
11:40
10m
Talk
Hierarchical Knowledge Injection for Improving LLM-based Program Repair
Research Papers
Ramtin Ehsani (Drexel University), Esteban Parra Rodriguez (Belmont University), Sonia Haiduc (Florida State University), Preetha Chatterjee (Drexel University, USA)
11:50
10m
Talk
Characterizing Multi-Hunk Patches: Divergence, Proximity, and LLM Repair Challenges
Research Papers
Noor Nashid (University of British Columbia), Daniel Ding (University of British Columbia), Keheliya Gallaba (Centre for Software Excellence), Ahmed E. Hassan (Queen’s University), Ali Mesbah (University of British Columbia)
12:00
10m
Talk
Reinforcement Learning for Mutation Operator Selection in Automated Program Repair
Journal-First Track
Carol Hanna (University College London), Aymeric Blot (University of Rennes, IRISA / INRIA), Justyna Petke (University College London)
12:10
10m
Talk
APRMCTS: Improving LLM-based Automated Program Repair with Iterative Tree Search
Research Papers
Haichuan Hu (Nanjing University of Science and Technology), Congqing He (School of Computer Sciences, Universiti Sains Malaysia), Xiaochen Xie (Department of Management, Zhejiang University, China), Hao Zhang (School of Computer Sciences, Universiti Sains Malaysia), Quanjun Zhang (School of Computer Science and Engineering, Nanjing University of Science and Technology)
12:20
10m
Talk
Seeing is Fixing: Cross-Modal Reasoning with Multimodal LLMs for Visual Software Issue Repair
Research Papers
Kai Huang (Technical University of Munich), Jian Zhang (Nanyang Technological University), Xiaofei Xie (Singapore Management University), Chunyang Chen (TU Munich)