ICSE 2025
Sat 26 April - Sun 4 May 2025 Ottawa, Ontario, Canada
Tue 29 Apr 2025 14:30 - 14:50 at 210 - APR Session 3 Chair(s): Chao Peng

The performance of a machine learning system is determined not only by the model but also, to a substantial degree, by the data it is trained on. With the increasing use of machine learning, data quality has also become a concern in automated program repair (APR) research. In this position paper, we report some of the data-related issues we have come across when working with several large APR datasets and benchmarks, such as duplicates and “bogus bugs”. We briefly discuss the potential impact of these problems on repair performance and propose possible remedies. We believe that more data-focused approaches could improve the performance and robustness of current and future APR systems.

Tue 29 Apr

Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30
APR Session 3 (APR) at 210
Chair(s): Chao Peng ByteDance
14:00 (30m) Other: Discussion [APR]
Chao Peng (ByteDance)

14:30 (20m) Talk: Bogus Bugs, Duplicates, and Revealing Comments: Data Quality Issues in NPR [APR]
Julian Prenner (Free University of Bozen-Bolzano), Romain Robbes (Univ. Bordeaux, CNRS)

14:50 (20m) Talk: LLM-Based Repair of C++ Implicit Data Loss Compiler Warnings: An Industrial Case Study [APR]
Chansong You (SAP Labs Korea), Hyun Deok Choi (SAP Labs Korea), Jingun Hong (SAP Labs)

15:10 (20m) Talk: Scholia - An XAI Framework for APR [APR]
Nethum Lamahewage (University of Moratuwa, Sri Lanka), Nimantha Cooray (University of Moratuwa, Sri Lanka), Ridwan Salihin Shariffdeen (National University of Singapore), Sandareka Wickramanayake (University of Moratuwa, Sri Lanka), Nisansa de Silva (University of Moratuwa, Sri Lanka)