This program is tentative and subject to change.

Fri 2 May 2025 15:00 - 15:15 at 214 - AI for Testing and QA 6

LLM-based automated program repair (APR) methods have attracted significant attention for their state-of-the-art performance. However, they have been evaluated primarily on a few well-known datasets such as Defects4J, raising questions about their effectiveness on new datasets. In this study, we evaluate 11 top-performing LLMs on DEFECTS4J-TRANS, a new dataset derived by transforming Defects4J while maintaining fault semantics. Results from experiments on both Defects4J and DEFECTS4J-TRANS show that all studied LLMs have limited generalizability in APR tasks, with average correct and plausible patches decreasing by 49.48% and 42.90%, respectively, on DEFECTS4J-TRANS. Further investigation into incorporating additional repair-relevant information in repair prompts reveals that, although this information significantly enhances the LLMs’ capabilities (increasing correct and plausible patches by up to 136.67% and 121.82%, respectively), performance still falls short of their results on the original dataset. This indicates that prompt engineering alone is insufficient to substantially enhance LLMs’ repair capabilities. Based on our study, we also offer several recommendations for future research.

Fri 2 May

Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30
14:00
15m
Talk
Treefix: Enabling Execution with a Tree of Prefixes
Research Track
Beatriz Souza Universität Stuttgart, Michael Pradel University of Stuttgart
14:15
15m
Talk
Assessing Evaluation Metrics for Neural Test Oracle Generation
Journal-first Papers
Jiho Shin York University, Hadi Hemmati York University, Moshi Wei York University, Song Wang York University
14:30
15m
Talk
Enhancing Energy-Awareness in Deep Learning through Fine-Grained Energy Measurement
Journal-first Papers
Saurabhsingh Rajput Dalhousie University, Tim Widmayer University College London (UCL), Ziyuan Shang Nanyang Technological University, Maria Kechagia National and Kapodistrian University of Athens, Federica Sarro University College London, Tushar Sharma Dalhousie University
14:45
15m
Talk
Studying the Impact of TensorFlow and PyTorch Bindings on Machine Learning Software Quality
Journal-first Papers
Hao Li Queen's University, Gopi Krishnan Rajbahadur Centre for Software Excellence, Huawei, Canada, Cor-Paul Bezemer University of Alberta
15:00
15m
Talk
Evaluating the Generalizability of LLMs in Automated Program Repair
New Ideas and Emerging Results (NIER)
Fengjie Li Tianjin University, Jiajun Jiang Tianjin University, Jiajun Sun Tianjin University, Hongyu Zhang Chongqing University
Pre-print
15:15
15m
Talk
How Propense Are Large Language Models at Producing Code Smells? A Benchmarking Study
New Ideas and Emerging Results (NIER)
Alejandro Velasco William & Mary, Daniel Rodriguez-Cardenas, David Nader Palacio William & Mary, Lutfar Rahman Alif University of Dhaka, Denys Poshyvanyk William & Mary