Uncovering the Challenges: A Study of Corner Cases in Bug-Inducing Commits
In software development, accurately identifying bug-inducing commits (BICs) is crucial for maintaining code integrity and ensuring the reliability of software systems. Pinpointing the exact commits responsible for a bug is complex, which calls for a thorough investigation of the underlying issues and of the limitations of existing tools and algorithms. This study investigates and identifies corner cases in BIC identification, clarifying definitions and examining issues with existing algorithms and tools. By analyzing these cases, we aim to reveal the challenges faced by current methods and offer insights for future improvements. We evaluated the SZZ algorithm and two large language models (LLMs), GPT-4o and Llama 3.1, using a curated repository of corner-case bugs with detailed reports. This setup allowed us to assess the strengths and weaknesses of both traditional algorithms and LLMs. On corner cases, the SZZ algorithm achieved a recall of 0.8 and a precision of 0.36 (F1 score of 0.5); on non-corner cases, it achieved a recall of 1.0 and a precision of 0.5 (F1 score of 0.67). The LLMs showed varied performance: on corner cases, Llama had a mean reciprocal rank (MRR) of 0.7, while GPT scored 0.5. On non-corner cases, both models performed better, with an MRR of 0.875. Corner cases in BIC identification expose limitations in current methods, emphasizing the need for improved approaches that handle these cases accurately.
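The reported F1 scores follow directly from the stated precision and recall values, and MRR is an average of reciprocal ranks. The sketch below is only an illustration of those two standard formulas, not the authors' evaluation code: the function names are hypothetical, and the rank list passed to the MRR function is invented for demonstration and is not data from the study.

```python
from typing import Optional, Sequence


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all bugs.

    Each entry is the 1-based rank at which the true BIC appeared in a
    model's candidate list, or None if it was not listed (contributes 0).
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(0.0 if r is None else 1.0 / r for r in ranks) / len(ranks)


# Precision and recall values as stated in the abstract for SZZ.
corner_f1 = f1_score(precision=0.36, recall=0.8)
non_corner_f1 = f1_score(precision=0.5, recall=1.0)
print(round(corner_f1, 2), round(non_corner_f1, 2))  # 0.5 0.67

# Hypothetical ranks for three bugs (NOT from the study): found at rank 1,
# found at rank 2, and not found at all.
print(mean_reciprocal_rank([1, 2, None]))  # 0.5
```

An MRR of 1.0 would mean the true BIC was always ranked first, so values such as 0.5 or 0.7 indicate that the correct commit often appeared lower in the list or was missed.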
Fri 7 Mar. Displayed time zone: Eastern Time (US & Canada)
11:00 - 12:30 | Mining Software Repositories | Research Papers / Early Research Achievement (ERA) Track / Journal First Track / Reproducibility Studies and Negative Results (RENE) Track | at L-1720 | Chair(s): Brittany Reid (Nara Institute of Science and Technology)
11:00 | 15m | Talk | An Empirical Study of Transformer Models on Automatically Templating GitHub Issue Reports | Research Papers | Jin Zhang (Hunan Normal University), Maoqi Peng (Hunan Normal University), Yang Zhang (National University of Defense Technology, China)
11:15 | 15m | Talk | How to Select Pre-Trained Code Models for Reuse? A Learning Perspective | Research Papers | Zhangqian Bi (Huazhong University of Science and Technology), Yao Wan (Huazhong University of Science and Technology), Zhaoyang Chu (Huazhong University of Science and Technology), Yufei Hu (Huazhong University of Science and Technology), Junyi Zhang (Huazhong University of Science and Technology), Hongyu Zhang (Chongqing University), Guandong Xu (University of Technology), Hai Jin (Huazhong University of Science and Technology)
11:30 | 7m | Talk | Uncovering the Challenges: A Study of Corner Cases in Bug-Inducing Commits | Early Research Achievement (ERA) Track
11:37 | 15m | Talk | A Bot Identification Model and Tool Based on GitHub Activity Sequences | Journal First Track | Natarajan Chidambaram (University of Mons), Alexandre Decan (University of Mons; F.R.S.-FNRS), Tom Mens (University of Mons)
11:52 | 15m | Talk | Does the Tool Matter? Exploring Some Causes of Threats to Validity in Mining Software Repositories | Reproducibility Studies and Negative Results (RENE) Track | Nicole Hoess (Technical University of Applied Sciences Regensburg), Carlos Paradis (No Affiliation), Rick Kazman (University of Hawai‘i at Mānoa), Wolfgang Mauerer (Technical University of Applied Sciences Regensburg)