Uncovering the Challenges: A Study of Corner Cases in Bug-Inducing Commits (SANER 2025 - Early Research Achievement (ERA) Track)

Who

Atakan Şerifoğlu, Eray Tüzün

Track

SANER 2025 Early Research Achievement (ERA) Track

Time Zone

The program is currently displayed in (GMT-05:00) Eastern Time (US & Canada).

When

Fri 7 Mar 2025 11:30 - 11:37 at L-1720 - Mining Software Repositories. Chair(s): Brittany Reid

Abstract

In software development, accurately identifying bug-inducing commits (BICs) is crucial for maintaining code integrity and ensuring the reliability of software systems. Pinpointing the exact commits responsible for a bug is complex, which calls for a thorough investigation of the underlying issues and of the limitations of existing tools and algorithms. This study identifies corner cases in BIC identification, clarifies the relevant definitions, and examines issues with existing algorithms and tools. By analyzing these cases, we aim to reveal the challenges faced by current methods and offer insights for future improvements. We evaluated the SZZ algorithm and two large language models (LLMs), GPT-4o and Llama 3.1, on a curated repository of corner-case bugs with detailed reports. This setup allowed us to assess the strengths and weaknesses of both traditional algorithms and LLMs. For corner cases, the SZZ algorithm achieved a recall of 0.8 and a precision of 0.36, for an F1 score of 0.5; for non-corner cases, it achieved a recall of 1 and a precision of 0.5, for an F1 score of 0.67. The LLMs showed varied performance: for corner cases, Llama 3.1 had a mean reciprocal rank (MRR) of 0.7, while GPT-4o scored 0.5. For non-corner cases, both models performed better, with an MRR of 0.875. Corner cases in BIC identification expose limitations in current methods, emphasizing the need for improved approaches that handle them accurately.
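For readers who want to check the arithmetic behind the reported metrics, the following is a minimal sketch in Python. The F1 computation uses the standard harmonic-mean formula with the precision and recall values quoted in the abstract. The MRR part is only illustrative: the abstract does not say how the LLM outputs were ranked or how many bugs were evaluated, so the function names and the list `hypothetical_ranks` are assumptions made for this example, not data from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries.

    Each entry is the 1-based position of the correct answer in a ranked
    list, or None when the correct answer was not returned at all.
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(0.0 if r is None else 1.0 / r for r in ranks) / len(ranks)


# Precision and recall for SZZ as quoted in the abstract.
print(round(f1_score(0.36, 0.8), 2))  # corner cases
print(round(f1_score(0.5, 1.0), 2))   # non-corner cases

# Hypothetical ranks, NOT taken from the study: one possible set of
# outcomes in which the true BIC is ranked first three times and second once.
hypothetical_ranks = [1, 1, 1, 2]
print(mean_reciprocal_rank(hypothetical_ranks))
```

Rounded to two decimals, the two F1 values agree with the 0.5 and 0.67 reported above. The hypothetical rank list only shows one way an MRR of 0.875 could arise; other rank combinations give the same value.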

Atakan Şerifoğlu

Bilkent University

Turkey

Eray Tüzün