An empirical study of issue-link algorithms: which issue-link algorithms should we use? (ICSE 2023 - Journal-First Papers)

Who

Masanari Kondo, Yutaro Kashiwa, Yasutaka Kamei, Osamu Mizuno

Track

ICSE 2023 Journal-First Papers

Time Zone

The program is currently displayed in (GMT+10:00) Hobart.


The GMT offsets shown reflect the offsets at the moment of the conference.


When

Wed 17 May 2023 12:00 - 12:07 at Meeting Room 102 - Mining software repositories Chair(s): Brittany Johnson

Abstract

The accuracy of the SZZ algorithm is pivotal for just-in-time defect prediction because most prior studies have used the SZZ algorithm to detect defect-inducing commits when constructing and evaluating their defect prediction models. The SZZ algorithm detects defect-inducing commits in two phases: (1) linking issue reports in an issue-tracking system to possible defect-fixing commits in a version control system by using an issue-link algorithm (ILA); and (2) tracing the modifications of defect-fixing commits back to possible defect-inducing commits. Researchers and practitioners can address the second phase by using existing solutions such as a tool called cregit. In contrast, although various ILAs have been proposed for the first phase, no large-scale study has evaluated such ILAs under the same experimental conditions. Hence, there is still no conclusion regarding the best-performing ILA for the first phase. In this paper, we compare 10 ILAs collected from our systematic literature study with regard to their accuracy in detecting defect-fixing commits. In addition, we compare the defect prediction performance of the ILAs, and of their combinations, that can detect defect-fixing commits accurately. We conducted experiments on five open-source software projects. We found that all ILAs and their combinations prevented the defect prediction model from being affected by missing defect-fixing commits. In particular, the combination of a natural language text similarity approach, Phantom heuristics, a random forest approach, and a support vector machine approach is the best way to statistically significantly reduce the absolute differences from the ground-truth defect prediction performance. We summarize our recommendations as guidelines for using ILAs.
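To make phase (1) concrete, here is a minimal Python sketch of one simple kind of issue-link heuristic: scan each commit message for issue keys and keep only the commits that mention a known defect report. It is an illustration under stated assumptions, not the implementation of any ILA evaluated in the paper; the `Commit` record, the JIRA-style `PROJ-123` key pattern, and the `link_issues` and `defect_fixing_commits` helpers are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical minimal commit record: hash plus message."""
    sha: str
    message: str


# Assumed JIRA-style issue key such as "PROJ-123"; real projects differ.
ISSUE_KEY = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")


def link_issues(commits, defect_issue_keys):
    """Map each defect issue key to the commits whose message mentions it."""
    links = {key: [] for key in defect_issue_keys}
    for commit in commits:
        # set() so a key mentioned twice in one message is linked only once
        for key in set(ISSUE_KEY.findall(commit.message)):
            if key in links:  # ignore keys of non-defect issues
                links[key].append(commit.sha)
    return links


def defect_fixing_commits(links):
    """Union of all linked commits: the candidates handed to phase (2)."""
    return {sha for shas in links.values() for sha in shas}


if __name__ == "__main__":
    commits = [
        Commit("a1", "PROJ-12: fix NPE in parser"),
        Commit("b2", "Refactor build (PROJ-7)"),  # PROJ-7: e.g. a feature request
        Commit("c3", "Update docs"),
    ]
    links = link_issues(commits, {"PROJ-12", "PROJ-40"})
    print(links)  # key order may vary
    print(defect_fixing_commits(links))
```

In this toy run, the defect report PROJ-40 ends up with no linked commit. A defect whose fixing commit never mentions its key is the kind of missing link that the text-similarity and classifier-based ILAs named in the abstract are meant to address.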

Masanari Kondo

Kyushu University

Japan

Yutaro Kashiwa

Nara Institute of Science and Technology

Japan

Yasutaka Kamei

Kyushu University

Japan

Osamu Mizuno

Kyoto Institute of Technology

Japan


Session Program

Wed 17 May
Displayed time zone: Hobart

11:00 - 12:30 Mining software repositories (Technical Track / Journal-First Papers / DEMO - Demonstrations) at Meeting Room 102. Chair(s): Brittany Johnson (George Mason University)

11:00 (15m talk) The untold story of code refactoring customizations in practice [Technical Track]. Daniel Oliveira (PUC-Rio), Wesley Assunção (Johannes Kepler University Linz, Austria & Pontifical Catholic University of Rio de Janeiro, Brazil), Alessandro Garcia (PUC-Rio), Ana Carla Bibiano (PUC-Rio), Márcio Ribeiro (Federal University of Alagoas, Brazil), Rohit Gheyi (Federal University of Campina Grande), Baldoino Fonseca (Federal University of Alagoas, UFAL). [Pre-print]
11:15 (15m talk) Data Quality for Software Vulnerability Datasets [Technical Track]. Roland Croft (The University of Adelaide), Muhammad Ali Babar (University of Adelaide), M. Mehdi Kholoosi (University of Adelaide). [Pre-print]
11:30 (15m talk) Do code refactorings influence the merge effort? [Technical Track]. André Oliveira (Federal Fluminense University), Vania Neves (Universidade Federal Fluminense, UFF), Alexandre Plastino (Federal Fluminense University), Ana Carla Bibiano (PUC-Rio), Alessandro Garcia (PUC-Rio), Leonardo Murta (Universidade Federal Fluminense, UFF)
11:45 (7m talk) ActionsRemaker: Reproducing GitHub Actions [DEMO - Demonstrations]. Hao-Nan Zhu (University of California, Davis), Kevin Guan (University of California, Davis), Robert M. Furth (University of California, Davis), Cindy Rubio-González (University of California at Davis)
11:52 (7m talk) Problems with SZZ and Features: An empirical assessment of the state of practice of defect prediction data collection [Journal-First Papers]. Steffen Herbold (University of Passau), Alexander Trautsch (University of Passau), Alexander Trautsch (Germany), Benjamin Ledel
12:00 (7m talk) An empirical study of issue-link algorithms: which issue-link algorithms should we use? [Journal-First Papers]. Masanari Kondo (Kyushu University), Yutaro Kashiwa (Nara Institute of Science and Technology), Yasutaka Kamei (Kyushu University), Osamu Mizuno (Kyoto Institute of Technology)
12:07 (7m talk) SCS-Gan: Learning Functionality-Agnostic Stylometric Representations for Source Code Authorship Verification [Journal-First Papers]. Weihan Ou (Queen's University at Kingston), Steven H. H. Ding (Queen’s University at Kingston), Yuan Tian (Queen’s University, Kingston, Canada), Leo Song (Queen’s University at Kingston)
12:15 (15m talk) A Comprehensive Study of Real-World Bugs in Machine Learning Model Optimization [Technical Track]. Hao Guan (The University of Queensland), Ying Xiao (Southern University of Science and Technology), Jiaying Li (Microsoft), Yepang Liu (Southern University of Science and Technology), Guangdong Bai (University of Queensland)