An Empirical Examination of the Impact of Bias on Just-in-time Defect Prediction (ESEM 2021 - Technical Papers)

Who

Jiri Gesi, Jiawei Li, Iftekhar Ahmed

Track

ESEM 2021 Technical Papers

Time Zone

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Thu 14 Oct 2021 15:45 - 16:00 at ESEM ROOM - Defect Prediction. Chair(s): Valentina Lenarduzzi

Abstract

Background: Just-In-Time (JIT) defect prediction models predict whether a commit will introduce defects in the future. DeepJIT and CC2Vec are two state-of-the-art JIT Deep Learning (DL) techniques. Defect prediction techniques are usually evaluated by treating all training data equally. However, data is usually imbalanced not only in terms of the overall class label (e.g., defect and non-defect) but also in terms of characteristics such as File Count, Edit Count, Multiline Comments, Inward Dependency Sum, etc. Prior research has investigated the impact of class imbalance on the performance of prediction techniques, but not the impact of imbalance in other characteristics.

Aims: We aim to explore the impact of imbalance in different commit-related characteristics on DL defect prediction.

Method: We investigated the impact of different characteristics on the overall performance of DeepJIT and CC2Vec. We also propose SifterJIT, a Siamese-network-based few-shot learning framework for JIT defect prediction that combines a Siamese network with DeepJIT.

Results: Our results show that the performance of DeepJIT and CC2Vec drops by around 20% when they are trained and tested on imbalanced data. However, SifterJIT can outperform state-of-the-art DL techniques, with average improvements of 8.65% in AUC score, 11% in precision, and 6% in F1-score.

Conclusions: Our results highlight that dataset imbalance in terms of commit characteristics can significantly impact prediction performance, and that few-shot-learning-based techniques can help alleviate the situation.
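
Illustration: the characteristic imbalance described in the abstract can be made concrete with a small sketch. The snippet below is illustrative only and is not taken from the paper. The function name `bucket_shares`, the bucket edges, and the sample file counts are all assumptions chosen for the example. It shows how one might measure the share of commits that fall into each range of a characteristic such as File Count.

```python
from collections import Counter


def bucket_shares(values, edges):
    """Return the fraction of values that fall in each bucket.

    Bucket 0 holds values v <= edges[0], bucket i holds
    edges[i-1] < v <= edges[i], and the last bucket holds
    everything above the final edge. `edges` must be sorted.
    """
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter()
    for v in values:
        # The bucket index is the number of edges strictly below v.
        idx = sum(v > e for e in edges)
        counts[idx] += 1
    total = len(values)
    return [counts[i] / total for i in range(len(edges) + 1)]


# Hypothetical file counts for ten commits (made-up data, not from the paper).
file_counts = [1, 1, 2, 1, 3, 1, 2, 12, 1, 40]

# Assumed buckets: small (<= 2 files), medium (3-10 files), large (> 10 files).
shares = bucket_shares(file_counts, edges=[2, 10])
print(shares)  # [0.7, 0.1, 0.2]
```

In this made-up sample, 70% of commits touch at most two files, so a model evaluated on the pooled data would be dominated by small commits. This is the kind of skew the abstract refers to.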

Jiri Gesi

University of California, Irvine

United States

Jiawei Li

University of California, Irvine

United States

Iftekhar Ahmed

University of California, Irvine

United States

Session Program

Thu 14 Oct
Displayed time zone: (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

15:30 - 16:00	Defect Prediction (Technical Papers) at ESEM ROOM. Chair(s): Valentina Lenarduzzi (LUT University)

15:30 15m Talk	Continuous Software Bug Prediction (Technical Papers). Song Wang (York University), Junjie Wang (Institute of Software at Chinese Academy of Sciences), Jaechang Nam (Handong Global University), Nachiappan Nagappan (Facebook). Pre-print
15:45 15m Talk	An Empirical Examination of the Impact of Bias on Just-in-time Defect Prediction (Technical Papers). Jiri Gesi (University of California, Irvine), Jiawei Li (University of California, Irvine), Iftekhar Ahmed (University of California, Irvine)

Information for Participants

Thu 14 Oct 2021 15:30 - 16:00 at ESEM ROOM - Defect Prediction. Chair(s): Valentina Lenarduzzi

Info for room ESEM ROOM:

https://www.youtube.com/c/ESEM_Conference