Predicting Defective Lines Using a Model-Agnostic Technique (ICSE 2021 - Journal-First Papers)

Who

Supatsara Wattanakriengkrai, Patanamon Thongtanunam, Kla Tantithamthavorn, Hideaki Hata, Kenichi Matsumoto

Track

ICSE 2021 Journal-First Papers

Time Zone

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Thu 27 May 2021 10:40 - 11:00 at Blended Sessions Room 3 - 3.1.3. Defect Prediction: Automation #2 Chair(s): Robert Feldt
Thu 27 May 2021 22:40 - 23:00 at Blended Sessions Room 3 - 3.1.3. Defect Prediction: Automation #2

Abstract

Defect prediction models are proposed to help a team prioritize the source code files that need Software Quality Assurance (SQA) based on the likelihood of having defects. However, developers may waste effort inspecting a whole file when only a small fraction of its source code lines are defective. Indeed, we find that as little as 1%-3% of the lines of a file are defective. Hence, in this work, we propose a novel framework (called LINE-DP) to identify defective lines using a model-agnostic technique, i.e., an Explainable AI technique that provides information about why the model makes such a prediction. Broadly speaking, LINE-DP first builds a file-level defect model using code token features. Then, LINE-DP uses a state-of-the-art model-agnostic technique (i.e., LIME) to identify risky tokens, i.e., code tokens that lead the file-level defect model to predict that the file will be defective. Finally, the lines that contain risky tokens are predicted as defective lines. Through a case study of 32 releases of nine Java open source systems, our evaluation results show that LINE-DP achieves an average recall of 0.61, a false alarm rate of 0.47, a top 20%LOC recall of 0.27, and an initial false alarm of 16, which are statistically better than six baseline approaches. Our evaluation shows that LINE-DP requires an average computation time of 10 seconds, including model construction and defective line identification time. In addition, we find that 63% of the defective lines that can be identified by LINE-DP are related to common defects (e.g., argument change, condition change). These results suggest that LINE-DP can effectively identify defective lines that contain common defects while requiring a smaller amount of inspection effort and a manageable computation cost. This paper takes an important step towards line-level defect prediction by leveraging a model-agnostic technique.
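The three steps described in the abstract (file-level model on code tokens, risky-token identification, line flagging) can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on several assumptions: a tiny naive Bayes classifier stands in for the file-level defect model, a simple token-removal (occlusion) perturbation stands in for LIME, and the names `TokenNaiveBayes`, `risky_tokens`, `predict_defective_lines`, the `top_k` cutoff, the 0.5 threshold, and the toy Java snippets are all hypothetical.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[A-Za-z_]\w*")  # identifiers and keywords only (assumption)

def tokenize(line):
    return TOKEN_RE.findall(line)

def file_tokens(lines):
    """Bag-of-tokens features for one file."""
    return Counter(tok for line in lines for tok in tokenize(line))

class TokenNaiveBayes:
    """Stand-in file-level defect model over code-token counts."""

    def fit(self, files, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.n_files = Counter(labels)
        for lines, label in zip(files, labels):
            self.counts[label].update(file_tokens(lines))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        self.totals = {y: sum(self.counts[y].values()) for y in (0, 1)}
        return self

    def predict_proba(self, token_counts):
        """Probability that a file with these token counts is defective."""
        log_odds = math.log(self.n_files[1] / self.n_files[0])
        v = len(self.vocab)
        for tok, n in token_counts.items():
            if tok not in self.vocab:
                continue  # unseen tokens carry no evidence
            p_defective = (self.counts[1][tok] + 1) / (self.totals[1] + v)
            p_clean = (self.counts[0][tok] + 1) / (self.totals[0] + v)
            log_odds += n * math.log(p_defective / p_clean)
        return 1.0 / (1.0 + math.exp(-log_odds))

def risky_tokens(model, lines, top_k=3):
    """Occlusion stand-in for LIME: a token is risky if removing it
    lowers the predicted defect probability of the file."""
    counts = file_tokens(lines)
    base = model.predict_proba(counts)
    scores = {}
    for tok in counts:
        reduced = counts.copy()
        del reduced[tok]
        scores[tok] = base - model.predict_proba(reduced)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [tok for tok in ranked[:top_k] if scores[tok] > 0]

def predict_defective_lines(model, lines, top_k=3, threshold=0.5):
    """1-based numbers of lines that contain at least one risky token.
    Files not predicted defective get no flagged lines (assumption)."""
    if model.predict_proba(file_tokens(lines)) < threshold:
        return []
    risky = set(risky_tokens(model, lines, top_k))
    return [i for i, line in enumerate(lines, start=1)
            if risky & set(tokenize(line))]

# Toy training data (hypothetical): two defective files, two clean files.
train_files = [
    ["int n = items.get(idx + 1);", "return n;"],
    ["String s = items.get(idx);", "return s;"],
    ["int n = 0;", "return n;"],
    ["String s = name;", "return s;"],
]
train_labels = [1, 1, 0, 0]
model = TokenNaiveBayes().fit(train_files, train_labels)

new_file = ["int n = 0;", "String s = items.get(idx);", "return s;"]
print(predict_defective_lines(model, new_file))
```

In this toy setup, only tokens that appear exclusively in the defective training files push the prediction towards "defective", so only the line containing them is flagged. A real pipeline would replace the occlusion step with LIME and the toy classifier with a properly trained file-level model.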

Link to Preprint

https://arxiv.org/abs/2009.03612

DOI

https://doi.org/10.1109/TSE.2020.3023177

Supatsara Wattanakriengkrai

Nara Institute of Science and Technology

Patanamon Thongtanunam

University of Melbourne

Australia

Kla Tantithamthavorn

Monash University

Australia

Hideaki Hata

Shinshu University

Japan

Kenichi Matsumoto

Nara Institute of Science and Technology
