An Empirical Study of Predicting Fault-prone Components and their Evolution (APSEC 2022 - Technical Track)

Who

Aparna Pisolkar, Md Tajmilur Rahman

Track

APSEC 2022 Technical Track

When

Thu 8 Dec 2022 15:40 - 16:00 at Room2 - Empirical Studies 2 Chair(s): Yusuf Sulistyo Nugroho

Abstract

Predicting fault-prone components at an early stage is useful for an organization to ensure quality software delivery. Prioritizing tests becomes easier with the prediction of fault-proneness of the components since developers can allocate more time and resources to the “High” fault-prone components. Furthermore, test prioritization helps reduce the cost of regression and allows developers to take careful decisions regarding the sensitive components. This paper performs an empirical study of predicting fault-prone components and their evolution on two popular open-source projects, Chromium web browser and ProFTPD. First, we construct and compare two prediction models: Random Forest (RF) and Support Vector Machine (SVM) for classifying components as “High” or “Low” fault-prone. Second, we analyze the evolution of the fault-proneness of 22 components of Chromium in 42 releases and 12 components of ProFTPD in 15 releases. Chromium has a median of 3.9k commits per release with a standard deviation of 1k while ProFTPD has 578 commits per release with a standard deviation of 654. We consider the total churns, bug-fix commits, bug-fix churns, rush period changes, number of developers, and number of files modified as the measures to construct our prediction models. Our models are able to successfully predict the fault-proneness of the components. The Random Forest outperforms the SVM with an accuracy of 96% and 95% for Chromium and ProFTPD respectively. We found that the majority of the Chromium components are high fault-prone. For ProFTPD, components are found to be high and low fault-prone interchangeably. The fault-proneness of the components has evolved over the releases where “Chromecast” in Chromium seems to be gradually turning into a low fault-prone component.
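The abstract names six measures per component and describes tracking each component's “High”/“Low” label across releases, but it does not say how the data is encoded or how the evolution is summarized. The following is a minimal sketch under stated assumptions, not the authors' method: the `ComponentRelease` record, the `high_share` and `trend` helpers, the half-versus-half comparison, and every number in it are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import median, stdev


@dataclass(frozen=True)
class ComponentRelease:
    """One component in one release, described by the six measures from the abstract."""
    component: str
    release: int
    total_churn: int
    bugfix_commits: int
    bugfix_churn: int
    rush_period_changes: int
    developers: int
    files_modified: int

    def features(self):
        """Feature vector in a fixed order, ready to pass to a classifier."""
        return [self.total_churn, self.bugfix_commits, self.bugfix_churn,
                self.rush_period_changes, self.developers, self.files_modified]


def high_share(labels):
    """Fraction of releases in which a component was labeled "High"."""
    return sum(label == "High" for label in labels) / len(labels)


def trend(labels):
    """Compare the "High" share in the earlier and later halves of the releases.

    This half-versus-half rule is an assumption made for this sketch.
    """
    if len(labels) < 2:
        raise ValueError("need labels from at least two releases")
    half = len(labels) // 2
    change = high_share(labels[half:]) - high_share(labels[:half])
    if change < 0:
        return "toward low"
    if change > 0:
        return "toward high"
    return "stable"


# Invented values for illustration only; not data from the study.
row = ComponentRelease("ExampleComponent", 1, 1200, 40, 300, 5, 12, 85)
labels_by_release = ["High", "High", "High", "Low", "High", "Low", "Low", "Low"]
commits_per_release = [3100, 3900, 4200, 3600, 5000]

print(row.features())
print(trend(labels_by_release))
print(median(commits_per_release), round(stdev(commits_per_release), 1))
```

In a fuller pipeline, the lists returned by `features()` would form the input matrix for a Random Forest or SVM classifier, and the predicted labels per release would feed a summary such as `trend`.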

Aparna Pisolkar

Gannon University

United States

Md Tajmilur Rahman