An Experience in the Evaluation of Fault Prediction (PROFES 2023 - Research Papers)

Who

Luigi Lavazza, Sandro Morasca, Gabriele Rotoloni

Track

PROFES 2023 Research Papers

Time Zone

Times are shown in the conference time zone: (GMT+01:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

When

Wed 13 Dec 2023 11:00 - 11:10 at W211 - Software Testing and Quality Assurance

Chair(s): Dietmar Pfahl

Abstract

Background: ROC (Receiver Operating Characteristic) curves are widely used to represent the performance (i.e., degree of correctness) of fault proneness models. AUC, the Area Under the ROC Curve, is a popular performance metric that summarizes in a single number the goodness of the predictions represented by the ROC curve. Alternative techniques have been proposed for evaluating the performance represented by a ROC curve, among them RRA (Ratio of Relevant Areas) and φ (also known as the Matthews Correlation Coefficient).
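
To make the two measures concrete, here is a minimal sketch, not taken from the paper, of how AUC and φ could be computed for a fault proneness model. The function names, the toy labels and scores, and the 0.5 threshold are all illustrative assumptions. AUC is computed through its rank interpretation (the probability that a randomly chosen faulty module scores higher than a randomly chosen non-faulty one), while φ needs a threshold because it is defined on a single confusion matrix.

```python
from math import sqrt

def auc(labels, scores):
    """AUC via its rank interpretation: the fraction of (faulty, non-faulty)
    pairs in which the faulty module has the higher score (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def phi(labels, scores, threshold):
    """phi (Matthews Correlation Coefficient) of the classifier obtained by
    predicting 'faulty' whenever score >= threshold."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: phi is 0 when a row or column of the matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical toy data: 1 = faulty module, 0 = non-faulty module.
labels = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.2, 0.1]

auc_value = auc(labels, scores)
phi_value = phi(labels, scores, threshold=0.5)
print(f"AUC = {auc_value:.3f}, phi at threshold 0.5 = {phi_value:.3f}")
```

Note that AUC summarizes the whole curve, whereas φ describes one point on it, so the two numbers are not directly interchangeable.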

Objectives: In this paper, we aim to evaluate AUC as a performance metric, also in comparison with alternative proposals.

Method: We carry out an empirical study by replicating a previously published fault prediction study and measuring the performance of the obtained faultiness models using AUC, RRA, and a recently proposed way of relating a specific kind of ROC curve to φ, based on iso-φ ROC curves, i.e., ROC curves with constant φ. We take into account prevalence, i.e., the proportion of faulty modules in the dataset that is the object of predictions.
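
The role of prevalence can be illustrated with a short sketch. It relies only on the standard confusion-matrix definition of φ, rewritten in terms of a ROC point (FPR, TPR) and prevalence; it is not the authors' implementation, and the function name and the example values are hypothetical. Under this formulation, an iso-φ curve for a given prevalence would be the set of ROC points for which the function returns the same value.

```python
from math import sqrt

def phi_from_rates(tpr, fpr, prevalence):
    """phi for the ROC point (fpr, tpr) on a dataset with the given prevalence.

    Follows from the confusion-matrix definition of phi after dividing all
    cells by the dataset size:
        phi = sqrt(rho * (1 - rho)) * (TPR - FPR) / sqrt(EP * (1 - EP))
    where rho is prevalence and EP is the fraction of modules estimated faulty.
    """
    ep = prevalence * tpr + (1 - prevalence) * fpr
    if ep <= 0.0 or ep >= 1.0:
        # Assumed convention: phi is 0 when every module gets the same prediction.
        return 0.0
    return sqrt(prevalence * (1 - prevalence)) * (tpr - fpr) / sqrt(ep * (1 - ep))

# The same ROC point evaluated under two hypothetical prevalence values.
balanced = phi_from_rates(tpr=0.8, fpr=0.2, prevalence=0.5)
imbalanced = phi_from_rates(tpr=0.8, fpr=0.2, prevalence=0.05)
print(f"phi at prevalence 0.50: {balanced:.3f}")
print(f"phi at prevalence 0.05: {imbalanced:.3f}")
```

Because a ROC curve is drawn from TPR and FPR alone, the same point on it can correspond to quite different φ values depending on prevalence, which is why the comparison between AUC and φ has to take prevalence into account.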

Results: AUC appears to provide indications that are concordant with φ for fairly balanced datasets, while it is much more optimistic than φ for quite imbalanced datasets. RRA’s indications appear to be moderately affected by the degree of balance in a dataset. In addition, RRA appears to agree with φ.

Conclusions: Based on the collected evidence, AUC does not seem to be suitable for evaluating the performance of fault proneness models when used with imbalanced datasets. In these cases, using RRA can be a better choice.

Luigi Lavazza

Università degli Studi dell'Insubria

Italy

Sandro Morasca