EASE 2023
Tue 13 - Fri 16 June 2023 Oulu, Finland
Wed 14 Jun 2023 15:30 - 15:50 at Aurora Hall - Methodology and Secondary Studies Chair(s): Thomas Fehlmann

Binary classifiers are commonly used in software engineering research to estimate various software qualities, e.g., defectiveness or vulnerability. Thus, it is important to adequately evaluate how well binary classifiers perform before they are used in practice. The Area Under the Curve (AUC) of Receiver Operating Characteristic (ROC) curves has often been used to this end. However, AUC has been widely criticized, so it is necessary to evaluate under which conditions and to what extent AUC can be a reliable performance metric.

We analyze AUC in relation to φ (also known as the Matthews Correlation Coefficient), often considered a more reliable performance metric, by deriving the lines in the ROC space along which φ is constant, for several values of φ, and computing the corresponding values of AUC.

By their very definitions, AUC and φ depend on the prevalence ρ of a dataset, i.e., the proportion of its positive instances (e.g., the defective software modules). Hence, so does the relationship between AUC and φ. It turns out that AUC and φ are very well correlated, and therefore provide concordant indications, for balanced datasets (those with ρ around 0.5). By contrast, AUC tends to become quite large, and hence provide over-optimistic indications, for very imbalanced datasets (those with ρ close to 0 or 1).
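The dependence of φ on prevalence can be made concrete with a small sketch (not from the paper; the formula follows directly from the confusion-matrix rates). Writing the confusion matrix in terms of the true positive rate (TPR), false positive rate (FPR), and prevalence ρ gives φ = √(ρ(1−ρ)) · (TPR − FPR) / √(PP(1−PP)), where PP = ρ·TPR + (1−ρ)·FPR is the fraction of instances predicted positive. The same ROC point, and hence the same contribution to AUC, then yields very different φ values as ρ shrinks:

```python
import math

def phi_from_roc(tpr, fpr, rho):
    """phi (Matthews Correlation Coefficient) at a single ROC point,
    expressed via TPR, FPR, and prevalence rho."""
    pp = rho * tpr + (1 - rho) * fpr  # fraction of instances predicted positive
    denom = math.sqrt(pp * (1 - pp))
    if denom == 0.0:
        return 0.0
    return math.sqrt(rho * (1 - rho)) * (tpr - fpr) / denom

# One fixed operating point (TPR=0.7, FPR=0.1), evaluated at
# decreasing prevalence: phi drops sharply while the ROC point
# (and thus AUC) is unchanged.
for rho in (0.5, 0.1, 0.01):
    print(f"rho={rho}: phi={phi_from_roc(0.7, 0.1, rho):.3f}")
```

At ρ = 0.5 this point gives φ ≈ 0.61, but at ρ = 0.01 the same point gives only φ ≈ 0.19, illustrating why AUC alone can look over-optimistic on strongly imbalanced data.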

We use examples from the software engineering literature to illustrate the analytical relationship linking AUC, φ and ρ. We show that, for some values of ρ, the evaluation of performance based exclusively on AUC can be deceiving. In conclusion, this paper provides some guidelines for an informed usage and interpretation of AUC.

Presentation slides: 4023-PresentationEASE2023.pptx (728 KiB)

Wed 14 Jun

Displayed time zone: Athens

15:30 - 17:00
Methodology and Secondary Studies (EASIER / Research (Full Papers) / Short Papers and Posters / Journal First) at Aurora Hall
Chair(s): Thomas Fehlmann Euro Project Office
15:30
20m
Paper
On the Reliability of the Area Under the ROC Curve in Empirical Software Engineering
Research (Full Papers)
Luigi Lavazza Università degli Studi dell'Insubria, Sandro Morasca Università degli Studi dell'Insubria, Gabriele Rotoloni
Pre-print File Attached
15:50
10m
Short-paper
Improving the Reporting of Threats to Construct Validity (Short Paper)
Short Papers and Posters
Dag Sjøberg University of Oslo, Gunnar Rye Bergersen University of Oslo
DOI Pre-print File Attached
16:00
20m
Paper
A Systematic Literature Review on Client Selection in Federated Learning
Research (Full Papers)
Carl Smestad, Jingyue Li Norwegian University of Science and Technology
DOI Authorizer link Pre-print Media Attached File Attached
16:20
10m
Paper
A Means to what End? Evaluating the Explainability of Software Systems using Goal-Oriented Heuristics
EASIER
Hannah Deters Leibniz University Hannover, Jakob Droste Leibniz Universität Hannover, Kurt Schneider Leibniz Universität Hannover, Software Engineering Group
DOI File Attached
16:30
10m
Paper
Applications of natural language processing in software traceability: A systematic mapping study
Journal First
Zaki Pauzi University of Groningen, BP plc, Andrea Capiluppi University of Groningen
Link to publication DOI File Attached
16:40
10m
Paper
Burnout in software engineering: A systematic mapping study
Journal First
Tien Rahayu Tulili University of Groningen, Andrea Capiluppi University of Groningen, Ayushi Rastogi University of Groningen, The Netherlands
Link to publication DOI File Attached
16:50
10m
Paper
A rapid review of Responsible AI frameworks: How to guide the development of ethical AI
EASIER
Vita Santa Barletta University of Bari, Danilo Caivano University of Bari, Domenico Gigante SER&Practices and University of Bari
Pre-print File Attached