Where Tests Fall Short: Empirically Analyzing Oracle Gaps in Covered Code (ESEIW 2025 - ESEM - Technical Track)

Sun 28 September - Fri 3 October 2025

Who

Megan Maton, Gregory Kapfhammer, Phil McMinn

Track

ESEIW 2025 ESEM - Technical Track

Abstract

Background: Code coverage alone can miss critical blind spots where the test suite does not adequately check the outputs of executed code, enabling faults to persist undetected. Current techniques, such as checked coverage and pseudo-tested statement identification, address this by identifying oracle gaps (program statements that are executed by tests but do not influence the outcome of a test assertion), thus guiding developers to priority testing areas. These oracle gaps reveal fault detection weaknesses in a program and are essential to address.

Aims: This knowledge-seeking study aims to compare the oracle gaps revealed by dynamic slicing, observational slicing, and pseudo-tested statement-based techniques to understand their practical use cases.

Method: Using a mixed-method empirical analysis, we conduct an in-depth evaluation of the oracle gaps calculated by each technique in 30 Java classes across six open-source projects. We quantitatively assess the gaps for their prominence, statement distribution, correlation with fault detection, and execution times. To identify code patterns in the oracle gaps, we perform a manual inspection with a negotiated agreement.

Results: The oracle gaps ranged from 5% to 39% of covered statements, with low Jaccard similarity scores (0.06 to 0.21) indicating a high variance in gap content. Observational slicing and pseudo-tested statements highlighted string manipulation and data loading as being in the oracle gap, whilst dynamic slicing highlighted more checks and iterator statements. Pseudo-tested statements had the lowest mutation scores and the lowest execution times, identifying areas of weaker fault detection efficiently. Observational slicing often took multiple hours to execute on a single class.

Conclusions: Identifying pseudo-tested statements is the most targeted technique, pinpointing fault detection weaknesses with the smallest gap. Observational slicing's considerable execution times make it impractical for use beyond research.
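To make two of the abstract's terms concrete, here is a minimal illustrative sketch; it is not the authors' tooling, and Python is used only for brevity even though the study analyzes Java classes. It assumes the common reading of a pseudo-tested statement as one that can be deleted without any test failing, and it uses the standard Jaccard similarity |A ∩ B| / |A ∪ B| to compare gaps. All function names, the line-number sets, and the covered-statement count are hypothetical and do not come from the study.

```python
from itertools import combinations


def normalize(name: str) -> str:
    name = name.strip()  # executed by the test below, but never checked by it
    return name.title()


def normalize_without_strip(name: str) -> str:
    # The same function with the strip() statement deleted.
    return name.title()


def suite_passes(fn) -> bool:
    # The only test input has no surrounding whitespace, so the strip()
    # statement cannot influence the assertion outcome.
    return fn("ada lovelace") == "Ada Lovelace"


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity; defined here as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_jaccard(gaps: dict) -> dict:
    """Jaccard similarity for every pair of techniques, keyed by name pair."""
    return {
        (x, y): jaccard(gaps[x], gaps[y])
        for x, y in combinations(sorted(gaps), 2)
    }


def gap_percentage(gap: set, covered_statements: int) -> float:
    """Size of an oracle gap as a percentage of covered statements."""
    return 100 * len(gap) / covered_statements


# Hypothetical oracle gaps for one class: line numbers of covered statements
# that each technique reports as not influencing any assertion.
GAPS = {
    "dynamic_slicing": {12, 15, 18, 21},
    "observational_slicing": {15, 30, 31},
    "pseudo_tested": {30},
}
COVERED_STATEMENTS = 20  # hypothetical count of covered statements

if __name__ == "__main__":
    # Deleting strip() leaves the suite green, so that statement is in the gap.
    print("suite passes without strip():", suite_passes(normalize_without_strip))
    for pair, score in pairwise_jaccard(GAPS).items():
        print(pair, round(score, 2))
    for technique, gap in GAPS.items():
        print(technique, gap_percentage(gap, COVERED_STATEMENTS), "% of covered")
```

In this sketch the suite passes with and without the `strip()` call, which is what places that statement in the oracle gap; a test input with surrounding whitespace would close it. Low pairwise Jaccard scores, as in the toy sets above, mean the techniques flag largely different statements.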

Megan Maton, University of Sheffield, United Kingdom

Gregory Kapfhammer, Allegheny College, United States

Phil McMinn, University of Sheffield, United Kingdom
