Empirically Evaluating Flaky Tests for Autonomous Driving Systems in Simulated Environments (FTW 2025 - 2nd International Flaky Tests Workshop 2025 (FTW 2025))

Who

Olek Osikowicz, Phil McMinn, Donghwan Shin

Track

FTW 2025 Flaky Tests Workshop

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Sun 27 Apr 2025 11:45 - 12:07 at 206 - Flakiness in Specific/Previously Neglected Domains

Abstract

In Autonomous Driving Systems (ADS) testing, a test scenario is a pre-defined, specific sequence of events, including static entities (e.g., road shapes and traffic signs) and dynamic entities (e.g., traffic lights and the trajectories of surrounding vehicles). By creating an environment according to a test scenario and running the ADS under test in that environment, we can verify whether the ADS causes any safety violations (e.g., collisions with other vehicles) or not. Due to the high cost and risks associated with setting up test scenarios in the real world, simulation-based testing, which relies on driving simulators that can create various virtual driving environments, has gained significant attention. Since simulated environments can be more deterministic than the real world, simulation-based testing can provide non-flaky tests, i.e., the same test outcome for the same test scenario (and the same ADS), in theory. However, do we really have no flaky tests in simulation-based ADS testing?

This paper empirically investigates flaky tests in simulation-based ADS testing using two widely used, open-source driving simulators: CARLA and MetaDrive. Our results show that, surprisingly, 31.3% of benchmark test scenarios are potentially flaky due to nondeterministic simulations in CARLA, whereas MetaDrive does not yield any flaky tests. We further discuss potential causes of nondeterministic simulations, implications of flaky tests in ADS testing, and practical strategies for mitigating flaky tests in ADS testing.
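
As an illustration of the notion of flakiness used in the abstract (the same scenario and the same ADS giving different outcomes across runs), the following is a minimal sketch of how repeated executions could be compared. It is not the authors' methodology: `run_scenario`, the scenario identifiers, and the default of 10 repetitions are all hypothetical placeholders, and a real harness would call into a simulator such as CARLA or MetaDrive instead of a plain Python callable.

```python
from collections.abc import Callable, Iterable


def is_flaky(
    run_scenario: Callable[[str], bool],
    scenario_id: str,
    repetitions: int = 10,
) -> bool:
    """Return True if repeated runs of one scenario disagree on pass/fail.

    `run_scenario` is a hypothetical hook that executes the scenario once
    and returns True when no safety violation was observed.
    """
    outcomes = {run_scenario(scenario_id) for _ in range(repetitions)}
    return len(outcomes) > 1


def flaky_rate(
    run_scenario: Callable[[str], bool],
    scenario_ids: Iterable[str],
    repetitions: int = 10,
) -> float:
    """Fraction of scenarios whose repeated runs gave differing outcomes."""
    ids = list(scenario_ids)
    if not ids:
        return 0.0
    flaky = sum(is_flaky(run_scenario, s, repetitions) for s in ids)
    return flaky / len(ids)
```

A scenario that never disagrees in a finite number of repetitions is not proven deterministic, which is consistent with the abstract describing scenarios as "potentially flaky".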

Link to Preprint

https://philmcminn.com/publications/osikowicz2025.pdf

Olek Osikowicz

University of Sheffield, UK

United Kingdom

Phil McMinn

University of Sheffield

United Kingdom

Donghwan Shin

The University of Sheffield

United Kingdom

Session Program

Sun 27 Apr
Displayed time zone: Eastern Time (US & Canada)

11:00 - 12:30	Flakiness in Specific/Previously Neglected Domains (FTW) at 206

11:00 22m Paper		A Preliminary Study of Fixed Flaky Tests in Rust Projects on GitHub (FTW). Tom Schroeder (University of Illinois Urbana-Champaign), Minh Phan (University of Illinois Urbana-Champaign), Yang Chen (University of Illinois at Urbana-Champaign)
11:22 22m Talk		Beyond Test Flakiness: A Manifesto for a Holistic Approach to Test Suite Health (FTW). Phil McMinn (University of Sheffield), Muhammad Firhard Roslan (University of Bristol), Gregory Kapfhammer (Allegheny College)
11:45 22m Paper		Empirically Evaluating Flaky Tests for Autonomous Driving Systems in Simulated Environments (FTW). Olek Osikowicz (University of Sheffield, UK), Phil McMinn (University of Sheffield), Donghwan Shin (The University of Sheffield). Pre-print available.
12:07 22m Panel		Mini Panel 1 (FTW)