Empirically Evaluating Flaky Tests for Autonomous Driving Systems in Simulated Environments
This program is tentative and subject to change.
In Autonomous Driving Systems (ADS) testing, a test scenario is a pre-defined, specific sequence of events, including static entities (e.g., road shapes and traffic signs) and dynamic entities (e.g., traffic lights and the trajectories of surrounding vehicles). By creating an environment according to a test scenario and running the ADS under test in that environment, we can verify whether the ADS causes any safety violations (e.g., collisions with other vehicles). Due to the high cost and risks associated with setting up test scenarios in the real world, simulation-based testing, which relies on driving simulators that can create various virtual driving environments, has gained significant attention. Since simulated environments can be more deterministic than the real world, simulation-based testing should, in theory, provide non-flaky tests, i.e., the same test outcome for the same test scenario (and the same ADS). However, do we really have no flaky tests in simulation-based ADS testing?
This paper empirically investigates flaky tests in simulation-based ADS testing using two widely used, open-source driving simulators: CARLA and MetaDrive. Our results show that, surprisingly, 31.3% of benchmark test scenarios in CARLA are potentially flaky due to nondeterministic simulations, whereas MetaDrive does not yield any flaky tests. We further discuss potential causes of nondeterministic simulations, the implications of flaky tests for ADS testing, and practical strategies for mitigating them.
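To make the notion of a potentially flaky scenario concrete, the following is a minimal sketch, not the authors' actual procedure: it reruns each scenario several times and flags it when the outcomes disagree. The names `run_scenario`, `outcome_counts`, `is_potentially_flaky`, and `flaky_rate` are hypothetical, and `run_scenario` stands in for whatever harness launches the simulator and reports an outcome; no CARLA or MetaDrive API is used here.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# Hypothetical type: a function that runs one scenario once and returns
# a comparable outcome (for example "pass" or "collision").
Runner = Callable[[str], Hashable]


def outcome_counts(run_scenario: Runner, scenario: str, repeats: int = 10) -> Counter:
    """Run the same scenario `repeats` times and tally the observed outcomes."""
    return Counter(run_scenario(scenario) for _ in range(repeats))


def is_potentially_flaky(counts: Counter) -> bool:
    """A scenario is flagged when identical runs produced more than one outcome."""
    return len(counts) > 1


def flaky_rate(run_scenario: Runner, scenarios: Sequence[str], repeats: int = 10) -> float:
    """Fraction of scenarios whose repeated runs disagreed (0.0 for an empty list)."""
    if not scenarios:
        return 0.0
    flagged = sum(
        is_potentially_flaky(outcome_counts(run_scenario, s, repeats))
        for s in scenarios
    )
    return flagged / len(scenarios)
```

A scenario whose repeated runs always agree is not proven deterministic by this check; it is only not observed to be flaky within the chosen number of repeats, which is why the abstract's wording "potentially flaky" is mirrored in the function name.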
Sun 27 Apr. Displayed time zone: Eastern Time (US & Canada)
11:00 - 12:30
11:00 | 22m | Paper | A Preliminary Study of Fixed Flaky Tests in Rust Projects on GitHub | FTW | Tom Schroeder (University of Illinois Urbana-Champaign), Minh Phan (University of Illinois Urbana-Champaign), Yang Chen (University of Illinois at Urbana-Champaign)
11:22 | 22m | Talk | Beyond Test Flakiness: A Manifesto for a Holistic Approach to Test Suite Health | FTW | Phil McMinn (University of Sheffield), Muhammad Firhard Roslan (University of Sheffield), Gregory Kapfhammer (Allegheny College)
11:45 | 22m | Paper | Empirically Evaluating Flaky Tests for Autonomous Driving Systems in Simulated Environments | FTW | Olek Osikowicz (University of Sheffield, UK), Phil McMinn (University of Sheffield), Donghwan Shin (University of Sheffield)
12:07 | 22m | Panel | Mini Panel 1 | FTW