Evaluating the Impact of Flaky Simulators on Testing Autonomous Driving Systems (ICSE 2025 - Journal-first Papers)

Who

Mohammad Hossein Amini, Shervin Naseri, Shiva Nejati

Track

ICSE 2025 Journal-first Papers

When

Fri 2 May 2025 14:45 - 15:00 at 215 - SE for AI with Quality 2 Chair(s): Romina Spalazzese

Abstract

Simulators are widely used to test Autonomous Driving Systems (ADS), but their potential flakiness can lead to inconsistent test results. We investigate test flakiness in simulation-based testing of ADS by addressing two key questions: (1) How do flaky ADS simulations impact automated testing that relies on randomized algorithms? and (2) Can machine learning (ML) effectively identify flaky ADS tests while decreasing the required number of test reruns? Our empirical results, obtained from two widely used open-source ADS simulators and five diverse ADS test setups, show that test flakiness in ADS is a common occurrence and can significantly impact the test results obtained by randomized algorithms. Further, our ML classifiers effectively identify flaky ADS tests using only a single test run, achieving F1-scores of 85%, 82% and 96% for three different ADS test setups. Our classifiers significantly outperform our non-ML baseline, which requires executing tests at least twice, by 31%, 21%, and 13% in F1-score, respectively. We conclude with a discussion of the scope, implications and limitations of our study. We provide our complete replication package in a GitHub repository (GitHub paper 2023).
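To make the rerun-based idea concrete, the sketch below shows one way a flaky test could be flagged by executing it more than once and comparing verdicts, and how an F1-score could then be computed against known labels. This is an illustrative assumption, not the baseline or the classifiers from the paper: the names `is_flaky_by_rerun`, `f1_score`, and `make_simulated_test` are hypothetical, and the simulated test merely stands in for a real simulator run.

```python
import random
from typing import Callable, List


def is_flaky_by_rerun(run_test: Callable[[], bool], reruns: int = 2) -> bool:
    """Flag a test as flaky if repeated executions disagree on pass/fail.

    Hypothetical rerun-based detector: it needs at least two executions.
    """
    if reruns < 2:
        raise ValueError("a rerun-based check needs at least two executions")
    verdicts = {run_test() for _ in range(reruns)}
    return len(verdicts) > 1


def f1_score(predicted: List[bool], actual: List[bool]) -> float:
    """Harmonic mean of precision and recall for the 'flaky' class."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def make_simulated_test(fail_probability: float, rng: random.Random) -> Callable[[], bool]:
    """Stand-in for one simulation run; returns True when the test passes.

    A fail_probability of 0.0 or 1.0 gives a deterministic (non-flaky) test.
    """
    return lambda: rng.random() >= fail_probability


if __name__ == "__main__":
    rng = random.Random(0)
    fail_probabilities = [0.0, 0.5, 1.0, 0.3]
    actual = [0.0 < p < 1.0 for p in fail_probabilities]
    predicted = [is_flaky_by_rerun(make_simulated_test(p, rng)) for p in fail_probabilities]
    print("F1 of the two-run check:", f1_score(predicted, actual))
```

With only two runs, a test that happens to produce the same verdict twice is missed, which illustrates why a rerun-based check trades execution cost against recall.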

Mohammad Hossein Amini

University of Ottawa

Shervin Naseri

University of Ottawa

Canada

Shiva Nejati

University of Ottawa

Canada

Session Program

Fri 2 May
Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30	SE for AI with Quality 2 (Journal-first Papers / Research Track) at 215. Chair(s): Romina Spalazzese (Malmö University)

14:00 15m Talk		Beyond Accuracy: An Empirical Study on Unit Testing in Open-source Deep Learning Projects. SE for AI, Journal-first Papers. Han Wang (Monash University), Sijia Yu (Jilin University), Chunyang Chen (TU Munich), Burak Turhan (University of Oulu), Xiaodong Zhu (Jilin University)
14:15 15m Talk		Boundary State Generation for Testing and Improvement of Autonomous Driving Systems. SE for AI, Journal-first Papers. Matteo Biagiola (Università della Svizzera italiana), Paolo Tonella (USI Lugano)
14:30 15m Talk		D3: Differential Testing of Distributed Deep Learning with Model Generation. SE for AI, Journal-first Papers. Jiannan Wang (Purdue University), Hung Viet Pham (York University), Qi Li, Lin Tan (Purdue University), Yu Guo (Meta Inc.), Adnan Aziz (Meta Inc.), Erik Meijer
14:45 15m Talk		Evaluating the Impact of Flaky Simulators on Testing Autonomous Driving Systems. SE for AI, Journal-first Papers. Mohammad Hossein Amini (University of Ottawa), Shervin Naseri (University of Ottawa), Shiva Nejati (University of Ottawa)
15:00 15m Talk		Reinforcement Learning for Online Testing of Autonomous Driving Systems: a Replication and Extension Study. SE for AI, Journal-first Papers. Luca Giamattei (Università di Napoli Federico II), Matteo Biagiola (Università della Svizzera italiana), Roberto Pietrantuono (Università di Napoli Federico II), Stefano Russo (Università di Napoli Federico II), Paolo Tonella (USI Lugano)
15:15 15m Talk		Two is Better Than One: Digital Siblings to Improve Autonomous Driving Testing. SE for AI, Journal-first Papers. Matteo Biagiola (Università della Svizzera italiana), Andrea Stocco (Technical University of Munich, fortiss), Vincenzo Riccio (University of Udine), Paolo Tonella (USI Lugano)