A Unified Benchmark for Out-of-Distribution Detection for Autonomous Driving Systems
Autonomous Driving Systems (ADS) can fail when they encounter inputs that differ from their training data, known as out-of-distribution (OOD) conditions. Such OOD inputs can lead an ADS to make incorrect driving decisions, resulting in serious safety risks. Reliable OOD detection is therefore essential for enhancing system robustness and preventing hazardous behavior. However, existing literature in the autonomous driving field examines a relatively narrow range of OOD detectors (e.g., reconstruction-based only) under limited OOD conditions.
To address these limitations, we present a unified benchmark that evaluates nine widely used OOD detectors on both simulated and real-world driving images. Specifically, we assess detectors from four families: reconstruction-based, density-based, distance-based, and Generative Adversarial Network (GAN)-based. To examine generalization ability, we propose an automated evaluation pipeline with statistical analysis. The pipeline automatically generates OOD samples (three weather transformations and four adversarial attacks) and assesses each detector on both effectiveness (AUC-ROC, AUC-PR, F1 scores at five threshold values) and efficiency (inference time).
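The effectiveness metrics named above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names are hypothetical, it assumes each detector emits a scalar score where a higher value means "more likely OOD", and the threshold passed to the F1 helper is an arbitrary example, since the abstract does not state which five thresholds are used.

```python
import numpy as np

def auc_roc(scores_id, scores_ood):
    """AUC-ROC as the probability that a random OOD sample scores
    higher than a random in-distribution (ID) sample; ties count 0.5."""
    s_id = np.asarray(scores_id, dtype=float)
    s_ood = np.asarray(scores_ood, dtype=float)
    diff = s_ood[:, None] - s_id[None, :]  # all OOD-vs-ID pairs
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def auc_pr(scores_id, scores_ood):
    """AUC-PR approximated by average precision, with OOD as the
    positive class. Tied scores are ranked in input order (simplification)."""
    s_id = np.asarray(scores_id, dtype=float)
    s_ood = np.asarray(scores_ood, dtype=float)
    scores = np.concatenate([s_id, s_ood])
    labels = np.concatenate([np.zeros(len(s_id)), np.ones(len(s_ood))])
    order = np.argsort(-scores, kind="stable")  # highest score first
    labels = labels[order]
    precision = np.cumsum(labels) / np.arange(1, len(labels) + 1)
    return float((precision * labels).sum() / labels.sum())

def f1_at_threshold(scores_id, scores_ood, threshold):
    """F1 for the decision rule: flag as OOD when score > threshold."""
    s_id = np.asarray(scores_id, dtype=float)
    s_ood = np.asarray(scores_ood, dtype=float)
    tp = int((s_ood > threshold).sum())   # OOD correctly flagged
    fp = int((s_id > threshold).sum())    # ID wrongly flagged
    fn = int((s_ood <= threshold).sum())  # OOD missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy scores, invented purely for illustration.
toy_id = [0.1, 0.2, 0.3, 0.4]
toy_ood = [0.35, 0.5, 0.6, 0.7]
toy_auc = auc_roc(toy_id, toy_ood)
toy_ap = auc_pr(toy_id, toy_ood)
toy_f1 = f1_at_threshold(toy_id, toy_ood, threshold=0.3)
```

AUC-ROC and AUC-PR are threshold-free, whereas F1 depends on the operating point, which is presumably why the benchmark reports F1 at several thresholds rather than one.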
Our experiments reveal that distance-based detectors consistently deliver the highest overall performance and robustness across datasets and all OOD scenarios, followed closely by density-based detectors. In contrast, reconstruction-based and GAN-based methods exhibit greater performance fluctuations and reduced reliability under adversarial and complex perturbations. These findings suggest that OOD detection in autonomous driving benefits most from methods that leverage feature-space distance or probabilistic density estimation, as they capture intrinsic data distribution properties more effectively.
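To make "feature-space distance" concrete, the sketch below scores inputs by their Mahalanobis distance to the training feature distribution. This is one common distance-based approach chosen for illustration; the abstract does not say which distance-based detectors the benchmark includes. The function names are hypothetical, and the features are assumed to be fixed-length vectors (at least two dimensions) already extracted by some upstream encoder.

```python
import numpy as np

def fit_gaussian(train_features, eps=1e-6):
    """Estimate the mean and inverse covariance of ID training features.
    eps adds a small ridge so the covariance stays invertible."""
    x = np.asarray(train_features, dtype=float)
    mean = x.mean(axis=0)
    cov = np.cov(x, rowvar=False) + eps * np.eye(x.shape[1])
    return mean, np.linalg.inv(cov)

def mahalanobis_score(features, mean, precision):
    """OOD score per sample: Mahalanobis distance to the training mean.
    Larger values indicate inputs farther from the training distribution."""
    d = np.atleast_2d(np.asarray(features, dtype=float)) - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", d, precision, d))

# Synthetic features standing in for encoder outputs (illustration only).
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 4))
mean, precision = fit_gaussian(train)
near_score = mahalanobis_score(np.zeros((1, 4)), mean, precision)[0]
far_score = mahalanobis_score(np.full((1, 4), 10.0), mean, precision)[0]
```

A sample near the training mean receives a small score and a heavily shifted one a large score; thresholding that score yields the OOD decision.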
Overall, our work guides the selection of OOD detectors for real-world ADS deployment and emphasizes the importance of robustness, generalization, and inference efficiency. Our implementation is available at this link.
Tue 14 Apr (displayed time zone: Brasilia, Distrito Federal, Brazil)
11:00 - 12:30 | Session 6: Testing Around the World | AST 2026 at Oceania VI | Chair(s): Hokeun Kim (Arizona State University)
11:00 | 30m Talk | Understanding and Detecting Platform-Specific Violations in Android Auto Apps | AST 2026
11:30 | 30m Talk | A Unified Benchmark for Out-of-Distribution Detection for Autonomous Driving Systems | AST 2026 | Xiangyu Li (SeysoAI), Jingyu ZHANG (Hong Kong Metropolitan University), Jacky Keung (City University of Hong Kong), Xiaoxue Ma (Hong Kong Metropolitan University), Yihan Liao (City University of Hong Kong)
12:00 | 30m Talk | HYDRA: A Hybrid Heuristic-Guided Deep Representation Architecture for Predicting Latent Zero-Day Vulnerabilities in Patched Functions | AST 2026 | Mohammad Farhad (University of Louisiana at Lafayette), Sabbir Rahman (University of Louisiana at Lafayette), Shuvalaxmi Dass (University of Louisiana at Lafayette)