Sixth International Workshop on Deep Learning for Testing and Testing for Deep Learning (DeepTest 2025)
Machine Learning (ML) is widely adopted in modern software systems, including safety-critical domains such as autonomous cars, medical diagnosis, and aircraft collision avoidance systems. It is therefore crucial to rigorously test such applications to ensure high dependability. However, standard notions of software quality and reliability do not directly apply to ML systems, due to their non-deterministic nature and the lack of a transparent understanding of the models’ semantics. ML is also expected to revolutionize software development itself: it is already being applied to devise novel program analysis and software testing techniques for tasks such as malware detection, bug finding, and type checking.
DeepTest 2025 aims to bring together academics and industry experts to discuss practical solutions and build momentum in this rapidly evolving field. The workshop will include invited talks and research presentations, providing a platform for participants to exchange ideas and insights.
This edition of DeepTest will be co-located with ICSE 2025, taking place from Sunday, April 27 to Saturday, May 3, 2025, in Ottawa, Ontario, Canada. The exact date of the workshop will be announced soon.
Previous Editions
- DeepTest 2024 was co-located with ICSE 2024
- DeepTest 2023 was co-located with ICSE 2023
- DeepTest 2021 was co-located with ICSE 2021
- DeepTest 2020 was co-located with ICSE 2020
- DeepTest 2019 was co-located with ICSE 2019
The workshop is partially supported by the EU project Sec4AI4Sec.
This program is tentative and subject to change.
Sat 3 May (displayed time zone: Eastern Time, US & Canada)
09:00 - 10:30 Keynote
- 09:00 (90m) Keynote: Failures or False Alarms? Validating Tests and Failures for Cyber Physical Systems. Shiva Nejati (University of Ottawa)
10:30 - 11:00 Break
- 10:30 (30m) Saturday Morning Break (catering)
11:00 - 12:30 Talks
- 11:00 (30m) Lachesis: Predicting LLM Inference Accuracy using Structural Properties of Reasoning Paths. Naryeong Kim (Korea Advanced Institute of Science and Technology), Sungmin Kang (National University of Singapore), Gabin An (Roku), Shin Yoo (Korea Advanced Institute of Science and Technology)
- 11:30 (30m) DILLEMA: Diffusion and Large Language Models for Multi-Modal Augmentation. Luciano Baresi, Davide Yi Xian Hu, Muhammad Irfan Mas'Udi, Giovanni Quattrocchi (Politecnico di Milano)
- 12:00 (30m) On the Effectiveness of LLMs for Manual Test Verifications. Myron David Peixoto, Davy Baía (Federal University of Alagoas), Nathalia Nascimento (Pennsylvania State University), Paulo Alencar (University of Waterloo), Baldoino Fonseca, Márcio Ribeiro (Federal University of Alagoas)
12:30 - 14:00 Lunch
- 13:15 (45m) Saturday Lunch (catering)
14:00 - 15:30 Talks
- 14:00 (30m) DANDI: Diffusion as Normative Distribution for Deep Neural Network Input
- 14:30 (30m) Robust Testing for Deep Learning using Human Label Noise. Yi Yang Gordon Lim (University of Michigan), Stefan Larson (Vanderbilt University), Kevin Leach (Vanderbilt University)
- 15:00 (30m) Improving the Reliability of Failure Prediction Models through Concept Drift Monitoring. Lorena Poenaru-Olaru (TU Delft), Luís Cruz (TU Delft), Jan S. Rellermeyer (Leibniz University Hannover), Arie van Deursen (TU Delft)
15:30 - 16:00 Break
- 15:30 (30m) Saturday Afternoon Break (catering)
16:00 - 17:30 Talks
- 16:00 (30m) OpenCat: Improving Interoperability of ADS Testing. Qurban Ali (University of Milano-Bicocca), Andrea Stocco (Technical University of Munich, fortiss), Leonardo Mariani (University of Milano-Bicocca), Oliviero Riganelli (University of Milano-Bicocca). Pre-print available.
- 16:30 (30m) Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation. Benjamin Steenhoek (Microsoft), Michele Tufano (Google), Neel Sundaresan (Microsoft), Alexey Svyatkovskiy (Google DeepMind)
Call for Papers
DeepTest is an interdisciplinary workshop targeting research at the intersection of software engineering and deep learning. This workshop will explore issues related to:
- Deep Learning applied to Software Engineering (DL4SE)
- Software Engineering applied to Deep Learning (SE4DL)
Although the main focus is on Deep Learning, we also encourage submissions that are more broadly related to Machine Learning.
Topics of Interest
We welcome submissions introducing technology (e.g., frameworks, libraries, program analyses, and tool evaluations) for testing DL-based applications, as well as DL-based solutions to open research problems (e.g., what constitutes a bug in a DL/RL model). Relevant topics include, but are not limited to:
- High-quality benchmarks for evaluating DL/RL approaches
- Surveys and case studies using DL/RL technology
- Techniques for improving the interpretability of DL/RL models
- Techniques to improve the design of reliable DL/RL models
- DL/RL-aided software development approaches
- DL/RL for fault prediction, localization and repair
- Fuzzing DL/RL systems
- Metamorphic testing for software quality assurance
- Fault localization and anomaly detection
- Use of DL for analyzing natural-language-like artefacts such as code or user reviews
- DL/RL techniques to support automated software testing
- DL/RL to aid program comprehension, program transformation, and program generation
- Safety and security of DL/RL based systems
- New approaches to estimate and measure uncertainty in DL/RL models
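To illustrate one of the topics above, metamorphic testing checks that a model satisfies a relation between outputs for related inputs, rather than comparing against ground-truth labels. The sketch below is only illustrative and not from the call: the `predict` stand-in classifier and the noise-invariance relation are assumptions, and any real model with a prediction function could be substituted.

```python
# Minimal sketch of a metamorphic test for a classifier (hypothetical model).
# Metamorphic relation: adding tiny, label-preserving noise to an input
# should not change the predicted class.
import random

def predict(features):
    # Stand-in classifier: predicts class 1 if the feature sum is non-negative.
    return 1 if sum(features) >= 0 else 0

def metamorphic_test(features, trials=100, epsilon=1e-3):
    """Check that the prediction is stable under tiny input perturbations."""
    original = predict(features)
    for _ in range(trials):
        perturbed = [x + random.uniform(-epsilon, epsilon) for x in features]
        if predict(perturbed) != original:
            return False  # metamorphic relation violated
    return True

print(metamorphic_test([0.5, 1.2, -0.3]))  # True: prediction is stable
```

The key point is that no labeled oracle is needed: the test only asserts consistency between the original and transformed inputs, which is why metamorphic testing is attractive for ML systems whose expected outputs are hard to specify.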
Types of Submissions
We accept two types of submissions:
- Full research papers: up to 8 pages (including references), describing original and unpublished results related to the workshop topics;
- Short papers: up to 4 pages (including references), describing preliminary work, new insights into previous work, or demonstrations of testing-related tools and prototypes.
All submissions must be in PDF and conform to the ICSE 2025 formatting instructions; the page limit is strict. Submissions must follow the IEEE conference proceedings template, as specified in the IEEE Conference Proceedings Formatting Guidelines.
DeepTest 2025 will employ a double-blind review process. Thus, no submission may reveal its authors’ identities. The authors must make every effort to honor the double-blind review process. In particular, the authors’ names must be omitted from the submission, and references to their prior work should be in the third person.
If you have any questions or wonder whether your submission is in scope, please do not hesitate to contact the organizers.
Submission Site
Keynote
Speaker: Prof. Shiva Nejati
Title: Failures or False Alarms? Validating Tests and Failures for Cyber Physical Systems
Abstract:
Software testing is about detecting failures, but not every failure necessarily indicates a genuine fault in the system under test. Some failures are spurious – caused by invalid test inputs or by flaky test outputs. While these spurious failures can arise in many contexts, they are especially prevalent in deep learning–enabled and cyber-physical systems, which often operate autonomously in complex, unpredictable environments. In such systems, virtually any environmental factor can act as an input, and the system’s internal decision-making – driven by deep learning models – may be nondeterministic or difficult to interpret. In this talk, I will discuss three solutions to ensure the validity of test inputs and increase the robustness of test outputs: (1) Generating interpretable constraints that characterize valid test inputs, (2) Developing effort-optimized, human-assisted methods for validating test inputs, and (3) Employing generative models to improve the consistency of test results across different simulation environments. I will illustrate these solutions through case studies from cyber-physical systems, autonomous driving, and network systems.
Bio:
Shiva Nejati is a Professor at the School of Electrical Engineering and Computer Science at the University of Ottawa, Canada. She co-founded and co-directs the university’s Sedna IoT Lab. Her current research is in software engineering, focusing on software testing, the analysis of IoT and cyber-physical systems, search-based software engineering, model-driven engineering, applied machine learning, and formal and empirical software engineering methods. Her research is closely connected to industry: she has worked with 16 companies across Canada, Europe, Asia, and the US. These collaborations have resulted in over 90 published papers in top-tier venues, earning her eight Best Paper and ACM SIGSOFT Distinguished Paper Awards, as well as a 10-Year Most Influential Paper Award. She has served as a Program Co-Chair for four international conferences: SEAMS 2025, ICST 2024, MODELS 2021, and SSBSE 2019, and she will serve in the same role for ASE 2026. She is an elected at-large member of the IEEE TCSE Executive Committee and was an Associate Editor for IEEE TSE from 2019 to 2024.