
Sixth International Workshop on Deep Learning for Testing and Testing for Deep Learning (DeepTest 2025)

Machine Learning (ML) is widely adopted in modern software systems, including safety-critical domains such as autonomous cars, medical diagnosis, and aircraft collision avoidance systems. It is therefore crucial to rigorously test such applications to ensure high dependability. However, standard notions of software quality and reliability become irrelevant for ML systems, due to their non-deterministic nature and the lack of a transparent understanding of the models’ semantics. ML is also expected to revolutionize software development itself: it is already being applied to devise novel program analysis and software testing techniques for malware detection, bug finding, and type checking.

DeepTest 2025 aims to bring together academics and industry experts to discuss practical solutions and build momentum in this rapidly evolving field. The workshop will include invited talks and research presentations, providing a platform for participants to exchange ideas and insights.

This edition of DeepTest is co-located with ICSE 2025, held from Sunday, April 27 to Saturday, May 3, 2025, in Ottawa, Ontario, Canada. The workshop itself takes place on Saturday, May 3, 2025.

The workshop is partially supported by the EU project Sec4AI4Sec.
Program

This program is tentative and subject to change.

Sat 3 May

Displayed time zone: Eastern Time (US & Canada)

09:00 - 10:30  Keynote Session (Room 213)
  09:00 (90m)  Keynote: Failures or False Alarms? Validating Tests and Failures for Cyber Physical Systems
               Shiva Nejati (University of Ottawa)

10:30 - 11:00  Saturday Morning Break (Catering)

11:00 - 12:30  Paper Presentation 1 (Room 213)
  11:00 (30m)  Talk: Lachesis: Predicting LLM Inference Accuracy using Structural Properties of Reasoning Paths
               Naryeong Kim (Korea Advanced Institute of Science and Technology), Sungmin Kang (National University of Singapore), Gabin An (Roku), Shin Yoo (Korea Advanced Institute of Science and Technology)
  11:30 (30m)  Talk: DILLEMA: Diffusion and Large Language Models for Multi-Modal Augmentation
               Luciano Baresi, Davide Yi Xian Hu, Muhammad Irfan Mas'Udi, and Giovanni Quattrocchi (Politecnico di Milano)
  12:00 (30m)  Talk: On the Effectiveness of LLMs for Manual Test Verifications
               Myron David Peixoto (Federal University of Alagoas), Davy Baía (Federal University of Alagoas), Nathalia Nascimento (Pennsylvania State University), Paulo Alencar (University of Waterloo), Baldoino Fonseca (Federal University of Alagoas), Márcio Ribeiro (Federal University of Alagoas)

12:30 - 14:00  Lunch
  13:15 (45m)  Saturday Lunch (Catering)

14:00 - 15:30  Paper Presentation 2 (Room 213)
  14:00 (30m)  Talk: DANDI: Diffusion as Normative Distribution for Deep Neural Network Input
               Somin Kim and Shin Yoo (Korea Advanced Institute of Science and Technology)
  14:30 (30m)  Talk: Robust Testing for Deep Learning using Human Label Noise
               Yi Yang Gordon Lim (University of Michigan), Stefan Larson (Vanderbilt University), Kevin Leach (Vanderbilt University)
  15:00 (30m)  Talk: Improving the Reliability of Failure Prediction Models through Concept Drift Monitoring
               Lorena Poenaru-Olaru (TU Delft), Luís Cruz (TU Delft), Jan S. Rellermeyer (Leibniz University Hannover), Arie van Deursen (TU Delft)

15:30 - 16:00  Saturday Afternoon Break (Catering)

16:00 - 17:30  Paper Presentation 3 (Room 213)
  16:00 (30m)  Talk: OpenCat: Improving Interoperability of ADS Testing
               Qurban Ali (University of Milano-Bicocca), Andrea Stocco (Technical University of Munich; fortiss), Leonardo Mariani (University of Milano-Bicocca), Oliviero Riganelli (University of Milano-Bicocca)
               Pre-print available
  16:30 (30m)  Talk: Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation
               Benjamin Steenhoek (Microsoft), Michele Tufano (Google), Neel Sundaresan (Microsoft), Alexey Svyatkovskiy (Google DeepMind)

Call for Papers

DeepTest is an interdisciplinary workshop targeting research at the intersection of software engineering and deep learning. This workshop will explore issues related to:

  • Deep Learning applied to Software Engineering (DL4SE)
  • Software Engineering applied to Deep Learning (SE4DL)

Although the main focus is on Deep Learning, we also encourage submissions that are more broadly related to Machine Learning.

Topics of Interest

We welcome submissions introducing technology (e.g., frameworks, libraries, program analyses, and tool evaluations) for testing DL-based applications, as well as DL-based solutions to open research problems (e.g., what constitutes a bug in a DL/RL model). Relevant topics include, but are not limited to:

  • High-quality benchmarks for evaluating DL/RL approaches
  • Surveys and case studies using DL/RL technology
  • Techniques that aid the interpretability of DL/RL models
  • Techniques to improve the design of reliable DL/RL models
  • DL/RL-aided software development approaches
  • DL/RL for fault prediction, localization and repair
  • Fuzzing DL/RL systems
  • Metamorphic testing as software quality assurance (a minimal illustrative sketch follows this list)
  • Fault localization and anomaly detection
  • Use of DL for analyzing natural language-like artefacts such as code or user reviews
  • DL/RL techniques to support automated software testing
  • DL/RL to aid program comprehension, program transformation, and program generation
  • Safety and security of DL/RL based systems
  • New approaches to estimate and measure uncertainty in DL/RL models
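
To make the scope concrete, below is a minimal metamorphic-testing sketch for a DL image classifier: it checks the simple invariance relation that a small brightness change should not alter the predicted label. All names here (classify, brightness_shift, the toy mean-intensity model) are hypothetical placeholders for illustration, not part of any specific tool or submission.

    import numpy as np

    def classify(image):
        """Toy stand-in for a trained DL classifier: labels by mean intensity.
        In practice this would be a model call such as model.predict(image)."""
        return int(image.mean() > 0.5)

    def brightness_shift(image, delta=0.05):
        """Metamorphic transformation: brighten slightly, clipped to [0, 1]."""
        return np.clip(image + delta, 0.0, 1.0)

    def relation_holds(image):
        """Metamorphic relation: a small brightness change must not flip the label."""
        return classify(image) == classify(brightness_shift(image))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        images = rng.random((100, 32, 32))  # 100 random grayscale "images" in [0, 1]
        violations = sum(not relation_holds(img) for img in images)
        print(f"{violations} metamorphic violations out of {len(images)} inputs")

In a real study, the toy classifier would be replaced by the model under test, and the transformation and tolerance would be chosen to match the application domain.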

Types of Submissions

We accept two types of submissions:

  • Full research papers: up to 8 pages (including references), describing original and unpublished results related to the workshop topics;
  • Short papers: up to 4 pages (including references), describing preliminary work, new insights into previous work, or demonstrations of testing-related tools and prototypes.

All submissions must be in PDF and conform to the ICSE 2025 formatting instructions; the page limit is strict. Submissions must follow the IEEE conference proceedings template, as specified in the IEEE Conference Proceedings Formatting Guidelines.

DeepTest 2025 will employ a double-blind review process. Thus, no submission may reveal its authors’ identities. The authors must make every effort to honor the double-blind review process. In particular, the authors’ names must be omitted from the submission, and references to their prior work should be in the third person.

If you have any questions or wonder whether your submission is in scope, please do not hesitate to contact the organizers.

Submission Site

https://easychair.org/my/conference?conf=deeptest2025

Keynote

Speaker: Prof. Shiva Nejati (University of Ottawa)

Title: Failures or False Alarms? Validating Tests and Failures for Cyber Physical Systems

Abstract:

Software testing is about detecting failures, but not every failure necessarily indicates a genuine fault in the system under test. Some failures are spurious – caused by invalid test inputs or by flaky test outputs. While these spurious failures can arise in many contexts, they are especially prevalent in deep learning–enabled and cyber-physical systems, which often operate autonomously in complex, unpredictable environments. In such systems, virtually any environmental factor can act as an input, and the system’s internal decision-making – driven by deep learning models – may be nondeterministic or difficult to interpret. In this talk, I will discuss three solutions to ensure the validity of test inputs and increase the robustness of test outputs: (1) Generating interpretable constraints that characterize valid test inputs, (2) Developing effort-optimized, human-assisted methods for validating test inputs, and (3) Employing generative models to improve the consistency of test results across different simulation environments. I will illustrate these solutions through case studies from cyber-physical systems, autonomous driving, and network systems.

Bio:

Shiva Nejati is a Professor at the School of Electrical Engineering and Computer Science at the University of Ottawa, Canada, where she co-founded and co-directs the Sedna IoT Lab. Her current research is in software engineering, focusing on software testing, the analysis of IoT and cyber-physical systems, search-based software engineering, model-driven engineering, applied machine learning, and formal and empirical software engineering methods. Her research is closely connected to industry: she has worked with 16 companies across Canada, Europe, Asia, and the US. These collaborations have resulted in over 90 papers published in top-tier venues, earning her eight Best Paper and ACM SIGSOFT Distinguished Paper Awards, as well as a 10-Year Most Influential Paper Award. She has served as Program Co-Chair for four international conferences (SEAMS 2025, ICST 2024, MODELS 2021, and SSBSE 2019) and will serve in the same role for ASE 2026. She is an elected at-large member of the IEEE TCSE Executive Committee and was an Associate Editor for IEEE TSE from 2019 to 2024.
