Sixth International Workshop on Deep Learning for Testing and Testing for Deep Learning (DeepTest 2025)
Machine Learning (ML) is widely adopted in modern software systems, including safety-critical domains such as autonomous cars, medical diagnosis, and aircraft collision avoidance systems. It is therefore crucial to rigorously test such applications to ensure high dependability. However, standard notions of software quality and reliability do not directly apply to ML systems, due to their non-deterministic nature and the lack of a transparent understanding of the models’ semantics. ML is also expected to revolutionize software development itself: it is already being applied to devise novel program analysis and software testing techniques for tasks such as malware detection, bug finding, and type checking.
DeepTest 2025 aims to bring together academics and industry experts to discuss practical solutions and build momentum in this rapidly evolving field. The workshop will include invited talks and research presentations, providing a platform for participants to exchange ideas and insights.
This edition of DeepTest will be co-located with ICSE 2025, taking place from Sunday, April 27 to Saturday, May 3, 2025, in Ottawa, Ontario, Canada. The exact date of the workshop will be announced soon.
Previous Editions
- DeepTest 2024 was co-located with ICSE 2024
- DeepTest 2023 was co-located with ICSE 2023
- DeepTest 2021 was co-located with ICSE 2021
- DeepTest 2020 was co-located with ICSE 2020
- DeepTest 2019 was co-located with ICSE 2019
The workshop is partially supported by the EU project Sec4AI4Sec.
This program is tentative and subject to change.
Sat 3 May (displayed time zone: Eastern Time, US & Canada)
09:00 - 10:30 Keynote
- 09:00 (90m) Keynote: Failures or False Alarms? Validating Tests and Failures for Cyber Physical Systems. Shiva Nejati (University of Ottawa)
10:30 - 11:00 Break
- 10:30 (30m) Saturday Morning Break (catering)
11:00 - 12:30 Talks
- 11:00 (30m) Lachesis: Predicting LLM Inference Accuracy using Structural Properties of Reasoning Paths. Naryeong Kim (Korea Advanced Institute of Science and Technology), Sungmin Kang (National University of Singapore), Gabin An (Roku), Shin Yoo (Korea Advanced Institute of Science and Technology)
- 11:30 (30m) DILLEMA: Diffusion and Large Language Models for Multi-Modal Augmentation. Luciano Baresi, Davide Yi Xian Hu, Muhammad Irfan Mas'Udi, Giovanni Quattrocchi (Politecnico di Milano)
- 12:00 (30m) On the Effectiveness of LLMs for Manual Test Verifications. Myron David Peixoto, Davy Baía (Federal University of Alagoas), Nathalia Nascimento (Pennsylvania State University), Paulo Alencar (University of Waterloo), Baldoino Fonseca, Márcio Ribeiro (Federal University of Alagoas)
12:30 - 14:00 Lunch
- 13:15 (45m) Saturday Lunch (catering)
14:00 - 15:30 Talks
- 14:00 (30m) DANDI: Diffusion as Normative Distribution for Deep Neural Network Input
- 14:30 (30m) Robust Testing for Deep Learning using Human Label Noise. Yi Yang Gordon Lim (University of Michigan), Stefan Larson (Vanderbilt University), Kevin Leach (Vanderbilt University)
- 15:00 (30m) Improving the Reliability of Failure Prediction Models through Concept Drift Monitoring. Lorena Poenaru-Olaru (TU Delft), Luís Cruz (TU Delft), Jan S. Rellermeyer (Leibniz University Hannover), Arie van Deursen (TU Delft)
15:30 - 16:00 Break
- 15:30 (30m) Saturday Afternoon Break (catering)
16:00 - 17:30 Talks
- 16:00 (30m) OpenCat: Improving Interoperability of ADS Testing. Qurban Ali (University of Milano-Bicocca), Andrea Stocco (Technical University of Munich, fortiss), Leonardo Mariani (University of Milano-Bicocca), Oliviero Riganelli (University of Milano-Bicocca). Pre-print available.
- 16:30 (30m) Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation. Benjamin Steenhoek (Microsoft), Michele Tufano (Google), Neel Sundaresan (Microsoft), Alexey Svyatkovskiy (Google DeepMind)
Call for Papers
DeepTest is an interdisciplinary workshop targeting research at the intersection of software engineering and deep learning. This workshop will explore issues related to:
- Deep Learning applied to Software Engineering (DL4SE)
- Software Engineering applied to Deep Learning (SE4DL)
Although the main focus is on Deep Learning, we also encourage submissions that are more broadly related to Machine Learning.
Topics of Interest
We welcome submissions introducing technology (e.g., frameworks, libraries, program analyses, and tool evaluations) for testing DL-based applications, as well as DL-based solutions to open research problems (e.g., what constitutes a bug in a DL/RL model). Relevant topics include, but are not limited to:
- High-quality benchmarks for evaluating DL/RL approaches
- Surveys and case studies using DL/RL technology
- Techniques for improving the interpretability of DL/RL models
- Techniques to improve the design of reliable DL/RL models
- DL/RL-aided software development approaches
- DL/RL for fault prediction, localization and repair
- Fuzzing DL/RL systems
- Metamorphic testing for software quality assurance
- Fault localization and anomaly detection
- Use of DL for analyzing natural-language-like artefacts such as code or user reviews
- DL/RL techniques to support automated software testing
- DL/RL to aid program comprehension, program transformation, and program generation
- Safety and security of DL/RL based systems
- New approaches to estimate and measure uncertainty in DL/RL models
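To illustrate one of the topics above, metamorphic testing checks that a model satisfies a relation between outputs for related inputs, rather than comparing against ground-truth labels. The sketch below is only illustrative and not from the call: the `predict` stand-in classifier and the noise-invariance relation are assumptions, and any real model with a prediction function could be substituted.

```python
# Minimal sketch of a metamorphic test for a classifier (hypothetical model).
# Metamorphic relation: adding tiny, label-preserving noise to an input
# should not change the predicted class.
import random

def predict(features):
    # Stand-in classifier: predicts class 1 if the feature sum is non-negative.
    return 1 if sum(features) >= 0 else 0

def metamorphic_test(features, trials=100, epsilon=1e-3):
    """Check that the prediction is stable under tiny input perturbations."""
    original = predict(features)
    for _ in range(trials):
        perturbed = [x + random.uniform(-epsilon, epsilon) for x in features]
        if predict(perturbed) != original:
            return False  # metamorphic relation violated
    return True

print(metamorphic_test([0.5, 1.2, -0.3]))  # True: prediction is stable
```

The key point is that no labeled oracle is needed: the test only asserts consistency between the original and transformed inputs, which is why metamorphic testing is attractive for ML systems whose expected outputs are hard to specify.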
Types of Submissions
We accept two types of submissions:
- Full research papers: up to 8 pages (including references), describing original and unpublished results related to the workshop topics;
- Short papers: up to 4 pages (including references), describing preliminary work, new insights into previous work, or demonstrations of testing-related tools and prototypes.
All submissions must be in PDF and conform to the ICSE 2025 formatting instructions; the page limit is strict. Submissions must follow the IEEE conference proceedings template, as specified in the IEEE Conference Proceedings Formatting Guidelines.
DeepTest 2025 will employ a double-blind review process. Thus, no submission may reveal its authors’ identities. The authors must make every effort to honor the double-blind review process. In particular, the authors’ names must be omitted from the submission, and references to their prior work should be in the third person.
If you have any questions or wonder whether your submission is in scope, please do not hesitate to contact the organizers.
Submission Site
Keynote
Speaker: Prof. Shiva Nejati
Title: Failures or False Alarms? Validating Tests and Failures for Cyber Physical Systems
Abstract:
Software testing is about detecting failures, but not every failure necessarily indicates a genuine fault in the system under test. Some failures are spurious – caused by invalid test inputs or by flaky test outputs. While these spurious failures can arise in many contexts, they are especially prevalent in deep learning–enabled and cyber-physical systems, which often operate autonomously in complex, unpredictable environments. In such systems, virtually any environmental factor can act as an input, and the system’s internal decision-making – driven by deep learning models – may be nondeterministic or difficult to interpret. In this talk, I will discuss three solutions to ensure the validity of test inputs and increase the robustness of test outputs: (1) Generating interpretable constraints that characterize valid test inputs, (2) Developing effort-optimized, human-assisted methods for validating test inputs, and (3) Employing generative models to improve the consistency of test results across different simulation environments. I will illustrate these solutions through case studies from cyber-physical systems, autonomous driving, and network systems.
Bio:
Shiva Nejati is a Professor at the School of Electrical Engineering and Computer Science at the University of Ottawa, Canada. She co-founded and co-directs the university’s Sedna IoT Lab. Her current research is in software engineering, focusing on software testing, the analysis of IoT and cyber-physical systems, search-based software engineering, model-driven engineering, applied machine learning, and formal and empirical software engineering methods. Her research is closely connected to industry: she has worked with 16 companies across Canada, Europe, Asia, and the US. These collaborations have resulted in over 90 published papers in top-tier venues, earning her eight Best Paper and ACM SIGSOFT Distinguished Paper Awards, as well as a 10-Year Most Influential Paper Award. She has served as a Program Co-Chair for four international conferences: SEAMS 2025, ICST 2024, MODELS 2021, and SSBSE 2019, and she will serve in the same role for ASE 2026. She is an elected at-large member of the IEEE TCSE Executive Committee and was an Associate Editor for IEEE TSE from 2019 to 2024.