TEASMA: A Practical Methodology for Test Adequacy Assessment of Deep Neural Networks (FSE 2025 - Journal First)

Who

Amin Abbasishahkoo, Mahboubeh Dadkhah, Lionel Briand, Dayi Lin

Track

FSE 2025 Journal First

When

Mon 23 Jun 2025 16:20 - 16:40 at Cosmos 3A - Testing 2 Chair(s): Miryung Kim

Abstract

Successful deployment of Deep Neural Networks (DNNs), particularly in safety-critical systems, requires their validation with an adequate test set to ensure a sufficient degree of confidence in test outcomes. Although well-established test adequacy assessment techniques from traditional software, such as mutation analysis and coverage criteria, have been adapted to DNNs in recent years, we still need to investigate their application within a comprehensive methodology for accurately predicting the fault detection ability of test sets and thus assessing their adequacy. In this paper, we propose and evaluate TEASMA, a comprehensive and practical methodology designed to accurately assess the adequacy of test sets for DNNs. In practice, TEASMA allows engineers to decide whether they can trust high-accuracy test results and thus validate the DNN before its deployment. Based on a DNN model’s training set, TEASMA provides a procedure to build accurate DNN-specific prediction models of the Fault Detection Rate (FDR) of a test set using an existing adequacy metric, thus enabling its assessment. We evaluated TEASMA with four state-of-the-art test adequacy metrics: Distance-based Surprise Coverage (DSC), Likelihood-based Surprise Coverage (LSC), Input Distribution Coverage (IDC), and Mutation Score (MS). We calculated MS based on mutation operators that directly modify the trained DNN model (i.e., post-training operators) due to their significant computational advantage compared to the operators that modify the DNN’s training set or program (i.e., pre-training operators). Our extensive empirical evaluation, conducted across multiple DNN models and input sets, including large input sets such as ImageNet, reveals a strong linear correlation between the predicted and actual FDR values derived from MS, DSC, and IDC, with minimum R² values of 0.94 for MS and 0.90 for DSC and IDC. 
Furthermore, a low average Root Mean Square Error (RMSE) of 0.09 between actual and predicted FDR values across all subjects, obtained with regression analysis on MS, shows that MS is more accurate than DSC and IDC, whose RMSE values are 0.17 and 0.18, respectively. Overall, these results suggest that TEASMA provides a reliable basis for confidently deciding whether to trust test results for DNN models.
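
The core idea in the abstract, fitting a model that predicts FDR from an adequacy metric and then judging that model by R² and RMSE, can be illustrated with a minimal sketch. This is not the authors' implementation: the data below is synthetic, the plain linear fit is an assumption (the abstract only says "regression analysis"), and all names (`ms_values`, `fdr_values`, `fit_line`, `predict_fdr`) are hypothetical.

```python
import math
import random

# Synthetic stand-in data (NOT from the paper): one (MS, FDR) pair per
# input subset sampled from a training set.
random.seed(0)
ms_values = [random.uniform(0.1, 0.9) for _ in range(50)]
fdr_values = [
    min(1.0, max(0.0, 0.9 * m + 0.05 + random.gauss(0.0, 0.03)))
    for m in ms_values
]

def fit_line(x, y):
    """Ordinary least squares fit of y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict_fdr(model, metric_value):
    """Predict FDR for one metric value, clamped to the valid range [0, 1]."""
    slope, intercept = model
    return min(1.0, max(0.0, slope * metric_value + intercept))

def r_squared(actual, predicted):
    """Coefficient of determination between actual and predicted values."""
    mean_actual = sum(actual) / len(actual)
    residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    total = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - residual / total

def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

model = fit_line(ms_values, fdr_values)
predicted = [predict_fdr(model, m) for m in ms_values]
r2 = r_squared(fdr_values, predicted)
error = rmse(fdr_values, predicted)
print(f"R^2 = {r2:.3f}, RMSE = {error:.3f}")

# Hypothetical use: estimate the FDR of a test set whose measured MS is 0.75.
test_set_fdr = predict_fdr(model, 0.75)
print(f"Predicted FDR for MS = 0.75: {test_set_fdr:.3f}")
```

Because FDR is a rate between 0 and 1, an RMSE of 0.09 corresponds to an average prediction error of about nine percentage points, which is why the RMSE values for MS, DSC, and IDC are directly comparable.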

Amin Abbasishahkoo

The School of EECS, University of Ottawa

Mahboubeh Dadkhah

University of Ottawa

Lionel Briand

University of Ottawa, Canada; Lero centre, University of Limerick, Ireland

Dayi Lin

Centre for Software Excellence, Huawei Canada

Session Program

Mon 23 Jun
Displayed time zone: (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

16:00 - 18:00	Testing 2 (Journal First / Research Papers) at Cosmos 3A. Chair(s): Miryung Kim (UCLA and Amazon Web Services)

16:00 (20m, Talk)	Search-based DNN Testing and Retraining with GAN-enhanced Simulations [Journal First]
	Mohammed Attaoui (University of Luxembourg), Fabrizio Pastore (University of Luxembourg), Lionel Briand (University of Ottawa, Canada; Lero centre, University of Limerick, Ireland)
16:20 (20m, Talk)	TEASMA: A Practical Methodology for Test Adequacy Assessment of Deep Neural Networks [Journal First]
	Amin Abbasishahkoo (The School of EECS, University of Ottawa), Mahboubeh Dadkhah (University of Ottawa), Lionel Briand (University of Ottawa, Canada; Lero centre, University of Limerick, Ireland), Dayi Lin (Centre for Software Excellence, Huawei Canada)
16:40 (20m, Talk)	VLATest: Testing and Evaluating Vision-Language-Action Models for Robotic Manipulation [Research Papers]
	Zhijie Wang (University of Alberta), Zhehua Zhou (University of Macau), Norman Song, Yuheng Huang (The University of Tokyo), Zhan Shu (University of Alberta), Lei Ma (The University of Tokyo & University of Alberta)
17:00 (20m, Talk)	DRWASI: LLM-assisted Differential Testing for WebAssembly System Interface Implementations [Journal First]
	Yixuan Zhang (Peking University), Ningyu He (Hong Kong Polytechnic University), Jianting Gao (Huazhong University of Science and Technology), Shangtong Cao (Beijing University of Posts and Telecommunications), Kaibo Liu (Peking University), Haoyu Wang (Huazhong University of Science and Technology), Yun Ma (Peking University), Gang Huang (Peking University), Xuanzhe Liu (Peking University)
17:20 (20m, Talk)	MR-Scout: Automated Synthesis of Metamorphic Relations from Existing Test Cases [Journal First]
	Congying Xu (The Hong Kong University of Science and Technology, China), Valerio Terragni (University of Auckland), Hengcheng Zhu (The Hong Kong University of Science and Technology), Jiarong Wu, Shing-Chi Cheung (Hong Kong University of Science and Technology)
17:40 (20m, Talk)	UnitCon: Synthesizing Targeted Unit Tests for Java Runtime Exceptions [Research Papers]
	Sujin Jang (KAIST), Yeonhee Ryou (KAIST), Heewon Lee (KAIST, Republic of Korea), Kihong Heo (KAIST)

Information for Participants

Mon 23 Jun 2025 16:00 - 18:00 at Cosmos 3A - Testing 2 Chair(s): Miryung Kim

Info for room Cosmos 3A:

Cosmos 3A is the first room in the Cosmos 3 wing.

When facing the main Cosmos Hall, access to the Cosmos 3 wing is on the left, close to the stairs. The area is accessed through a large door with the number “3”, which will stay open during the event.