
This program is tentative and subject to change.

Sat 3 May 2025 14:00 - 14:30 at 213 - Paper Presentation 2

Surprise Adequacy (SA) has been widely studied as a test adequacy metric that can effectively guide software engineers towards inputs that are more likely to reveal unexpected behaviour of Deep Neural Networks (DNNs). Intuitively, SA is an out-of-distribution metric that quantifies the dissimilarity between a given input and the training data: if a new input is very different from those seen during training, the DNN is more likely to behave unexpectedly on that input. While SA has been widely adopted as a test prioritization method, its major weakness is that computing the metric requires access to the training dataset, which is often not permitted in real-world use cases. We present DANDI, a technique that generates a surrogate input distribution using Stable Diffusion to compute SA values without requiring the original training data. An empirical evaluation of DANDI applied to image classifiers for CIFAR-10 and ImageNet-1K shows that SA values computed against synthetic data are highly correlated with the values computed against the training data, with Spearman rank correlation values of 0.852 for ImageNet-1K and 0.881 for CIFAR-10. Further, we show that SA values computed by DANDI can prioritize inputs as effectively as those computed using the training data when testing DNN models mutated by DeepCrime. We believe that DANDI can significantly improve the usability of SA for practical DNN testing.
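For intuition, the likelihood-based variant of SA (LSA) scores an input by the estimated density of its activation trace under a reference distribution; DANDI's key move is to build that reference from synthesized images rather than the original training set. The sketch below is illustrative only: the helper names (`get_activations`, `surrogate_inputs`, `test_inputs`) and the exact KDE-based scoring are assumptions for exposition, not the paper's implementation.

```python
# Minimal sketch: likelihood-based Surprise Adequacy (LSA) computed against
# a surrogate reference set instead of the original training data.
# All helper names here are illustrative placeholders, not the DANDI artifact.
import numpy as np
from scipy.stats import gaussian_kde

def get_activations(model, inputs):
    """Placeholder: run the DNN under test on `inputs` and return an
    (n_samples, n_dims) array of activation traces from a chosen hidden
    layer (e.g. the penultimate one)."""
    raise NotImplementedError

def lsa_scores(ref_activations, test_activations):
    """Fit a kernel density estimate on the reference activation traces and
    score each test input by its negative log-density: the lower the density
    (the more 'surprising' the input), the higher the SA value."""
    kde = gaussian_kde(ref_activations.T)      # gaussian_kde expects (dims, samples)
    density = kde(test_activations.T)
    return -np.log(np.maximum(density, 1e-30))  # clamp to avoid log(0)

# DANDI's substitution: the reference set is generated (e.g. with Stable
# Diffusion, conditioned on the class labels) instead of the training data.
# ref = get_activations(model, surrogate_inputs)
# sa  = lsa_scores(ref, get_activations(model, test_inputs))
```

Under this framing, the paper's correlation result says that ranking test inputs by `sa` computed from the surrogate reference closely matches the ranking obtained from the true training data.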


Sat 3 May

Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30
Paper Presentation 2 (DeepTest) at 213
14:00
30m
Talk
DANDI: Diffusion as Normative Distribution for Deep Neural Network Input
DeepTest
Somin Kim (Korea Advanced Institute of Science and Technology), Shin Yoo (Korea Advanced Institute of Science and Technology)
14:30
30m
Talk
Robust Testing for Deep Learning using Human Label Noise
DeepTest
Yi Yang Gordon Lim (University of Michigan), Stefan Larson (Vanderbilt University), Kevin Leach (Vanderbilt University)
15:00
30m
Talk
Improving the Reliability of Failure Prediction Models through Concept Drift Monitoring
DeepTest
Lorena Poenaru-Olaru (TU Delft), Luís Cruz (TU Delft), Jan S. Rellermeyer (Leibniz University Hannover), Arie van Deursen (TU Delft)