ICSE 2024
Fri 12 - Sun 21 April 2024 Lisbon, Portugal
Fri 19 Apr 2024 16:30 - 16:45 at Sophia de Mello Breyner Andresen - Testing of AI systems
Chair(s): Aldeida Aleti

Deep Neural Networks (DNNs) are core components for classification and regression tasks in many software systems. Companies incur high costs when testing DNNs with datasets that are representative of the inputs expected in operation, because these inputs must be labelled manually. The challenge is to select a set of test inputs that is as small as possible, to reduce the labelling cost, yet large enough to yield unbiased, high-confidence estimates of the expected DNN accuracy. At the same time, testers want to expose as many DNN mispredictions as possible so that they can improve the DNN. Techniques are therefore needed that pursue a threefold aim: small dataset size, trustworthy estimates, and exposure of mispredictions.
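The following sketch illustrates the trade-off between labelling cost and confidence. It is not part of the DeepSample paper. It assumes simple random sampling without replacement and a normal-approximation (Wald) confidence interval. The helper names `required_sample_size` and `srs_accuracy_estimate` are hypothetical, and the `correct` list stands in for the manual labelling oracle.

```python
import math
import random


def required_sample_size(margin: float, z: float = 1.96, p: float = 0.5) -> int:
    """Labels needed so that a Wald CI for accuracy has half-width <= margin.

    p = 0.5 is the worst case, because the variance p * (1 - p) is maximal there.
    This ignores the finite population correction, so it is slightly conservative.
    """
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


def srs_accuracy_estimate(correct: list[bool], n: int, seed: int = 0, z: float = 1.96):
    """Label n randomly chosen operational inputs and estimate the DNN accuracy.

    `correct[i]` simulates the (costly) manual label check for input i.
    Returns (estimate, ci_low, ci_high). The Wald interval is crude near 0 or 1.
    """
    rng = random.Random(seed)
    sample = rng.sample(range(len(correct)), n)  # sampling without replacement
    hits = sum(correct[i] for i in sample)
    p_hat = hits / n
    big_n = len(correct)
    # Finite population correction: the variance shrinks as n approaches big_n.
    fpc = (big_n - n) / (big_n - 1) if big_n > 1 else 0.0
    se = math.sqrt(p_hat * (1 - p_hat) / n * fpc)
    return p_hat, p_hat - z * se, p_hat + z * se
```

For example, `required_sample_size(0.05)` returns 385. That is the number of labels needed for a roughly 95% interval of plus or minus 5 points on the accuracy. Halving the margin roughly quadruples the labelling cost, which is why smarter sampling designs are attractive.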

This study presents DeepSample, a family of DNN testing techniques for cost-effective accuracy assessment based on probabilistic sampling. We investigate whether, to what extent, and under which conditions probabilistic sampling can help tackle this challenge. We implement five new sampling-based testing techniques and compare them comprehensively with three further state-of-the-art techniques on both DNN classification and regression tasks. The results provide guidance on how best to use sampling-based testing to obtain faithful, high-confidence estimates of operational DNN accuracy at low cost.
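The abstract does not describe the five DeepSample techniques. The sketch below is therefore only a generic illustration of how probabilistic sampling can serve two aims at once: unbiased accuracy estimation and exposure of mispredictions. It uses stratified sampling. Strata are built from an auxiliary score that needs no labels, here assumed to be the DNN's confidence per input. Labels are allocated preferentially to low-confidence strata, where mispredictions are assumed to concentrate. Reweighting each stratum's accuracy by its population share keeps the estimate unbiased under any fixed allocation. The function `stratified_accuracy` and its allocation rule are hypothetical choices made for this example.

```python
import random


def stratified_accuracy(confidence, correct, n, n_strata=4, seed=0):
    """Stratified-sampling accuracy estimate (hypothetical helper, not DeepSample's API).

    confidence: per-input auxiliary score (assumed: max softmax), known without labels.
    correct:    per-input oracle outcome; only sampled entries count as "labelled".
    Allocation (illustrative heuristic): proportional to N_h * (1 - mean confidence_h),
    with at least one label per stratum and never more than the stratum holds, so the
    total number of labels can differ from n because of rounding and capping.
    Returns (estimate = sum_h (N_h / N) * acc_h, mispredictions exposed in the sample).
    """
    rng = random.Random(seed)
    order = sorted(range(len(confidence)), key=lambda i: confidence[i])
    size = -(-len(order) // n_strata)  # ceiling division: inputs per stratum
    strata = [order[k:k + size] for k in range(0, len(order), size)]
    big_n = len(order)
    need = [len(s) * (1 - sum(confidence[i] for i in s) / len(s)) for s in strata]
    total = sum(need) or 1.0
    alloc = [min(len(s), max(1, round(n * w / total))) for s, w in zip(strata, need)]
    estimate, exposed = 0.0, 0
    for s, n_h in zip(strata, alloc):
        sample = rng.sample(s, n_h)  # simple random sampling within the stratum
        hits = sum(correct[i] for i in sample)
        estimate += (len(s) / big_n) * (hits / n_h)  # weight by stratum share
        exposed += n_h - hits
    return estimate, exposed
```

This design deliberately oversamples strata that are likely to contain failures, which helps expose mispredictions. The weights `len(s) / big_n` correct for the oversampling, so the accuracy estimate stays unbiased. When a stratum is fully labelled, its contribution is exact.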

Fri 19 Apr

Displayed time zone: Lisbon

16:00 - 17:30
Testing of AI systems
Research Track / Journal-first Papers at Sophia de Mello Breyner Andresen
Chair(s): Aldeida Aleti Monash University
16:00 · 15m · Talk
CIT4DNN: Generating Diverse and Rare Inputs for Neural Networks Using Latent Space Combinatorial Testing
Research Track
Swaroopa Dola University of Virginia, Rory McDaniel University of Virginia, Matthew B Dwyer University of Virginia, Mary Lou Soffa University of Virginia
16:15 · 15m · Talk
Knowledge Graph Driven Inference Testing for Question Answering Software
Research Track
Jun Wang Nanjing University, Yanhui Li Nanjing University, Zhifei Chen Nanjing University, Lin Chen Nanjing University, Xiaofang Zhang Soochow University, Yuming Zhou Nanjing University
16:30 · 15m · Talk
DeepSample: DNN sampling-based testing for operational accuracy assessment
Research Track
Antonio Guerriero Università di Napoli Federico II, Roberto Pietrantuono Università di Napoli Federico II, Stefano Russo Università di Napoli Federico II
16:45 · 15m · Talk
MAFT: Efficient Model-Agnostic Fairness Testing for Deep Neural Networks via Zero-Order Gradient Search
Research Track
Zhaohui Wang East China Normal University, Min Zhang East China Normal University, Jingran Yang East China Normal University, ShaoBojie East China Normal University, Min Zhang East China Normal University
17:00 · 7m · Talk
DeepManeuver: Adversarial Test Generation for Trajectory Manipulation of Autonomous Vehicles
Journal-first Papers
Meriel von Stein University of Virginia, Sebastian Elbaum University of Virginia, David Shriver Software Engineering Institute
17:07 · 7m · Talk
Finding Deviated Behaviors of the Compressed DNN Models for Image Classifications
Journal-first Papers
Yongqiang Tian The Hong Kong University of Science and Technology; University of Waterloo, Wuqi Zhang The Hong Kong University of Science and Technology, Ming Wen Huazhong University of Science and Technology, Shing-Chi Cheung Hong Kong University of Science and Technology, Chengnian Sun University of Waterloo, Shiqing Ma University of Massachusetts, Amherst, Yu Jiang Tsinghua University
17:14 · 7m · Talk
Identifying the Hazard Boundary of ML-enabled Autonomous Systems Using Cooperative Co-Evolutionary Search
Journal-first Papers
Sepehr Sharifi University of Ottawa, Donghwan Shin University of Sheffield, Lionel Briand University of Ottawa, Canada; Lero centre, University of Limerick, Ireland, Nathan Aschbacher Auxon Corporation