One Step Further: Evaluating Interpreters Using Metamorphic Testing
Wed 20 Jul 2022 18:00 - 18:20 at ISSTA 1 - Session 3-3: Test Generation and Mutation C Chair(s): Stefan Winter
The black-box nature of Deep Neural Networks (DNNs) makes it difficult for people to understand why a network makes a specific decision, which restricts its application in critical tasks. Recently, many interpreters (interpretation methods) have been proposed to improve the transparency of DNNs by providing relevant features in the form of a saliency map. However, different interpreters might produce different interpretation results for the same classification case, which motivates us to evaluate the robustness of interpreters.
However, the biggest challenge in evaluating interpreters is the testing oracle problem, i.e., the difficulty of labeling ground-truth interpretation results. To fill this critical gap, we first use images with bounding boxes from an object detection system, together with images inserted with backdoor triggers, as our original ground-truth dataset. Then, we apply metamorphic testing to extend the dataset with three operators: inserting an object, deleting an object, and feature-squeezing the image background. Our key intuition is that, since these three operations do not modify the primary detected objects, the interpretation results of a good interpreter should not change. Finally, we quantitatively measure the quality of interpretation results with the Intersection-over-Minimum (IoMin) score and evaluate interpreters based on the failure statistics of the metamorphic relations.
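The metamorphic check described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the binary-mask representation of saliency maps, and the pass threshold are assumptions; the paper defines IoMin as the intersection of two regions divided by the smaller region's area.

```python
import numpy as np

def iomin(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection-over-Minimum of two binary saliency masks.

    IoMin = |A ∩ B| / min(|A|, |B|), where |.| counts salient pixels.
    """
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    inter = np.logical_and(a, b).sum()
    denom = min(a.sum(), b.sum())
    return float(inter) / denom if denom > 0 else 0.0

def relation_holds(mask_orig: np.ndarray,
                   mask_meta: np.ndarray,
                   threshold: float = 0.5) -> bool:
    """Metamorphic relation: after a content-preserving operation
    (insert/delete a background object, feature-squeeze the background),
    the interpretation should stay close to the original.
    The 0.5 threshold here is a placeholder, not the paper's setting."""
    return iomin(mask_orig, mask_meta) >= threshold
```

A run of such checks over many metamorphic images yields the failure statistics used to rank interpreters by robustness.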
We evaluate seven popular interpreters on 877,324 metamorphic images across diverse scenes. The results show that our approach can quantitatively evaluate interpreters' robustness, with Grad-CAM providing the most reliable interpretation results among the seven interpreters.
Wed 20 Jul (displayed time zone: Seoul)
01:20 - 02:20 | Session 1-2: Test Generation and Mutation A, Technical Papers at ISSTA 2. Chair(s): Raghavan Komondoor (IISc Bengaluru)

01:20 (20m, Talk) | On the Use of Mutation Analysis For Evaluating Student Test Suite Quality. Technical Papers. James Perretta (Northeastern University), Andrew DeOrio (University of Michigan), Arjun Guha (Northeastern University), Jonathan Bell (Northeastern University). DOI

01:40 (20m, Talk) | Automated Test Generation for REST APIs: No Time to Rest Yet. Technical Papers. DOI

02:00 (20m, Talk) | One Step Further: Evaluating Interpreters Using Metamorphic Testing. Technical Papers. Ming Fan (Xi'an Jiaotong University), Jiali Wei (Xi'an Jiaotong University), Wuxia Jin (Xi'an Jiaotong University), Zhou Xu (Wuhan University), Wenying Wei (Xi'an Jiaotong University), Ting Liu (Xi'an Jiaotong University). DOI
18:00 - 19:00 | Session 3-3: Test Generation and Mutation C, Technical Papers at ISSTA 1. Chair(s): Stefan Winter (LMU Munich)

18:00 (20m, Talk) | One Step Further: Evaluating Interpreters Using Metamorphic Testing. Technical Papers. Ming Fan (Xi'an Jiaotong University), Jiali Wei (Xi'an Jiaotong University), Wuxia Jin (Xi'an Jiaotong University), Zhou Xu (Wuhan University), Wenying Wei (Xi'an Jiaotong University), Ting Liu (Xi'an Jiaotong University). DOI

18:20 (20m, Talk) | Test Mimicry to Assess the Exploitability of Library Vulnerabilities. Technical Papers. Hong Jin Kang (Singapore Management University, Singapore), Truong Giang Nguyen (School of Computing and Information Systems, Singapore Management University), Xuan Bach D. Le (The University of Melbourne), Corina S. Pasareanu (Carnegie Mellon University Silicon Valley, NASA Ames Research Center), David Lo (Singapore Management University). DOI

18:40 (20m, Talk) | RegMiner: Towards Constructing a Large Regression Dataset from Code Evolution History. Technical Papers. Xuezhi Song (Fudan University), Yun Lin (National University of Singapore), Siang Hwee Ng (National University of Singapore), Yijian Wu (Fudan University), Xin Peng (Fudan University), Jin Song Dong (National University of Singapore), Hong Mei (Peking University). DOI, Pre-print