DeepGini: Prioritizing Massive Tests to Enhance the Robustness of Deep Neural Networks
Deep neural networks (DNNs) have been deployed in many software systems to assist in various classification tasks. Alongside their remarkable effectiveness in classification, DNNs can also exhibit incorrect behaviors that result in accidents and losses. Therefore, testing techniques that can detect incorrect DNN behaviors and improve DNN quality are critical. However, a testing oracle, which defines the correct output for a given input, is often unavailable in automated testing. To obtain oracle information, testing DNN-based systems usually requires expensive human effort to label the test data, which significantly slows down quality assurance.
To mitigate this problem, we propose DeepGini, a test prioritization technique designed from a statistical perspective of DNNs. This perspective allows us to reduce the problem of measuring misclassification probability to the problem of measuring set impurity, so that DeepGini can quickly identify tests that are likely to be misclassified. Such tests are particularly useful for improving the robustness of DNNs. To evaluate our technique, we conduct an extensive empirical study on popular datasets and prevalent DNN models. The experimental results demonstrate that DeepGini outperforms existing coverage-based techniques in prioritizing tests with respect to both effectiveness and efficiency. In addition, we observe that the tests ranked at the front by DeepGini are more effective at improving DNN quality than those selected by coverage-based techniques.
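Concretely, the impurity measure behind DeepGini is the Gini impurity of a test's softmax output: a test with predicted class probabilities p_1, ..., p_C is scored as 1 - sum(p_i^2), and tests are executed in descending order of this score. The sketch below is a minimal NumPy illustration of the scoring and ordering; the function names and example probabilities are ours, not from the paper's artifact.

    import numpy as np

    def deepgini_scores(softmax_outputs):
        # Gini impurity 1 - sum(p_i^2) per test: close to its maximum
        # 1 - 1/C for an uncertain prediction, close to 0 for a
        # confident one.
        p = np.asarray(softmax_outputs)
        return 1.0 - np.sum(p ** 2, axis=1)

    def prioritize(softmax_outputs):
        # Indices of tests from most to least impure, i.e., tests that
        # are most likely to be misclassified come first.
        return np.argsort(-deepgini_scores(softmax_outputs))

    # Illustrative softmax outputs for three tests on a 3-class model:
    probs = [[0.34, 0.33, 0.33],  # nearly uniform -> ranked first
             [0.70, 0.20, 0.10],
             [0.98, 0.01, 0.01]]  # confident -> ranked last
    print(prioritize(probs))  # [0 1 2]

Because the score depends only on the model's output probabilities, it can be computed in a single forward pass per test, with no need for the neuron-level instrumentation that coverage-based techniques require.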