Audee: Automated Testing for Deep Learning Frameworks (ASE 2020 - Research Papers)

Who

Qianyu Guo, Xiaofei Xie, Yi Li, Xiaoyu Zhang, Yang Liu, Xiaohong Li, Chao Shen

Track

ASE 2020 Research Papers

Time Zone

The program is currently displayed in (UTC) Coordinated Universal Time.

Use conference time zone: (UTC) Coordinated Universal TimeSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Wed 23 Sep 2020 01:10 - 01:30 at Kangaroo - Software Engineering for AI (2) Chair(s): Aldeida Aleti

Abstract

Deep learning (DL) has been applied widely, and the quality of DL system becomes crucial, especially for safety-critical applications. Existing work mainly focuses on the quality analysis of DL models, but lacks attention to the underlying libraries and frameworks on which all DL models depend. In this work, we propose Audee, anovel approach for testing DL libraries and localizing bugs. Audee adopts a search-based approach and implements three different mutation strategies to generate diverse tests cases by exploring combinations of model structures, parameters, weights and inputs. Audee is able to detect three types of bugs: logic bugs, crashes and Not-a-Number (NaN) bugs. In particular, for logic bugs, Audee adopts a cross-reference check to detect behavioral inconsistencies across multiple frameworks (e.g., TensorFlow and PyTorch), which indicates potential bugs in their implementations. For NaN bugs, Audee adopts a heuristic-based approach to generate DNNs that tend to output outliers (i.e., too large or small values), and these values are likely to cause NaN value. Furthermore, Audee leverages causal testing based technique to localize layers as well as parameters that cause inconsistencies or bugs. To evaluate the effectiveness of our approach, we applied Audeeon evaluating four DL frameworks, i.e., TensorFlow, CNTK, Theano, and PyTorch. We totally generate 260 models which cover 25 widely-used APIs in the four frameworks. The results demonstrate Audee are effective indetecting inconsistencies, crashes and NaN bugs. In total, 26 unique unknown bugs were discovered, and seven of them have already been confirmed by the developers.

Qianyu Guo

College of Intelligence and Computing, Tianjin University

China

Xiaofei Xie

Nanyang Technological University

Yi Li

Nanyang Technological University

Singapore

Xiaoyu Zhang

Xi'an Jiaotong University

Yang Liu

Nanyang Technological University, Singapore

Singapore

Xiaohong Li

TianJin University