To What Extent Do DNN-based Image Classification Models Make Unreliable Inferences? (ICSE 2022 - Journal-First Papers)

Sun 8 - Fri 27 May 2022

Who

Yongqiang Tian, Shiqing Ma, Ming Wen, Yepang Liu, Shing-Chi Cheung, Xiangyu Zhang

Track

ICSE 2022 Journal-First Papers

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

When

Tue 10 May 2022 22:05 - 22:10 at ICSE room 5 - Software Testing 6 Chair(s): Leonardo Sousa
Thu 12 May 2022 12:00 - 12:05 at ICSE room 3 - Software Testing 14 Chair(s): Brittany Johnson

Abstract

Deep Neural Network (DNN) models are widely used for image classification. While they offer high accuracy, researchers are concerned about whether these models inappropriately make inferences using features irrelevant to the target object in a given image. To address this concern, we propose a metamorphic testing approach that assesses whether a given inference is based on irrelevant features. Specifically, we propose two metamorphic relations (MRs) to detect such unreliable inferences. These relations expect (a) a different label, or the same label with lower certainty, after the relevant features of an image are corrupted (MR-1), and (b) the same label after the irrelevant features are corrupted (MR-2). Inferences that violate the metamorphic relations are regarded as unreliable. Our evaluation demonstrated that our approach can effectively identify unreliable inferences for single-label classification models, with an average precision of 64.1% and 96.4% for the two MRs, respectively. For multi-label classification models, the corresponding precision for MR-1 and MR-2 is 78.2% and 86.5%, respectively. Further, we conducted an empirical study to understand the problem of unreliable inferences in practice. Specifically, we applied our approach to 18 pre-trained single-label image classification models and 3 multi-label classification models, and then examined their inferences on the ImageNet and COCO datasets. We found that unreliable inferences are pervasive: for each model, thousands of correct classifications are actually made using irrelevant features. Next, we investigated the effect of such pervasive unreliable inferences and found that they can cause significant degradation of a model’s overall accuracy. Once these unreliable inferences in the test set are taken into account, the model’s accuracy can change significantly.
Therefore, we recommend that developers pay more attention to these unreliable inferences during model evaluation. We also explored the correlation between model accuracy and the size of unreliable inferences. We found that inferences on inputs with smaller objects are more likely to be unreliable. Lastly, we found that current model training methodologies can guide models to learn object-relevant features to a certain extent, but may not necessarily prevent them from making unreliable inferences. We encourage the community to propose more effective training methodologies to address this issue.
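
To make the two metamorphic relations in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical `predict` callable that returns class probabilities and a boolean `object_mask` marking the target object. Filling pixels with a constant is an assumed stand-in for the corruption used in the paper, and the two toy models exist only to show one inference that satisfies both relations and one that violates them.

```python
import numpy as np


def corrupt(image, region, fill=0.0):
    """Return a copy of `image` with the pixels selected by the boolean
    array `region` replaced by `fill` (an assumed, simplistic corruption)."""
    out = image.copy()
    out[region] = fill
    return out


def check_relations(predict, image, object_mask):
    """Check both metamorphic relations for a single inference.

    predict     -- hypothetical callable: 2-D image -> 1-D class probabilities
    object_mask -- boolean array, True where the target object is
    """
    probs = predict(image)
    label = int(np.argmax(probs))

    # MR-1: corrupting the object should change the label
    # or lower the confidence in the original label.
    probs_obj = predict(corrupt(image, object_mask))
    mr1_holds = int(np.argmax(probs_obj)) != label or probs_obj[label] < probs[label]

    # MR-2: corrupting everything except the object should keep the label.
    probs_bg = predict(corrupt(image, ~object_mask))
    mr2_holds = int(np.argmax(probs_bg)) == label

    return {"label": label, "mr1_holds": bool(mr1_holds), "mr2_holds": bool(mr2_holds)}


def object_model(img):
    """Toy classifier that looks only at the object region (centre patch)."""
    s = float(img[2:6, 2:6].mean())
    return np.array([1.0 - s, s])


def background_model(img):
    """Toy classifier that looks only at one background pixel."""
    s = float(img[0, 0])
    return np.array([1.0 - s, s])


# Toy 8x8 image: background intensity 0.8, object (centre patch) intensity 1.0.
image = np.full((8, 8), 0.8)
object_mask = np.zeros((8, 8), dtype=bool)
object_mask[2:6, 2:6] = True
image[object_mask] = 1.0

reliable = check_relations(object_model, image, object_mask)
unreliable = check_relations(background_model, image, object_mask)
print(reliable)
print(unreliable)
```

Both toy models assign the same label to the original image, so accuracy alone cannot tell them apart. Only the relation checks show that the second model relies on the background, which is the kind of unreliable inference the abstract describes.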

Link to Preprint

https://repository.ust.hk/ir/Record/1783.1-111121

DOI

https://doi.org/10.1007/s10664-021-09985-1

Yongqiang Tian

The Hong Kong University of Science and Technology; University of Waterloo

Shiqing Ma

Rutgers University

Ming Wen

Huazhong University of Science and Technology

China

Yepang Liu

Southern University of Science and Technology

China

Shing-Chi Cheung

Hong Kong University of Science and Technology

China

Xiangyu Zhang

Purdue University

United States

Session Program

Tue 10 May
Displayed time zone: Eastern Time (US & Canada)

22:00 - 23:00	Software Testing 6 (SEIP - Software Engineering in Practice / Technical Track / Journal-First Papers) at ICSE room 5 Chair(s): Leonardo Sousa

5m Talk		Algorithmic Profiling for Real-World Complexity Problems [Journal-First Papers] Boqin Qin (China Telecom Cloud Computing Corporation), Tengfei Tu (Beijing University of Posts and Telecommunications), Ziheng Liu (University of California, San Diego), Tingting Yu (University of Cincinnati), Linhai Song (Pennsylvania State University, USA)
5m Talk		To What Extent Do DNN-based Image Classification Models Make Unreliable Inferences? [Journal-First Papers] Yongqiang Tian (The Hong Kong University of Science and Technology; University of Waterloo), Shiqing Ma (Rutgers University), Ming Wen (Huazhong University of Science and Technology), Yepang Liu (Southern University of Science and Technology), Shing-Chi Cheung (Hong Kong University of Science and Technology), Xiangyu Zhang (Purdue University)
5m Talk		Testing Machine Learning Systems in Industry: An Empirical Study [SEIP - Software Engineering in Practice] Shuyue Li (Xi'an Jiaotong University), Jiaqi Guo (Xi'an Jiaotong University), Jian-Guang Lou (Microsoft Research), Ming Fan (Xi'an Jiaotong University), Ting Liu (Xi'an Jiaotong University), Dongmei Zhang (Microsoft Research)
5m Talk		R2Z2: Detecting Rendering Regressions in Web Browsers through Differential Fuzz Testing [Technical Track] Suhwan Song (Seoul National University, South Korea), Jaewon Hur (Seoul National University), Sunwoo Kim (Samsung Research, Samsung Electronics), Philip Rogers (Google), Byoungyoung Lee (Seoul National University, South Korea)
5m Talk		Fuzzing Class Specifications [Technical Track] Facundo Molina (University of Rio Cuarto and CONICET, Argentina), Marcelo d'Amorim (Federal University of Pernambuco), Nazareno Aguirre (University of Rio Cuarto and CONICET, Argentina)
5m Talk		GIFdroid: Automated Replay of Visual Bug Reports for Android Apps [Technical Track] Sidong Feng (Monash University), Chunyang Chen (Monash University)

Thu 12 May
Displayed time zone: Eastern Time (US & Canada)

12:00 - 13:00	Software Testing 14 (Technical Track / Journal-First Papers / SEIP - Software Engineering in Practice) at ICSE room 3 Chair(s): Brittany Johnson (George Mason University)

5m Talk		To What Extent Do DNN-based Image Classification Models Make Unreliable Inferences? [Journal-First Papers] Yongqiang Tian (The Hong Kong University of Science and Technology; University of Waterloo), Shiqing Ma (Rutgers University), Ming Wen (Huazhong University of Science and Technology), Yepang Liu (Southern University of Science and Technology), Shing-Chi Cheung (Hong Kong University of Science and Technology), Xiangyu Zhang (Purdue University)
5m Talk		Demystifying the Challenges and Benefits of Analyzing User-Reported Logs in Bug Reports [Journal-First Papers] An Ran Chen (Concordia University), Tse-Hsun (Peter) Chen (Concordia University), Shaowei Wang (University of Manitoba)
5m Talk		Surveying the Developer Experience of Flaky Tests [SEIP - Software Engineering in Practice] Owain Parry (The University of Sheffield), Gregory Kapfhammer (Allegheny College), Michael Hilton (Carnegie Mellon University, USA), Phil McMinn (University of Sheffield)
5m Talk		Fuzzing Class Specifications [Technical Track] Facundo Molina (University of Rio Cuarto and CONICET, Argentina), Marcelo d'Amorim (Federal University of Pernambuco), Nazareno Aguirre (University of Rio Cuarto and CONICET, Argentina)
5m Talk		Demystifying the Dependency Challenge in Kernel Fuzzing [Technical Track] Yu Hao (University of California at Riverside, USA), Hang Zhang (Georgia Institute of Technology), Guoren Li (UC Riverside), Xingyun Du (UC Riverside), Zhiyun Qian (University of California at Riverside, USA), Ardalan Amiri Sani (UC Irvine)
5m Talk		Natural Attack for Pre-trained Models of Code [Technical Track] Zhou Yang (Singapore Management University), Jieke Shi (Singapore Management University), Junda He (Singapore Management University), David Lo (Singapore Management University)

Information for Participants

Tue 10 May 2022 22:00 - 23:00 at ICSE room 5 - Software Testing 6 Chair(s): Leonardo Sousa

Thu 12 May 2022 12:00 - 13:00 at ICSE room 3 - Software Testing 14 Chair(s): Brittany Johnson