Testing Machine Learning Systems in Industry: An Empirical Study (ICSE 2022 - SEIP - Software Engineering in Practice)

Sun 8 - Fri 27 May 2022

Who

Shuyue Li, Jiaqi Guo, Jian-Guang Lou, Ming Fan, Ting Liu, Dongmei Zhang

Track

ICSE 2022 SEIP - Software Engineering in Practice

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Tue 10 May 2022 05:10 - 05:15 at ICSE room 3-odd hours - Software Testing 2 Chair(s): Aldeida Aleti
Tue 10 May 2022 22:10 - 22:15 at ICSE room 5-even hours - Software Testing 6 Chair(s): Leonardo Sousa

Abstract

Machine learning (ML) is becoming increasingly prevalent and is being integrated into a wide range of software systems. These systems, named ML systems, must be adequately tested to gain confidence that they behave correctly. Although many research efforts have been devoted to testing technologies for ML systems, industrial teams face new challenges when testing ML systems in real-world settings. To draw inspiration from industry on the problems in ML testing, we conducted an empirical study comprising a survey with 87 responses and interviews with 7 senior practitioners working on different ML systems at well-known IT companies. Our study uncovers significant industrial concerns about the major testing activities, i.e., test data collection, test execution, and test result analysis, as well as some good practices and open challenges from the perspective of industry. (1) Test data collection is conducted in different ways for ML models, data, and code, and faces different challenges. (2) Test execution in ML systems suffers from two major problems: entanglement among components and regression in model performance. (3) Test result analysis centers on quantitative methods, e.g., metric-based evaluation, combined with some qualitative methods based on practitioners' experience. Based on our findings, we highlight research opportunities and provide some implications for practitioners.

Link to Preprint

https://drive.google.com/file/d/1wrG1x4YJbfnf5dT9mvvJ9SCVOt1-kynU/view?usp=sharing

DOI

https://doi.org/10.1145/3510457.3513036

Shuyue Li, Xi'an Jiaotong University, China
Jiaqi Guo, Xi'an Jiaotong University
Jian-Guang Lou, Microsoft Research, China
Ming Fan, Xi'an Jiaotong University
Ting Liu, Xi'an Jiaotong University, China
Dongmei Zhang, Microsoft Research, China

Session Program

Tue 10 May
Displayed time zone: Eastern Time (US & Canada)

05:00 - 06:00	Software Testing 2 (SEIP - Software Engineering in Practice / Technical Track / Journal-First Papers) at ICSE room 3-odd hours. Chair(s): Aldeida Aleti (Monash University)

05:00 5m Talk		Reinforcement Learning for Test Case Prioritization (Journal-First Papers). Mojtaba Bagherzadeh (University of Ottawa), Nafiseh Kahani, Lionel Briand (University of Luxembourg; University of Ottawa)
05:05 5m Talk		Build System Aware Multi-language Regression Test Selection in Continuous Integration (SEIP - Software Engineering in Practice). Daniel Elsner (TU Munich), Roland Würsching (Technical University of Munich), Markus Schnappinger, Alexander Pretschner (TU Munich), Maria Graber (IVU Traffic Technologies), René Dammer (IVU Traffic Technologies), Silke Reimer (IVU Traffic Technologies)
05:10 5m Talk		Testing Machine Learning Systems in Industry: An Empirical Study (SEIP - Software Engineering in Practice). Shuyue Li (Xi'an Jiaotong University), Jiaqi Guo (Xi'an Jiaotong University), Jian-Guang Lou (Microsoft Research), Ming Fan (Xi'an Jiaotong University), Ting Liu (Xi'an Jiaotong University), Dongmei Zhang (Microsoft Research)
05:15 5m Talk		GIFdroid: Automated Replay of Visual Bug Reports for Android Apps (Technical Track). Sidong Feng (Monash University), Chunyang Chen (Monash University)
05:20 5m Talk		BuildSheriff: Change-Aware Test Failure Triage for Continuous Integration Builds (Technical Track). Chen Zhang (Fudan University), Bihuan Chen (Fudan University, China), Xin Peng (Fudan University), Wenyun Zhao (Fudan University, China)
05:25 5m Talk		Natural Attack for Pre-trained Models of Code (Technical Track). Zhou Yang (Singapore Management University), Jieke Shi (Singapore Management University), Junda He (Singapore Management University), David Lo (Singapore Management University)

22:00 - 23:00	Software Testing 6 (SEIP - Software Engineering in Practice / Technical Track / Journal-First Papers) at ICSE room 5-even hours. Chair(s): Leonardo Sousa

22:00 5m Talk		Algorithmic Profiling for Real-World Complexity Problems (Journal-First Papers). Boqin Qin (China Telecom Cloud Computing Corporation), Tengfei Tu (Beijing University of Posts and Telecommunications), Ziheng Liu (University of California, San Diego), Tingting Yu (University of Cincinnati), Linhai Song (Pennsylvania State University, USA)
22:05 5m Talk		To What Extent Do DNN-based Image Classification Models Make Unreliable Inferences? (Journal-First Papers). Yongqiang Tian (The Hong Kong University of Science and Technology; University of Waterloo), Shiqing Ma (Rutgers University), Ming Wen (Huazhong University of Science and Technology), Yepang Liu (Southern University of Science and Technology), Shing-Chi Cheung (Hong Kong University of Science and Technology), Xiangyu Zhang (Purdue University)
22:10 5m Talk		Testing Machine Learning Systems in Industry: An Empirical Study (SEIP - Software Engineering in Practice). Shuyue Li (Xi'an Jiaotong University), Jiaqi Guo (Xi'an Jiaotong University), Jian-Guang Lou (Microsoft Research), Ming Fan (Xi'an Jiaotong University), Ting Liu (Xi'an Jiaotong University), Dongmei Zhang (Microsoft Research)
22:15 5m Talk		R2Z2: Detecting Rendering Regressions in Web Browsers through Differential Fuzz Testing (Technical Track). Suhwan Song (Seoul National University, South Korea), Jaewon Hur (Seoul National University), Sunwoo Kim (Samsung Research, Samsung Electronics), Philip Rogers (Google), Byoungyoung Lee (Seoul National University, South Korea)
22:20 5m Talk		Fuzzing Class Specifications (Technical Track). Facundo Molina (University of Rio Cuarto and CONICET, Argentina), Marcelo d'Amorim (Federal University of Pernambuco), Nazareno Aguirre (University of Rio Cuarto and CONICET, Argentina)
22:25 5m Talk		GIFdroid: Automated Replay of Visual Bug Reports for Android Apps (Technical Track). Sidong Feng (Monash University), Chunyang Chen (Monash University)
