Testing Machine Learning Systems in Industry: An Empirical Study
Tue 10 May 2022 22:10 - 22:15 at ICSE room 5-even hours - Software Testing 6 Chair(s): Leonardo Sousa
Machine learning (ML) is becoming increasingly prevalent and is being integrated into a wide range of software systems. These systems, called ML systems, must be adequately tested to gain confidence that they behave correctly. Although much research effort has been devoted to testing technologies for ML systems, industrial teams face new challenges when testing ML systems in real-world settings. To draw insights from industry on the problems in ML testing, we conducted an empirical study comprising a survey with 87 responses and interviews with 7 senior practitioners working on different ML systems at well-known IT companies. Our study uncovers significant industrial concerns about the major testing activities, i.e., test data collection, test execution, and test result analysis, as well as some good practices and open challenges from the industry's perspective. (1) Test data collection is conducted in different ways for the ML model, data, and code, and faces different challenges. (2) Test execution in ML systems suffers from two major problems: entanglement among components and regression in model performance. (3) Test result analysis centers on quantitative methods, e.g., metric-based evaluation, combined with some qualitative methods based on practitioners' experience. Based on our findings, we highlight research opportunities and provide some implications for practitioners.
Tue 10 May. Displayed time zone: Eastern Time (US & Canada)
22:00 - 23:00 | Software Testing 6 | SEIP - Software Engineering in Practice / Technical Track / Journal-First Papers | ICSE room 5-even hours | Chair(s): Leonardo Sousa
22:00 | 5m Talk | Algorithmic Profiling for Real-World Complexity Problems | Journal-First Papers | Boqin Qin (China Telecom Cloud Computing Corporation), Tengfei Tu (Beijing University of Posts and Telecommunications), Ziheng Liu (University of California, San Diego), Tingting Yu (University of Cincinnati), Linhai Song (Pennsylvania State University, USA)
22:05 | 5m Talk | To What Extent Do DNN-based Image Classification Models Make Unreliable Inferences? | Journal-First Papers | Yongqiang TIAN (The Hong Kong University of Science and Technology; University of Waterloo), Shiqing Ma (Rutgers University), Ming Wen (Huazhong University of Science and Technology), Yepang Liu (Southern University of Science and Technology), Shing-Chi Cheung (Hong Kong University of Science and Technology), Xiangyu Zhang (Purdue University)
22:10 | 5m Talk | Testing Machine Learning Systems in Industry: An Empirical Study | SEIP - Software Engineering in Practice | Shuyue Li (Xi'an Jiaotong University), Jiaqi Guo (Xi'an Jiaotong University), Jian-Guang Lou (Microsoft Research), Ming Fan (Xi'an Jiaotong University), Ting Liu (Xi'an Jiaotong University), Dongmei Zhang (Microsoft Research)
22:15 | 5m Talk | R2Z2: Detecting Rendering Regressions in Web Browsers through Differential Fuzz Testing | Technical Track | Suhwan Song (Seoul National University, South Korea), Jaewon Hur (Seoul National University), Sunwoo Kim (Samsung Research, Samsung Electronics), Philip Rogers (Google), Byoungyoung Lee (Seoul National University, South Korea)
22:20 | 5m Talk | Fuzzing Class Specifications | Technical Track | Facundo Molina (University of Rio Cuarto and CONICET, Argentina), Marcelo d'Amorim (Federal University of Pernambuco), Nazareno Aguirre (University of Rio Cuarto and CONICET, Argentina)
22:25 | 5m Talk | GIFdroid: Automated Replay of Visual Bug Reports for Android Apps | Technical Track