A Comprehensive Study of Real-World Bugs in Machine Learning Model Optimization (ICSE 2023 - Technical Track)


Sun 14 - Sat 20 May 2023 Melbourne, Australia

Who

Hao Guan, Ying Xiao, Jiaying Li, Yepang Liu, Guangdong Bai

Track

ICSE 2023 Technical Track

Time Zone

The program is currently displayed in (GMT+10:00) Hobart.


When

Wed 17 May 2023 12:15 - 12:30 at Meeting Room 102 - Mining software repositories Chair(s): Brittany Johnson

Abstract

Owing to major advances in machine learning (ML) techniques, numerous ML models have expanded into new application domains in recent years. To adapt to resource-constrained platforms such as mobile and Internet of Things (IoT) devices, pre-trained models are often processed to enhance their efficiency and compactness, using optimization techniques such as pruning and quantization. As with optimization in other complex systems, e.g., program compilers and databases, optimizations for ML models can contain bugs, leading to severe consequences such as system crashes and financial loss. While bugs in the training, compilation, and deployment stages have been extensively studied, there is still a lack of systematic understanding and characterization of model optimization bugs (MOBs).

In this work, we conduct the first empirical study to identify and characterize MOBs. We collect a comprehensive dataset containing 371 MOBs from TensorFlow and PyTorch, the most extensively used open-source ML frameworks, covering the entire development time span of their optimizers (May 2019 to August 2022). We then investigate the collected bugs from various perspectives, including their symptoms, root causes, life cycles, detection and fixes. Our work unveils the status quo of MOBs in the wild, and reveals their features on which future detection techniques can be based. Our findings also serve as a warning to the developers and the users of ML frameworks, and an appeal to our research community to enact dedicated countermeasures.

Hao Guan

The University of Queensland

Ying Xiao

Southern University of Science and Technology

Jiaying Li

Microsoft

Yepang Liu

Southern University of Science and Technology

China

Guangdong Bai

University of Queensland

Australia


Session Program

Wed 17 May
Displayed time zone: Hobart

	11:00 - 12:30	Mining software repositories	Technical Track / Journal-First Papers / DEMO - Demonstrations, at Meeting Room 102. Chair(s): Brittany Johnson (George Mason University)

	11:00 15m Talk		The untold story of code refactoring customizations in practice	Technical Track	Daniel Oliveira (PUC-Rio), Wesley Assunção (Johannes Kepler University Linz, Austria & Pontifical Catholic University of Rio de Janeiro, Brazil), Alessandro Garcia (PUC-Rio), Ana Carla Bibiano (PUC-Rio), Márcio Ribeiro (Federal University of Alagoas, Brazil), Rohit Gheyi (Federal University of Campina Grande), Baldoino Fonseca (Federal University of Alagoas (UFAL))
	11:15 15m Talk		Data Quality for Software Vulnerability Datasets	Technical Track	Roland Croft (The University of Adelaide), Muhammad Ali Babar (University of Adelaide), M. Mehdi Kholoosi (University of Adelaide)
	11:30 15m Talk		Do code refactorings influence the merge effort?	Technical Track	André Oliveira (Federal Fluminense University), Vania Neves (Universidade Federal Fluminense (UFF)), Alexandre Plastino (Federal Fluminense University), Ana Carla Bibiano (PUC-Rio), Alessandro Garcia (PUC-Rio), Leonardo Murta (Universidade Federal Fluminense (UFF))
	11:45 7m Talk		ActionsRemaker: Reproducing GitHub Actions	DEMO - Demonstrations	Hao-Nan Zhu (University of California, Davis), Kevin Guan (University of California, Davis), Robert M. Furth (University of California, Davis), Cindy Rubio-González (University of California at Davis)
	11:52 7m Talk		Problems with SZZ and Features: An empirical assessment of the state of practice of defect prediction data collection	Journal-First Papers	Steffen Herbold (University of Passau), Alexander Trautsch (University of Passau), Alexander Trautsch (Germany), Benjamin Ledel
	12:00 7m Talk		An empirical study of issue-link algorithms: which issue-link algorithms should we use?	Journal-First Papers	Masanari Kondo (Kyushu University), Yutaro Kashiwa (Nara Institute of Science and Technology), Yasutaka Kamei (Kyushu University), Osamu Mizuno (Kyoto Institute of Technology)
	12:07 7m Talk		SCS-Gan: Learning Functionality-Agnostic Stylometric Representations for Source Code Authorship Verification	Journal-First Papers	Weihan Ou (Queen's University at Kingston), Steven H. H. Ding (Queen's University at Kingston), Yuan Tian (Queens University, Kingston, Canada), Leo Song (Queen's University at Kingston)
	12:15 15m Talk		A Comprehensive Study of Real-World Bugs in Machine Learning Model Optimization	Technical Track	Hao Guan (The University of Queensland), Ying Xiao (Southern University of Science and Technology), Jiaying Li (Microsoft), Yepang Liu (Southern University of Science and Technology), Guangdong Bai (University of Queensland)