ICSE 2023
Sun 14 - Sat 20 May 2023 Melbourne, Australia
Fri 19 May 2023 16:30 - 16:45 at Level G - Plenary Room 1 - Software quality Chair(s): Valentina Lenarduzzi

In recent years, deep learning (DL) has been increasingly adopted in many application areas. To help DL developers better train and test their models, enterprises have built dedicated, multi-tenant platforms equipped with large fleets of computing devices such as GPUs. The service quality of these platforms plays a critical role in system efficiency and user experience. Nevertheless, diverse types of quality issues arise on them that not only waste significant computing resources but also severely slow down development productivity. In this paper, we present a comprehensive empirical study on quality issues of Platform-X in Microsoft. Platform-X is an internal production deep learning platform that serves hundreds of developers and researchers. We manually examined 360 real issues and investigated their common symptoms, root causes, and mitigation actions. Our major findings include: (1) 28.33% of the quality issues are caused by hardware faults (in the GPU, network, and compute nodes); (2) another 28.33% result from system-side faults (e.g., system defects and service outages); (3) user-side faults (e.g., user bugs and policy violations) account for more than two-fifths (43.34%) of all the common causes; (4) nearly three-fifths of all the quality issues can be mitigated simply by resubmitting jobs (34.72%) or improving user code (24.72%). Our study results provide valuable guidance on improving the service quality of deep learning platforms from both the development and maintenance perspectives. The results further motivate possible research directions and tooling support.
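As a quick sanity check on the abstract's breakdown, the reported cause percentages cover all 360 issues, while the two most common mitigations together account for just under three-fifths. A minimal sketch (the numbers are taken verbatim from the abstract; the category labels are paraphrased):

```python
# Percentages reported in the abstract for the 360 examined quality issues.
causes = {
    "hardware faults": 28.33,
    "system-side faults": 28.33,
    "user-side faults": 43.34,
}
mitigations = {
    "resubmitting jobs": 34.72,
    "improving user code": 24.72,
}

total_causes = sum(causes.values())          # ~100.0: the three cause groups cover all issues
top_mitigations = sum(mitigations.values())  # ~59.44: just under three-fifths of issues
print(round(total_causes, 2), round(top_mitigations, 2))
```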

Fri 19 May

Displayed time zone: Hobart

15:45 - 17:15
15:45
15m
Talk
DuetCS: Code Style Transfer through Generation and Retrieval
Technical Track
Binger Chen Technische Universität Berlin, Ziawasch Abedjan Leibniz Universität Hannover
16:00
15m
Talk
Understanding Why and Predicting When Developers Adhere to Code-Quality Standards
SEIP - Software Engineering in Practice
Manish Motwani Georgia Institute of Technology, Yuriy Brun University of Massachusetts
Pre-print
16:15
15m
Talk
Code Compliance Assessment as a Learning Problem
SEIP - Software Engineering in Practice
16:30
15m
Talk
An Empirical Study on Quality Issues of Deep Learning Platform
SEIP - Software Engineering in Practice
Yanjie Gao Microsoft Research, Xiaoxiang Shi, Haoxiang Lin Microsoft Research, Hongyu Zhang The University of Newcastle, Hao Wu, Rui Li, Mao Yang Microsoft Research
Pre-print
16:45
7m
Talk
Can static analysis tools find more defects? A qualitative study of design rule violations found by code review
Journal-First Papers
Sahar Mehrpour George Mason University, USA, Thomas LaToza George Mason University
16:52
7m
Talk
DebtFree: minimizing labeling cost in self-admitted technical debt identification using semi-supervised learning
Journal-First Papers
Huy Tu North Carolina State University, USA, Tim Menzies North Carolina State University
Link to publication · Pre-print
17:00
7m
Talk
FIXME: synchronize with database! An empirical study of data access self-admitted technical debt
Journal-First Papers
Biruk Asmare Muse Polytechnique Montréal, Csaba Nagy Software Institute - USI, Lugano, Anthony Cleve University of Namur, Foutse Khomh Polytechnique Montréal, Giuliano Antoniol Polytechnique Montréal
17:07
7m
Talk
How does quality deviate in stable releases by backporting?
NIER - New Ideas and Emerging Results
Jarin Tasnim University of Saskatchewan, Debasish Chakroborti University of Saskatchewan, Chanchal K. Roy University of Saskatchewan, Kevin Schneider University of Saskatchewan
Link to publication · Pre-print