Comparative analysis of real bugs in open-source Machine Learning projects - A Registered Report (ESEIW 2022 - ESEM Registered Reports)

Who

Tuan Dung Lai, Anj Simmons, Scott Barnett, Jean-Guy Schneider, Rajesh Vasa

Track

ESEIW 2022 ESEM Registered Reports

Time Zone

The program is currently displayed in (GMT+03:00) Athens.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Thu 22 Sep 2022 16:49 - 17:00 at Sonck - Session 3B - Registered Reports 1 Chair(s): Sérgio Soares

Abstract

Background: Machine Learning (ML) systems rely on data to make predictions. Compared with traditional software systems, they have many additional components, such as the data processing pipeline, the serving pipeline, and model training. Existing research on software maintenance has studied the issue-reporting needs and resolution process for different types of issues, such as performance and security issues. However, ML systems have specific classes of faults, and reporting ML issues requires domain-specific information. Because ML systems differ in character from traditional software systems, we do not know to what extent their reporting needs differ, or to what extent these differences affect the issue resolution process.

Objective: Based on real issue reports in open-source applied ML projects, our objective is to investigate whether the distribution of resolution time differs between ML and non-ML issues, and whether certain categories of ML issues take longer to resolve. We further investigate the size of the fix for ML and non-ML issues.

Method: We extract issue reports, pull requests, and code files from recently active applied ML projects on GitHub and use an automatic approach to separate ML issues from non-ML issues. We manually label the issues using a known taxonomy of deep learning bugs. We measure the resolution time and fix size of ML and non-ML issues on a controlled sample and compare the distributions for each category of issue.
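The abstract does not define resolution time precisely; one plausible reading is the interval between an issue being opened and being closed. The minimal Python sketch below works under that assumption. The records, the `kind` label (standing in for the paper's ML/non-ML filtering), and the helper names are all hypothetical, and only the GitHub-style `created_at`/`closed_at` timestamp format is borrowed from real issue data. It groups resolution times by issue kind and summarizes each distribution by its median; fix size is not covered.

```python
from datetime import datetime
from statistics import median

# Hypothetical issue records: the "kind" label and all values are illustrative.
# Timestamps follow the ISO 8601 UTC format used by the GitHub API.
issues = [
    {"kind": "ML", "created_at": "2022-01-03T10:00:00Z", "closed_at": "2022-01-10T10:00:00Z"},
    {"kind": "ML", "created_at": "2022-02-01T08:00:00Z", "closed_at": "2022-02-04T08:00:00Z"},
    {"kind": "ML", "created_at": "2022-02-15T08:00:00Z", "closed_at": None},  # still open
    {"kind": "non-ML", "created_at": "2022-03-01T12:00:00Z", "closed_at": "2022-03-02T12:00:00Z"},
    {"kind": "non-ML", "created_at": "2022-03-05T09:00:00Z", "closed_at": "2022-03-07T09:00:00Z"},
]


def parse_ts(ts: str) -> datetime:
    # Parse a timestamp such as "2022-01-03T10:00:00Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def resolution_days(issue: dict) -> float:
    # Assumed definition: time from opening to closing, in days.
    delta = parse_ts(issue["closed_at"]) - parse_ts(issue["created_at"])
    return delta.total_seconds() / 86400


def resolution_times_by_kind(records: list) -> dict:
    # Collect resolution times per issue kind, skipping issues that are still open.
    groups: dict = {}
    for issue in records:
        if issue.get("closed_at") is None:
            continue
        groups.setdefault(issue["kind"], []).append(resolution_days(issue))
    return groups


groups = resolution_times_by_kind(issues)
for kind, days in sorted(groups.items()):
    print(kind, len(days), median(days))
```

Comparing full distributions per category would also need a choice of statistical test, which the abstract does not specify, so the sketch stops at per-group medians.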

Link to Publication

https://arxiv.org/pdf/2209.09932.pdf

Link to Preprint

https://arxiv.org/abs/2209.09932

Tuan Dung Lai, Deakin University

Anj Simmons, Deakin University

Scott Barnett, Deakin University

Jean-Guy Schneider, Deakin University, Australia

Rajesh Vasa, Deakin University, Australia

Session Program

Thu 22 Sep
Displayed time zone: Athens

15:45 - 17:00: Session 3B - Registered Reports 1 (ESEM Registered Reports / ESEIW ESEM) at Sonck. Chair: Sérgio Soares (Universidade Federal de Pernambuco)

15:45 (10m): The Relevance of Model Transformation Language Features on Qualitative Properties of MTLs: A Study Protocol. ESEM Registered Reports. Stefan Höppner (Ulm University), Matthias Tichy (Ulm University, Germany)
15:55 (10m): On the acceptance by code reviewers of candidate security patches suggested by Automated Program Repair tools. ESEM Registered Reports. Aurora Papotti (Vrije Universiteit Amsterdam), Ranindya Paramitha (University of Trento), Fabio Massacci (University of Trento; Vrije Universiteit Amsterdam)
16:06 (10m): Does Road Diversity Really Matter in Testing Automated Driving Systems? A Registered Report. ESEM Registered Reports. Stefan Klikovits, Vincenzo Riccio (USI Lugano), Ezequiel Castellano (National Institute of Informatics), Ahmet Cetinkaya (Shibaura Institute of Technology), Alessio Gambi (IMC University of Applied Sciences Krems), Paolo Arcaini (National Institute of Informatics)
16:17 (10m): A Unified and Holistic Classification Scheme for Software Engineering Research. ESEM Registered Reports. Angelika Kaplan (Karlsruhe Institute of Technology), Thomas Kühn (Karlsruhe Institute of Technology), Ralf Reussner (Karlsruhe Institute of Technology (KIT) and FZI - Research Center for Information Technology (FZI))
16:27 (10m): Studying the explanations for the automated prediction of bug and non-bug issues using LIME and SHAP. ESEM Registered Reports. Benjamin Ledel (TU Clausthal), Steffen Herbold (TU Clausthal)
16:38 (10m): Team performance and large-scale agile software development. ESEM Registered Reports. Muhammad Ovais Ahmad (Karlstad University), Hadi Ghanbari (Aalto University), Tomas Gustavsson (Karlstad University)
16:49 (10m): Comparative analysis of real bugs in open-source Machine Learning projects - A Registered Report. Research paper, ESEM Registered Reports. Tuan Dung Lai (Deakin University), Anj Simmons (Deakin University), Scott Barnett (Deakin University), Jean-Guy Schneider (Deakin University), Rajesh Vasa (Deakin University, Australia)