Towards enhancing the reproducibility of deep learning bugs: an empirical study (FSE 2025 - Journal First)

Mon 23 - Fri 27 June 2025 Trondheim, Norway

co-located with ISSTA 2025

Who

Mehil Shah, Masud Rahman, Foutse Khomh

Track

FSE 2025 Journal First

Time Zone

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Tue 24 Jun 2025 15:00 - 15:20 at Cosmos 3A - Empirical Studies 1 Chair(s): Letizia Jaccheri

Abstract

Context: Deep learning has achieved remarkable progress in various domains. However, like any software system, deep learning systems contain bugs, some of which can have severe impacts, as evidenced by crashes involving autonomous vehicles. Despite substantial advancements in deep learning techniques, little research has focused on reproducing deep learning bugs, which is an essential step for their resolution. Existing literature suggests that only 3% of deep learning bugs are reproducible, underscoring the need for further research.

Objective: This paper examines the reproducibility of deep learning bugs. We identify edit actions and useful information that could improve the reproducibility of deep learning bugs.

Method: First, we construct a dataset of 668 deep learning bugs from Stack Overflow and GitHub across three frameworks and 22 architectures. Second, out of the 668 bugs, we select 165 bugs using stratified sampling and attempt to determine their reproducibility. While reproducing these bugs, we identify edit actions and useful information for their reproduction. Third, we use the Apriori algorithm to identify useful information and edit actions required to reproduce specific types of bugs. Finally, we conduct a user study involving 22 developers to assess the effectiveness of our findings in real-life settings.

Results: We successfully reproduced 148 out of 165 bugs attempted. We identified ten edit actions and five useful types of component information that can help us reproduce the deep learning bugs. With the help of our findings, the developers were able to reproduce 22.92% more bugs and reduce their reproduction time by 24.35%.

Conclusions: Our research addresses the critical issue of deep learning bug reproducibility. Practitioners and researchers can leverage our findings to improve deep learning bug reproducibility.

Link to Publication

https://link.springer.com/article/10.1007/s10664-024-10579-w

Link to Preprint

https://arxiv.org/abs/2401.03069

Mehil Shah

Dalhousie University

Canada

Masud Rahman

Dalhousie University

Canada

Foutse Khomh

Polytechnique Montréal

Canada

Session Program

Tue 24 Jun
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

14:00 - 15:20	Empirical Studies 1 (Research Papers / Journal First) at Cosmos 3A. Chair(s): Letizia Jaccheri (Norwegian University of Science and Technology (NTNU))

14:00 20m Talk	Core Developer Turnover in the Rust Package Ecosystem: Prevalence, Impact, and Awareness (Research Papers). Meng Fan (Beijing Institute of Technology), Yuxia Zhang (Beijing Institute of Technology), Klaas-Jan Stol (Lero; University College Cork; SINTEF Digital), Hui Liu (Beijing Institute of Technology)
14:20 20m Talk	A Comprehensive Study of Governance Issues in Decentralized Finance Applications (Journal First). Wei Ma (Singapore Management University), Chenguang Zhu (Meta AI), Ye Liu (Singapore Management University), Xiaofei Xie (Singapore Management University), Yi Li (Nanyang Technological University)
14:40 20m Talk	An Empirical Study on Release-Wise Refactoring Patterns (Research Papers). Shayan Noei (Queen's University), Heng Li (Polytechnique Montréal), Ying Zou (Queen's University, Kingston, Ontario)
15:00 20m Talk	Towards enhancing the reproducibility of deep learning bugs: an empirical study (Journal First). Mehil Shah (Dalhousie University), Masud Rahman (Dalhousie University), Foutse Khomh (Polytechnique Montréal)

Information for Participants

Tue 24 Jun 2025 14:00 - 15:20 at Cosmos 3A - Empirical Studies 1 Chair(s): Letizia Jaccheri

Info for room Cosmos 3A:

Cosmos 3A is the first room in the Cosmos 3 wing.

When facing the main Cosmos Hall, access to the Cosmos 3 wing is on the left, close to the stairs. The area is accessed through a large door with the number “3”, which will stay open during the event.