Researchers and practitioners increasingly perceive flaky tests as a major challenge in software engineering, and both invest considerable effort in detecting, repairing, and mitigating their negative effects. However, it remains unclear where and to what extent the costs of flaky tests manifest in industrial CI development processes. In this study, we compile cost factors introduced by flaky tests in CI development from research and practice and derive a cost model that provides insight into the costs incurred. We then instantiate this model in a case study of a large, commercial software project with ~30 developers and ~1M SLoC. We analyze five years of development history, including CI test logs, commits from the VCS, issue tickets, and tracked work time, to quantify the cost factors implied by flaky tests. We find that the time spent dealing with flaky tests in the studied project amounts to at least 2.5% of the productive developer time: investigating potentially flaky test failures accounts for 1.1% of the total time, repairing flaky tests adds another 1.3%, and developing tools to monitor flaky tests adds 0.1%. Contrary to most other studies, we find the cost of rerunning tests to be negligible: automatically rerunning a test costs 0.02 cents, while not rerunning it and thus letting the pipeline fail results in a manual investigation costing $5.67 in our context. The insights gained from our case study have led to the decision to shift effort from investigation and repair to automatically rerunning tests. Our cost model can help practitioners analyze the cost of flaky tests in their context and make informed decisions. Furthermore, our case study provides a first step toward better understanding the costs of flaky tests and can point researchers to industry-relevant problems.
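To illustrate the trade-off the abstract reports, the following is a minimal sketch of a per-failure cost comparison. The two dollar figures (0.02 cents per automatic rerun, $5.67 per manual investigation) are taken from the abstract; the expected-cost formula, the parameter `p_flaky`, and the function names are illustrative assumptions, not the paper's actual cost model.

```python
# Hypothetical per-failure cost comparison. Only the two constants
# come from the case study; the model itself is an assumption.

RERUN_COST_USD = 0.0002          # 0.02 cents per automatic test rerun (from the abstract)
INVESTIGATION_COST_USD = 5.67    # manual investigation of a failed pipeline (from the abstract)


def expected_cost(p_flaky: float, max_reruns: int = 1) -> float:
    """Expected cost of handling one test failure when up to `max_reruns`
    automatic reruns are attempted before a human investigates.

    p_flaky: assumed probability that a rerun passes, i.e. that the
    failure was flaky. This decision rule is our own illustration.
    """
    cost = 0.0
    p_still_failing = 1.0
    for _ in range(max_reruns):
        cost += p_still_failing * RERUN_COST_USD
        p_still_failing *= (1.0 - p_flaky)
    # Failures that survive all reruns still require manual investigation.
    cost += p_still_failing * INVESTIGATION_COST_USD
    return cost


if __name__ == "__main__":
    print(f"no rerun:   ${INVESTIGATION_COST_USD:.4f}")
    print(f"one rerun:  ${expected_cost(p_flaky=0.5, max_reruns=1):.4f}")
    print(f"two reruns: ${expected_cost(p_flaky=0.5, max_reruns=2):.4f}")
```

Under these assumptions, even a single automatic rerun cuts the expected cost of a failure roughly in half whenever about half of the observed failures are flaky, which is consistent with the paper's decision to shift effort toward automatic reruns.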
Thu 30 May. Displayed time zone: Eastern Time (US & Canada).

11:00 - 12:40 | Test Flakiness (Journal-First Papers / Research Papers / Industry), Room 1. Chair(s): Andrea Stocco (Technical University of Munich, fortiss)

11:00 (20m, long paper): Test Code Flakiness in Mobile Apps: The Developer's Perspective. Journal-First Papers. Valeria Pontillo (Vrije Universiteit Brussel), Fabio Palomba (University of Salerno), Filomena Ferrucci (University of Salerno). Link to publication.

11:20 (20m, long paper): Flakiness goes live: Insights from an In Vivo testing simulation study. Journal-First Papers. Morena Barboni (University of Camerino), Antonia Bertolino (National Research Council, Italy), Guglielmo De Angelis (CNR-IASI).

11:40 (20m, research paper): 262,447 Test Failures Later: An Empirical Evaluation of Flaky Failure Classifiers. Research Papers. Abdulrahman Alshammari (George Mason University), Paul Ammann (George Mason University, USA), Michael Hilton (Carnegie Mellon University), Jonathan Bell (Northeastern University).

12:00 (20m, research paper): Automatically Reproducing Timing-Dependent Flaky-Test Failures. Research Papers. Shanto Rahman (The University of Texas at Austin), Aaron Massey (George Mason University), Wing Lam (George Mason University), August Shi (The University of Texas at Austin), Jonathan Bell (Northeastern University).

12:20 (20m, industry talk): Cost of Flaky Tests in CI: An Industrial Case Study. Industry. Fabian Leinen (Technical University of Munich), Daniel Elsner (TU Munich), Alexander Pretschner (TU Munich), Andreas Stahlbauer, Michael Sailer, Elmar Juergens (CQSE GmbH). Pre-print.