An Empirical Study of Flaky Tests in Python (ICST 2023 - Previous Editions)

Who

Martin Gruber, Stephan Lukasczyk, Florian Kroiß, Gordon Fraser

Track

ICST 2023 Previous Editions

Time Zone

The program is currently displayed in (GMT+01:00) Dublin.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 19 Apr 2023 14:20 - 14:40 at Pearse suite - Session 15: Flaky Tests Chair(s): John Micco

Abstract

Tests that cause spurious failures without any code changes, i.e., flaky tests, hamper regression testing, increase maintenance costs, may shadow real bugs, and decrease trust in tests. While the prevalence and importance of flakiness are well established, prior research focused on Java projects, thus raising the question of how the findings generalize. In order to provide a better understanding of the role of flakiness in software development beyond Java, we empirically study the prevalence, causes, and degree of flakiness within software written in Python, one of the currently most popular programming languages. For this, we sampled 22 352 open source projects from the popular PyPI package index, and analyzed their 876 186 test cases for flakiness. Our investigation suggests that flakiness is equally prevalent in Python as it is in Java. The reasons, however, are different: Order dependency is a much more dominant problem in Python, causing 59 % of the 7 571 flaky tests in our dataset. Another 28 % were caused by test infrastructure problems, which represent a previously undocumented cause of flakiness. The remaining 13 % can mostly be attributed to the use of network and randomness APIs by the projects, which is indicative of the type of software commonly written in Python. Our data also suggests that finding flaky tests requires more runs than are often done in the literature: On average, 170 reruns would be required to be 95 % confident that a passing test case is not flaky.
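
The relationship between rerun count and confidence mentioned at the end of the abstract can be illustrated with a small sketch. It assumes a deliberately simple model that is not taken from the paper: each run of a flaky test fails independently with a fixed probability, so the chance of seeing at least one failure in n runs is 1 - (1 - p)^n. The function names `reruns_needed` and `detection_probability` are hypothetical, and the study's own estimate may have been derived differently.

```python
import math


def detection_probability(failure_rate: float, reruns: int) -> float:
    """Probability of observing at least one failure in `reruns` runs.

    Assumes (for illustration only) that runs are independent and that
    the test fails with the same probability `failure_rate` on each run.
    """
    return 1.0 - (1.0 - failure_rate) ** reruns


def reruns_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest number of reruns n with 1 - (1 - p)**n >= confidence."""
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve (1 - p)**n <= 1 - confidence for n and round up.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    # Hypothetical per-run failure rates, chosen only to show the trend.
    for rate in (0.5, 0.1, 0.0175):
        n = reruns_needed(rate)
        print(f"failure rate {rate}: {n} reruns, "
              f"detection probability {detection_probability(rate, n):.3f}")
```

Under this assumed model, rarely failing tests dominate the cost: a test that fails on half of its runs needs only a handful of reruns, while a per-run failure rate of about 1.75 % already requires 170. That last figure is arithmetic on the illustrative model, not a result reported by the study.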

DOI

https://doi.org/10.1109/ICST49551.2021.00026

Martin Gruber

BMW Group, University of Passau

Germany

Stephan Lukasczyk

University of Passau

Germany

Florian Kroiß

Gordon Fraser

University of Passau

Germany

Session Program

Wed 19 Apr
Displayed time zone: Dublin

14:00 - 15:40	Session 15: Flaky Tests (Previous Editions / Research Papers) at Pearse suite, Chair(s): John Micco (VMware)

14:00 20m Talk		Evaluating Features for Machine Learning Detection of Order- and Non-Order-Dependent Flaky Tests (Previous Editions). Owain Parry (The University of Sheffield); Gregory Kapfhammer (Allegheny College); Michael Hilton (Carnegie Mellon University); Phil McMinn (University of Sheffield)
14:20 20m Talk		An Empirical Study of Flaky Tests in Python (Previous Editions). Martin Gruber (BMW Group, University of Passau); Stephan Lukasczyk (University of Passau); Florian Kroiß; Gordon Fraser (University of Passau)
14:40 20m Talk		A Survey on How Test Flakiness Affects Developers and What Support They Need To Address It (Previous Editions). Martin Gruber (BMW Group, University of Passau); Gordon Fraser (University of Passau)
15:00 20m Talk		Practical Flaky Test Prediction using Common Code Evolution and Test History Data (Research Papers). Martin Gruber (BMW Group, University of Passau); Michael Heine (BMW Group; Friedrich-Alexander Universität Erlangen-Nürnberg (FAU), Programming Systems Group); Norbert Oster (Friedrich-Alexander Universität Erlangen-Nürnberg (FAU), Programming Systems Group); Michael Philippsen (Friedrich-Alexander Universität Erlangen-Nürnberg (FAU), Programming Systems Group); Gordon Fraser (University of Passau)
15:20 20m Talk		A Qualitative Study on the Sources, Impacts, and Mitigation Strategies of Flaky Tests (Previous Editions). Sarra Habchi (Ubisoft); Guillaume Haben (University of Luxembourg); Mike Papadakis (University of Luxembourg, Luxembourg); Maxime Cordy (University of Luxembourg, Luxembourg); Yves Le Traon (University of Luxembourg, Luxembourg)