Root Causing Flaky Tests in a Large-scale Industrial Setting (ISSTA 2019 - Technical Papers)

Who

Wing Lam, Patrice Godefroid, Suman Nath, Anirudh Santhiar, Suresh Thummalapenta

Track

ISSTA 2019 Technical Papers

When

Thu 18 Jul 2019 11:00 - 11:22 (GMT+08:00) at Grand Ballroom - Regression Testing. Chair(s): Dan Hao

Abstract

In today’s agile world, developers are expected to deliver features rapidly to meet the demands of customers. To ensure that new changes do not introduce regressions, developers often rely on continuous integration pipelines to help build and validate their changes via executing tests in an efficient manner. One of the significant factors that hinder developers’ productivity is flaky tests, i.e., tests that may fail and pass with the same version of code. Since flaky test failures are not deterministically reproducible, developers often have to spend hours only to discover that the occasional failures have nothing to do with their changes. However, ignoring failures of flaky tests can be dangerous, since those failures may represent real faults in the production code. Furthermore, identifying the root cause of flakiness is tedious and cumbersome, since flakiness is often a consequence of unexpected and non-deterministic behavior due to various factors, such as concurrency and external dependencies.

As developers from a large-scale industrial setting, we first describe our experience with flaky tests by conducting a study on them. Our quantitative results show that although the number of flaky tests may be low, the percentage of failing builds due to flaky tests can be substantial. To reduce the burden of flaky tests on developers, we describe our end-to-end framework that helps identify flaky tests and understand their root causes. Our framework instruments flaky tests and all relevant code to log various runtime properties, and then uses a preliminary tool, called RootFinder, to find differences in the logs of passing and failing executions. Using our framework, we collect and publicize a dataset of real-world, anonymized execution logs of flaky tests. We hope that by sharing the findings from our study, the framework, and a dataset of logs, we will encourage more research on this important problem.
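The idea of finding differences between the logs of passing and failing executions can be illustrated with a minimal sketch. This is not RootFinder itself, whose internals the abstract does not describe. The record shape `(call site, property, value)`, the function name `distinguishing_records`, and the sample data are all hypothetical, chosen only to show one simple way such a comparison could work: keep the observations that appear in every failing run and in no passing run, and vice versa.

```python
from typing import Dict, Hashable, List, Set, Tuple

# One logged observation: (call site, property name, observed value).
# This record shape is an assumption made for illustration only.
Record = Tuple[str, str, Hashable]


def distinguishing_records(
    passing_runs: List[List[Record]],
    failing_runs: List[List[Record]],
) -> Dict[str, Set[Record]]:
    """Return records seen in every failing run but in no passing run, and vice versa."""

    def seen_in_all(runs: List[List[Record]]) -> Set[Record]:
        sets = [set(run) for run in runs]
        return set.intersection(*sets) if sets else set()

    def seen_in_any(runs: List[List[Record]]) -> Set[Record]:
        return set().union(*(set(run) for run in runs))

    return {
        "only_in_failing": seen_in_all(failing_runs) - seen_in_any(passing_runs),
        "only_in_passing": seen_in_all(passing_runs) - seen_in_any(failing_runs),
    }


# Hypothetical logs from four executions of the same flaky test.
passing_runs = [
    [("Cache.get", "returned_null", False), ("Worker.run", "slow", False)],
    [("Cache.get", "returned_null", False), ("Worker.run", "slow", True)],
]
failing_runs = [
    [("Cache.get", "returned_null", True), ("Worker.run", "slow", True)],
    [("Cache.get", "returned_null", True), ("Worker.run", "slow", False)],
]

result = distinguishing_records(passing_runs, failing_runs)
print(result["only_in_failing"])
print(result["only_in_passing"])
```

In this made-up data, the `slow` property varies within both groups and so is filtered out, while `Cache.get` returning null is consistently tied to the failing runs. A real tool would need to handle noisy logs and far larger sets of properties.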

Wing Lam

University of Illinois at Urbana-Champaign

United States

Patrice Godefroid

Microsoft Research

United States

Suman Nath

Microsoft Corporation

Anirudh Santhiar

Indian Institute of Science

Suresh Thummalapenta