Root Causing Flaky Tests in a Large-scale Industrial Setting
In today’s agile world, developers are expected to deliver features rapidly to meet customer demands. To ensure that new changes do not introduce regressions, developers often rely on continuous integration pipelines that build their changes and validate them by executing tests efficiently. One significant factor that hinders developer productivity is flaky tests: tests that may both pass and fail on the same version of the code. Since flaky test failures are not deterministically reproducible, developers often spend hours only to discover that the occasional failures have nothing to do with their changes. However, ignoring failures of flaky tests can be dangerous, since those failures may reveal real faults in the production code. Furthermore, identifying the root cause of flakiness is tedious and cumbersome, since flaky failures are often the consequence of unexpected, non-deterministic behavior arising from factors such as concurrency and external dependencies.
As developers in a large-scale industrial setting, we first describe our experience with flaky tests through a study of them. Our quantitative results show that although the number of flaky tests may be low, the percentage of builds that fail because of flaky tests can be substantial. To reduce the burden of flaky tests on developers, we describe our end-to-end framework that helps identify flaky tests and understand their root causes. Our framework instruments flaky tests and all relevant code to log various runtime properties. It then uses a preliminary tool, called RootFinder, to find differences between the logs of passing and failing executions. Using our framework, we collect and publish a dataset of real-world, anonymized execution logs of flaky tests. We hope that by sharing the findings from our study, the framework, and the dataset of logs, we will encourage more research on this important problem.
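The abstract does not say how RootFinder compares logs. The following Python sketch only illustrates the general idea and is not RootFinder's actual implementation. It assumes a hypothetical log format in which each execution maps a (code location, property name) pair to an observed value. A test counts as flaky if reruns on the same code both pass and fail. A logged property is flagged as a lead when the values seen across passing runs never overlap the values seen across failing runs. A property missing from a run is recorded as a hypothetical `<absent>` marker, so something like an exception logged only in failing runs is still caught. The names `is_flaky` and `distinguishing_properties`, and the sample `Cache.get` logs, are invented for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical log format: (code_location, property_name) -> observed value.
Key = Tuple[str, str]
RunLog = Dict[Key, str]

# Hypothetical marker for a property that was not logged in a given run.
ABSENT = "<absent>"


def is_flaky(outcomes: List[bool]) -> bool:
    """True if reruns of one test on the same code both passed (True) and failed (False)."""
    return any(outcomes) and not all(outcomes)


def distinguishing_properties(
    passing: List[RunLog], failing: List[RunLog]
) -> Dict[Key, Tuple[Set[str], Set[str]]]:
    """Return properties whose values in passing runs never overlap those in failing runs."""
    if not passing or not failing:
        # Without both kinds of runs there is nothing to contrast.
        return {}
    # Every (location, property) pair logged in any run.
    keys = set().union(*passing, *failing)
    found: Dict[Key, Tuple[Set[str], Set[str]]] = {}
    for key in sorted(keys):
        pass_vals = {run.get(key, ABSENT) for run in passing}
        fail_vals = {run.get(key, ABSENT) for run in failing}
        # Keep the property only if no value is shared between the two groups.
        if pass_vals.isdisjoint(fail_vals):
            found[key] = (pass_vals, fail_vals)
    return found


if __name__ == "__main__":
    print(is_flaky([True, True, False]))  # True
    passing = [
        {("Cache.get", "thread"): "T1", ("Cache.get", "return"): "hit"},
        {("Cache.get", "thread"): "T2", ("Cache.get", "return"): "hit"},
    ]
    failing = [
        {("Cache.get", "thread"): "T1", ("Cache.get", "return"): "miss",
         ("Cache.get", "exception"): "Timeout"},
    ]
    for key, (p, f) in distinguishing_properties(passing, failing).items():
        print(key, "pass:", sorted(p), "fail:", sorted(f))
    # ('Cache.get', 'exception') pass: ['<absent>'] fail: ['Timeout']
    # ('Cache.get', 'return') pass: ['hit'] fail: ['miss']
```

Requiring disjoint value sets is deliberately strict. The thread identifier above is not reported because `T1` occurs in both groups. This keeps noise low, but it can miss a cause that shows up in only some failing runs. A looser criterion, such as ranking properties by how well they separate the two groups, trades more noise for better recall.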
Thu 18 Jul (displayed time zone: Beijing, Chongqing, Hong Kong, Urumqi)
11:00 - 12:30
Root Causing Flaky Tests in a Large-scale Industrial Setting
Mitigating the Effects of Flaky Tests on Mutation Testing
August Shi (University of Illinois at Urbana-Champaign), Jonathan Bell (George Mason University), Darko Marinov (University of Illinois at Urbana-Champaign)
Assessing the State and Improving the Art of Parallel Testing for C
Failure Clustering Without Coverage