The Importance of Accounting for Execution Failures when Predicting Test Flakiness
Flaky tests are tests that pass and fail on different executions of the same version of a program under test. They waste valuable developer time by making developers investigate false alerts (flaky test failures). To deal with this issue, many prediction methods have been proposed. However, the utility of these methods remains unclear, since they are typically evaluated on single-release data. Such evaluations ignore that, in many cases, tests that fail flakily in one release also fail correctly (indicating the presence of bugs) in another, so subsequent correct failures can go unnoticed. In this paper, we show that this situation is prevalent and can raise significant concerns for both researchers and practitioners. In particular, we show that flaky tests (tests that exhibit flaky behaviour at some point in time) have a strong fault-revealing capability, i.e., they reveal more than 1/3 of all encountered regression faults. We also show that 76.2% of all test executions that reveal faults in the codebase under test are made by tests that are classified as flaky by existing prediction methods. Overall, our findings motivate the need for future research to focus on predicting flaky test executions instead of flaky tests.
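The distinction between labelling a test as flaky and labelling a single execution as flaky can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the `Failure` record, the test names, and the toy history are all hypothetical and are not data or tooling from the paper. It counts how many fault-revealing failures come from tests that have been flaky at some point, which is the share that would be dismissed if every failure of such a test were treated as a false alert.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    """One failing test execution (hypothetical record layout)."""
    test: str       # test identifier (made-up names)
    release: str    # release in which the failure occurred
    is_flaky: bool  # ground truth: True = flaky failure, False = fault-revealing


def flaky_tests(failures):
    """Test-level label: tests with at least one flaky failure at any point."""
    return {f.test for f in failures if f.is_flaky}


def masked_fault_ratio(failures):
    """Fraction of fault-revealing failures produced by tests labelled flaky."""
    flagged = flaky_tests(failures)
    fault_revealing = [f for f in failures if not f.is_flaky]
    if not fault_revealing:
        return 0.0
    masked = [f for f in fault_revealing if f.test in flagged]
    return len(masked) / len(fault_revealing)


# Toy history, invented purely for illustration (not results from the paper).
history = [
    Failure("test_login", "r1", is_flaky=True),
    Failure("test_login", "r2", is_flaky=False),  # same test, real fault later
    Failure("test_cache", "r1", is_flaky=True),
    Failure("test_parse", "r2", is_flaky=False),
]

print(sorted(flaky_tests(history)))
print(masked_fault_ratio(history))
```

In this toy history, `test_login` fails flakily in one release and reveals a fault in the next, so a test-level label hides that second failure, whereas an execution-level label would keep it visible.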
Wed 30 Oct (displayed time zone: Pacific Time (US & Canada))
10:30 - 12:00
10:30 (15m) Talk | B4: Towards Optimal Assessment of Plausible Code Solutions with Plausible Tests | Research Papers | Mouxiang Chen (Zhejiang University), Zhongxin Liu (Zhejiang University), He Tao (Zhejiang University), Yusu Hong (Zhejiang University), David Lo (Singapore Management University), Xin Xia (Huawei), JianLing Sun (Zhejiang University)
10:45 (15m) Talk | Reducing Test Runtime by Transforming Test Fixtures | Research Papers | Chengpeng Li (University of Texas at Austin), Abdelrahman Baz (The University of Texas at Austin), August Shi (The University of Texas at Austin)
11:00 (15m) Talk | Efficient Incremental Code Coverage Analysis for Regression Test Suites | Research Papers
11:15 (15m) Talk | Combining Coverage and Expert Features with Semantic Representation for Coincidental Correctness Detection | Research Papers | Huan Xie (Chongqing University), Yan Lei (Chongqing University), Maojin Li (Chongqing University), Meng Yan (Chongqing University), Sheng Zhang (Chongqing University)
11:30 (15m) Talk | A Combinatorial Testing Approach to Surrogate Model Construction | Research Papers | Sunny Shree (The University of Texas at Arlington), Krishna Khadka (The University of Texas at Arlington), Jeff Yu Lei (University of Texas at Arlington), Raghu Kacker (National Institute of Standards and Technology), D. Richard Kuhn (National Institute of Standards and Technology)
11:45 (15m) Talk | The Importance of Accounting for Execution Failures when Predicting Test Flakiness | Industry Showcase | Guillaume Haben (University of Luxembourg), Sarra Habchi (Ubisoft Montréal), John Micco (VMware), Mark Harman (Meta Platforms, Inc. and UCL), Mike Papadakis (University of Luxembourg), Maxime Cordy (University of Luxembourg, Luxembourg), Yves Le Traon (University of Luxembourg, Luxembourg)