ESEIW 2024
Sun 20 - Fri 25 October 2024 Barcelona, Spain

Background: Test flakiness is a major problem in the software industry. Flaky tests fail seemingly at random without changes to the code and thus impede continuous integration (CI). Some researchers argue that all tests can be considered flaky and that tests only differ in their frequency of flaky failures. This position implies that the definition of test flakiness includes failures caused by interruptions in the testing environment.

Aims: With the goal of developing mitigation strategies to reduce the negative impact of test flakiness, we study characteristics of tests and the test environment which potentially impact test flakiness.

Method: We construct two datasets based on SAP HANA’s test results over a 12-week period: one based on production data of the SAP HANA CI pipeline, the other based on targeted test executions from a dedicated flakiness experiment. We conduct correlation analysis for test and test environment characteristics with respect to their influence on the frequency of flaky test failures.

Results: In our study, the average test execution time had the strongest positive correlation with the test flakiness rate (r = 0.79), which confirms previous studies. Potential reasons for higher flakiness include the larger test scope of long-running tests or test executions on a slower test infrastructure. We found that distributed tests had a lower flakiness rate than non-distributed tests. Interestingly, the load on the testing infrastructure was not correlated with test flakiness. The relationship between test flakiness and required resources for test execution (i.e., memory and CPU) is inconclusive.
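The correlation reported above can be illustrated with a minimal sketch. It assumes that r denotes a Pearson correlation coefficient (the abstract does not name the coefficient) and that a test's flakiness rate is its number of flaky failures divided by its number of executions. The helper names and the per-test records are hypothetical and are not taken from the SAP HANA datasets.

```python
from math import sqrt


def flakiness_rate(flaky_failures, executions):
    """Fraction of executions that ended in a flaky failure (assumed definition)."""
    if executions <= 0:
        raise ValueError("executions must be positive")
    return flaky_failures / executions


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical records, one per test:
# (average execution time in seconds, flaky failures, executions)
records = [
    (12.0, 1, 1000),
    (95.0, 4, 1000),
    (310.0, 9, 1000),
    (640.0, 30, 1000),
    (1200.0, 41, 1000),
]

times = [avg_time for avg_time, _, _ in records]
rates = [flakiness_rate(failures, runs) for _, failures, runs in records]

r = pearson_r(times, rates)
print(f"r = {r:.2f}")
```

A value of r close to 1 would indicate that tests with longer average execution times tend to have higher flakiness rates. It does not by itself show that long execution times cause flaky failures.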

Conclusions: Based on our findings, we conclude that splitting long-running tests can be an important measure for practitioners to cope with test flakiness. Test splitting enables parallelization of test executions and also reduces the cost of re-executions after flaky failures because the scope of the re-executed tests is narrower. Thus, splitting long-running tests into smaller tests with a narrower scope can effectively decrease the negative effects of test flakiness in complex testing environments. However, when splitting long-running tests, practitioners need to consider the potential test setup overhead of test splits.
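The trade-off between narrower re-executions and setup overhead can be made concrete with a small cost model. This is an illustrative sketch under simplifying assumptions that are not from the study: each test execution pays a fixed setup time, the test body splits evenly across the parts, and exactly one part fails flakily and is re-executed. All function names and numbers are hypothetical.

```python
def rerun_cost_unsplit(setup_s, body_s):
    """Time to re-execute the whole long-running test after a flaky failure."""
    return setup_s + body_s


def rerun_cost_split(setup_s, body_s, parts):
    """Time to re-execute only the one failed part of an evenly split test."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    return setup_s + body_s / parts


def extra_setup_per_full_run(setup_s, parts):
    """Additional setup time paid on every full run because of splitting."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    return (parts - 1) * setup_s


# Hypothetical numbers: 60 s setup, 3600 s test body, split into 6 parts.
setup_s, body_s, parts = 60.0, 3600.0, 6

unsplit = rerun_cost_unsplit(setup_s, body_s)
split = rerun_cost_split(setup_s, body_s, parts)
overhead = extra_setup_per_full_run(setup_s, parts)

print(f"re-execution, unsplit: {unsplit:.0f} s")
print(f"re-execution, split:   {split:.0f} s")
print(f"extra setup per run:   {overhead:.0f} s")
```

In this model, splitting pays off when the time saved on re-executions exceeds the extra setup time accumulated across all full runs, which depends on how often flaky failures occur.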

Fri 25 Oct

Displayed time zone: Brussels, Copenhagen, Madrid, Paris

14:00 - 15:30
Empirical studies in various domains
ESEM IGC / ESEM Journal-First Papers / ESEM Emerging Results, Vision and Reflection Papers Track at Multimedia (B3 Building - Hall)
Chair(s): Carolyn Seaman University of Maryland Baltimore County
14:00
15m
Industry talk
Do Test and Environmental Complexity Increase Flakiness? An Empirical Study of SAP HANA
ESEM IGC
Alexander Berndt, Thomas Bach SAP, Sebastian Baltes University of Bayreuth
14:15
15m
Industry talk
Preliminary Insights on Industry Practices for Addressing Fairness Debt
ESEM IGC
Ronnie de Souza Santos University of Calgary, Luiz Fernando de Lima, Maria Teresa Baldassarre Department of Computer Science, University of Bari, Rodrigo Spinola Virginia Commonwealth University
14:30
15m
Industry talk
From Struggle to Simplicity with a Usable and Secure API for Encryption in Java
ESEM IGC
Ehsan Firouzi TU Clausthal, Ammar Mansuri TU Clausthal, Mohammad Ghafari TU Clausthal, Maziar Kaveh Amazon AWS
14:45
15m
Journal Early-Feedback
The influence of the city metaphor and its derivates in software visualization
ESEM Journal-First Papers
David Moreno-Lumbreras Universidad Rey Juan Carlos, Jesus M. Gonzalez-Barahona Universidad Rey Juan Carlos, Gregorio Robles Universidad Rey Juan Carlos, Valerio Cosentino Eventbrite
15:00
15m
Vision and Emerging Results
Code Clone Configuration as a Multi-Objective Search Problem
ESEM Emerging Results, Vision and Reflection Papers Track
Denis Sousa State University of Ceará, Matheus Paixao State University of Ceará, Chaiyong Ragkhitwetsagul Mahidol University, Italo Uchoa State University of Ceará