Non-deterministic test behavior, or flakiness, is common and dreaded among developers. Researchers have studied the issue and proposed approaches to mitigate it. However, the vast majority of previous work has only considered developer-written tests. The prevalence and nature of flaky tests produced by test generation tools remains largely unknown. We ask whether such tools also produce flaky tests and how these differ from developer-written ones. Furthermore, we evaluate mechanisms that suppress flaky test generation. We sample 6 356 projects written in Java or Python. For each project, we generate tests using EvoSuite (Java) and Pynguin (Python), and execute each test 200 times, looking for inconsistent outcomes. Our results show that flakiness is at least as common in generated tests as in developer-written tests. Nevertheless, existing flakiness suppression mechanisms are effective in alleviating this issue (71.7 % fewer flaky tests). Compared to developer-written flaky tests, the causes of generated flaky tests are distributed differently. Their non-deterministic behavior is more frequently caused by randomness, rather than by networking and concurrency. Using flakiness suppression, the remaining flaky tests differ significantly from any flakiness previously reported, where most are attributable to runtime optimizations and EvoSuite-internal resource thresholds. These insights, with the accompanying dataset, can help maintainers to improve test generation tools, give recommendations for developers using these tools, and serve as a foundation for future research in test flakiness or test generation.
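The rerun procedure mentioned in the abstract (execute each test 200 times and look for inconsistent outcomes) can be illustrated with a minimal Python sketch. This is not the authors' tooling: the names `run_once`, `classify`, and the three example tests are hypothetical, and a test is simplified to a plain callable that passes if it returns and fails if it raises.

```python
import random
from collections import Counter
from typing import Callable


def run_once(test: Callable[[], None]) -> str:
    """Run one test and report 'pass' or 'fail' (hypothetical helper)."""
    try:
        test()
    except Exception:
        return "fail"
    return "pass"


def classify(test: Callable[[], None], reruns: int = 200) -> str:
    """Rerun a test and label it 'flaky' if its outcomes are inconsistent.

    The default of 200 reruns mirrors the number stated in the abstract;
    everything else here is an illustrative assumption.
    """
    outcomes = Counter(run_once(test) for _ in range(reruns))
    if len(outcomes) > 1:
        return "flaky"
    # Only one distinct outcome was observed across all reruns.
    return next(iter(outcomes))


# Hypothetical example tests.
_rng = random.Random(0)  # seeded so that this sketch itself is reproducible


def stable_pass() -> None:
    assert 1 + 1 == 2


def stable_fail() -> None:
    assert 1 + 1 == 3


def random_dependent() -> None:
    # Outcome depends on a random draw, the most frequent cause of
    # flakiness in generated tests according to the abstract.
    assert _rng.random() < 0.5


if __name__ == "__main__":
    for test in (stable_pass, stable_fail, random_dependent):
        print(test.__name__, classify(test))
```

Note that a test labeled "pass" or "fail" by such a check may still be flaky with a failure rate too low to surface in 200 reruns; rerunning only gives a lower bound on flakiness.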
Thu 18 Apr (displayed time zone: Lisbon)
11:00 - 12:30 | Testing 3 | Research Track / Journal-first Papers / Software Engineering in Practice | Grande Auditório | Chair(s): José Miguel Rojas (The University of Sheffield)
11:00 | 15m | Talk | Do Automatic Test Generation Tools Generate Flaky Tests? | Research Track | Martin Gruber (BMW Group, University of Passau), Muhammad Firhard Roslan (University of Sheffield), Owain Parry (The University of Sheffield), Fabian Scharnböck (University of Passau), Phil McMinn (University of Sheffield), Gordon Fraser (University of Passau)
11:15 | 15m | Talk | Deep Combination of CDCL(T) and Local Search for Satisfiability Modulo Non-Linear Integer Arithmetic Theory | Research Track | Xindi Zhang (Institute of Software Chinese Academy of Science), Bohan Li (Institute of Software Chinese Academy of Science), Shaowei Cai (Institute of Software at Chinese Academy of Sciences)
11:30 | 15m | Talk | Uncover the Premeditated Attacks: Detecting Exploitable Reentrancy Vulnerabilities by Identifying Attacker Contracts | Research Track | Shuo Yang (Sun Yat-sen University), Jiachi Chen (Sun Yat-sen University), Mingyuan Huang (Sun Yat-Sen University), Zibin Zheng (Sun Yat-sen University), Yuan Huang (School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China)
11:45 | 15m | Talk | Practical Non-Intrusive GUI Exploration Testing with Visual-based Robotic Arms | Research Track | Shengcheng Yu (Nanjing University), Chunrong Fang (Nanjing University), Mingzhe Du (Nanjing University), Yuchen Ling (Nanjing University), Zhenyu Chen (Nanjing University), Zhendong Su (ETH Zurich)
12:00 | 15m | Talk | Dynamic Inference of Likely Symbolic Tensor Shapes in Python Machine Learning Programs | Software Engineering in Practice
12:15 | 7m | Talk | Mutation Analysis for Evaluating Code Translation | Journal-first Papers | Giovani Guizzo (Brick Abode), Jie M. Zhang (King's College London), Federica Sarro (University College London), Mark Harman (Meta Platforms, Inc. and UCL), Christoph Treude (Singapore Management University)
12:22 | 7m | Talk | Generalized Coverage Criteria for Combinatorial Sequence Testing | Journal-first Papers | Achiya Elyasaf (Ben-Gurion University of the Negev), Eitan Farchi (IBM Haifa Research Lab), Oded Margalit (Ben-Gurion University of the Negev), Gera Weiss (Ben-Gurion University of the Negev), Yeshayahu Weiss (Ben-Gurion University of the Negev)