Mitigating the Effects of Flaky Tests on Mutation Testing
Mutation testing is widely used in research as a metric for evaluating the quality of test suites. However, traditional mutation testing assumes that tests behave deterministically, both in their coverage and in whether a test kills a given mutant. This assumption does not hold in the presence of flaky tests, whose outcomes can differ non-deterministically even when run on the same code under test. Almost all modern software projects have some flaky tests. Without reliable test outcomes, mutation testing can produce unreliable results: in our experiments, mutation scores vary by 5 percentage points on average between repeated executions, and the difference in mutant-test pairs is 10 percentage points on average. We propose techniques that better control for flakiness throughout the mutation testing process, and we implement them by modifying the popular open-source tool PIT. We evaluate our modifications on 30 open-source projects and find that our techniques can increase developers’ confidence in mutation results in the presence of flaky tests by almost entirely eliminating “unknown” (flaky) mutants.
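To make the notion of an “unknown” mutant concrete, the following is a minimal sketch of how repeated test executions could be used to label mutants. It is not the authors' PIT modification: the names `classify_mutant` and `score_bounds`, the rerun-based rule, and the sample data are all hypothetical and chosen only for illustration.

```python
from enum import Enum


class MutantStatus(Enum):
    KILLED = "killed"
    SURVIVED = "survived"
    UNKNOWN = "unknown"  # outcome cannot be trusted because of flakiness


def classify_mutant(failing_tests_per_run, known_flaky):
    """Label one mutant from repeated runs (hypothetical rule, for illustration).

    failing_tests_per_run: list of sets, one per rerun, holding the names of
        tests that failed when executed against the mutant.
    known_flaky: set of tests already seen to fail non-deterministically
        on the unmutated code.
    """
    if not failing_tests_per_run:
        raise ValueError("at least one run is required")
    runs = [set(run) for run in failing_tests_per_run]
    always_failing = set.intersection(*runs)
    ever_failing = set.union(*runs)
    # A mutant counts as killed only if some non-flaky test fails in every run.
    if always_failing - set(known_flaky):
        return MutantStatus.KILLED
    # Failures that are intermittent, or that come only from flaky tests,
    # do not tell us whether the mutant was really detected.
    if ever_failing:
        return MutantStatus.UNKNOWN
    return MutantStatus.SURVIVED


def score_bounds(statuses):
    """Return (lower, upper) bounds on the mutation score.

    The lower bound counts only killed mutants; the upper bound also
    counts unknown mutants as if they had been killed.
    """
    total = len(statuses)
    killed = sum(1 for s in statuses.values() if s is MutantStatus.KILLED)
    unknown = sum(1 for s in statuses.values() if s is MutantStatus.UNKNOWN)
    return killed / total, (killed + unknown) / total


# Made-up data: three reruns per mutant.
runs = {
    "m1": [{"testA"}, {"testA"}, {"testA"}],              # fails every time
    "m2": [{"testB"}, set(), {"testB"}],                  # fails intermittently
    "m3": [set(), set(), set()],                          # never fails
    "m4": [{"testFlaky"}, {"testFlaky"}, {"testFlaky"}],  # only a flaky test fails
}
statuses = {m: classify_mutant(r, known_flaky={"testFlaky"}) for m, r in runs.items()}
bounds = score_bounds(statuses)
```

Under these assumptions, the gap between the two bounds is exactly the share of unknown mutants, which is why reducing the number of unknown mutants narrows the range of mutation scores a developer has to consider.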
Thu 18 Jul
11:22 - 11:45
August Shi (University of Illinois at Urbana-Champaign), Jonathan Bell (George Mason University), Darko Marinov (University of Illinois at Urbana-Champaign)