Mutation testing is a popular way of assessing and improving test suites by creating defective variants of the code under test (CUT) using mutation operators. A test suite is run against all such defective CUT variants and its adequacy is measured in terms of the number of defective variants it is able to detect. Traditionally, mutation operators target individual statements of the CUT and presume the defect detection ability of mutant-revealing tests on the basis of the coupling effect hypothesis: Tests that reveal all simple faults are sensitive enough to also detect complex faults [8]. This hypothesis has been repeatedly investigated by comparing typical mutations against real defects or more complex (synthetic) mutations.
In this paper we propose to use mutation operators that are defined on the basis of empirically derived and confirmed software fault distributions. As these operators are more complex than traditional operators, the corresponding patterns have fewer matches in the CUT. This results in fewer mutants to test with, thereby lowering the effort for mutation testing. We evaluate the utility of the generated mutants in comparison with those of Major, a mutation testing framework with traditional operators, on the Defects4J dataset of Java projects with known defects. The results of our study show that the empirical operators yield fewer mutants to evaluate test suites against, but also that, in the majority of cases, these mutants are easier to detect and less coupled with real defects than the mutants produced by traditional mutation operators.
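As an illustration of the basic procedure described above (generate defective variants, run the test suite against each, count how many are detected), the following is a minimal Java sketch. Everything in it is hypothetical and chosen for illustration only: the `max`-style function under test, the three hand-written mutants, and the two tests. The mutants mimic simple statement-level changes; they are not taken from Major's operator set or from the empirical operators proposed in the paper.

```java
import java.util.List;
import java.util.function.IntBinaryOperator;
import java.util.function.Predicate;

public class MutationScoreSketch {
    // Hypothetical code under test: returns the larger of two ints.
    static final IntBinaryOperator ORIGINAL = (a, b) -> a > b ? a : b;

    // Hypothetical hand-written mutants, each a small change to ORIGINAL.
    static final List<IntBinaryOperator> MUTANTS = List.of(
        (a, b) -> a < b ? a : b,   // relational operator changed: > becomes <
        (a, b) -> a >= b ? a : b,  // > becomes >=; behaves like ORIGINAL for all inputs
        (a, b) -> a > b ? b : a    // result branches swapped
    );

    // Hypothetical test suite: each test returns true when the implementation passes.
    static final List<Predicate<IntBinaryOperator>> TESTS = List.of(
        f -> f.applyAsInt(3, 1) == 3,
        f -> f.applyAsInt(2, 2) == 2
    );

    // A mutant is detected ("killed") if at least one test fails on it.
    static boolean isKilled(IntBinaryOperator mutant) {
        return TESTS.stream().anyMatch(t -> !t.test(mutant));
    }

    static long killedCount() {
        return MUTANTS.stream().filter(MutationScoreSketch::isKilled).count();
    }

    // Fraction of mutants detected by the test suite.
    static double mutationScore() {
        return (double) killedCount() / MUTANTS.size();
    }

    public static void main(String[] args) {
        System.out.println("killed " + killedCount() + " of " + MUTANTS.size()
            + " mutants, score = " + mutationScore());
    }
}
```

In this sketch the second mutant survives because it cannot be distinguished from the original by any input, which is one reason a raw detected-over-generated ratio has to be interpreted with care. Fewer generated mutants, as with the more complex operators discussed above, directly reduce the number of test suite runs this loop requires.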