Are Mutation Scores Correlated with Real Fault Detection? A Large Scale Empirical study on the Relationship Between Mutants and Real Faults
Empirical validation of software testing studies increasingly relies on mutants. This practice is motivated by the strong correlation between mutation scores and real fault detection that is reported in the literature. In contrast, our study shows that these correlations are the result of the confounding effect of test suite size. In particular, we investigate the relation between two independent variables, mutation score and test suite size, and one dependent variable, the detection of (real) faults. We use two data sets, CoreBench and Defects4J, with large C and Java programs and real faults, and provide evidence that all correlations between mutation scores and real fault detection are weak when controlling for test suite size. We also find that both independent variables significantly influence the dependent one, yielding significantly better fits, but with relatively low predictive power overall. By measuring the fault detection capability of the test suites ranked highest by mutation score (as opposed to randomly selected test suites of the same size), we find that achieving higher mutation scores significantly improves fault detection. Taken together, our data suggest that mutants provide good guidance for improving the fault detection of test suites, but that their correlation with fault detection is weak.
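The confounding effect described in the abstract can be illustrated with a minimal sketch. The numbers below are toy values invented purely for illustration; they are not taken from CoreBench, Defects4J, or the paper's results. Pearson correlation is used only for brevity, and the helper names (`pearson`, `correlation_by_size`) are hypothetical. The abstract does not state which statistical procedure the study applies.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy data (an assumption, NOT from the study): one row per test suite,
# as (suite_size, mutation_score, detects_real_fault).
# Larger suites have both higher mutation scores and more detections.
suites = [
    (10, 0.20, 0), (10, 0.25, 1), (10, 0.30, 0), (10, 0.25, 0),
    (50, 0.70, 1), (50, 0.75, 0), (50, 0.80, 1), (50, 0.75, 1),
]


def correlation_by_size(rows):
    """Correlate mutation score with fault detection within each suite size."""
    by_size = {}
    for size, score, detected in rows:
        by_size.setdefault(size, []).append((score, detected))
    return {
        size: pearson([s for s, _ in pairs], [d for _, d in pairs])
        for size, pairs in by_size.items()
    }


# Pooling all suites ignores size, so size acts as a confounder.
pooled = pearson([row[1] for row in suites], [row[2] for row in suites])
# Holding size fixed removes that confounder.
per_size = correlation_by_size(suites)

print("pooled correlation:", round(pooled, 3))
for size, r in sorted(per_size.items()):
    print("size", size, "correlation:", round(r, 3))
```

In this toy data the pooled correlation is clearly positive, while the correlation within each fixed suite size is essentially zero. That is the pattern the abstract attributes to test suite size, shown here on hand-built numbers only.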
Thu 31 May. Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
14:00 - 15:30 | Testing I | Journal first papers / Technical Papers at Congress Hall | Chair(s): Antonia Bertolino CNR-ISTI
14:00 20m Talk | ChangeLocator: Locate Crash-Inducing Changes Based on Crash Reports | Journal first papers | Rongxin Wu Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Ming Wen The Hong Kong University of Science and Technology, Shing-Chi Cheung Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hongyu Zhang The University of Newcastle
14:20 20m Talk | Are Mutation Scores Correlated with Real Fault Detection? A Large Scale Empirical study on the Relationship Between Mutants and Real Faults | Technical Papers | Mike Papadakis University of Luxembourg, Donghwan Shin KAIST, Shin Yoo Korea Advanced Institute of Science and Technology, Doo-Hwan Bae Korea Advanced Institute of Science and Technology
14:40 20m Talk | Efficient Sampling of SAT Solutions for Testing | Technical Papers | Rafael Dutra UC Berkeley, Kevin Laeufer University of California, Berkeley, Jonathan Bachrach, Koushik Sen University of California, Berkeley
15:00 20m Talk | Are Fix-Inducing Changes a Moving Target? A Longitudinal Case Study of Just-In-Time Defect Prediction | Journal first papers
15:20 10m Talk | Q&A in groups | Technical Papers