Predicting Survived and Killed Mutants
Mutation Testing (MT) is a state-of-the-art technique for assessing test suite effectiveness. The MT principle is to inject small artificial changes into the System Under Test (SUT), producing variants known as mutants. The behaviour of the original SUT is then compared to that of each mutated SUT when running the same test suite. If no difference in behaviour is observed, the mutant is said to have survived; otherwise, it is said to have been killed. Despite its strengths, the applicability of MT in practice has been limited by its high computational cost. To mitigate this problem, Predictive Mutation Testing (PMT) has been proposed. PMT uses a classification model, based on features related to the mutated code and the test suite, to predict whether a mutant will be killed or will survive without actually executing it. In previous studies, PMT has been evaluated on several projects in two application scenarios: cross-project and cross-version learning. The goal of our research is to investigate how well the proposed PMT method, which has so far been evaluated on Java, can be extended to other programming languages. For that purpose, we first replicated the previous study and then extended the PMT approach to a single C program. We used random forest classifiers as our supervised learning approach. Our results indicate that PMT is able to predict the execution results of mutants with high accuracy. On the Java projects, we achieved Area Under Curve (AUC) values above 0.90 with a Prediction Error (PE) below 10%. On the C project, we achieved an AUC value above 0.90 with a PE below 1%. In our analyses, we also investigated how sensitive the performance of PMT is to the set of selected features. In particular, we wanted to understand whether adding programming-language-specific features to a language-independent core set of features significantly improves the performance of PMT.
Our results indicate that, overall, PMT has the potential to be applied across programming languages and is robust when dealing with imbalanced data.
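
To make the two reported measures concrete, the following is a minimal sketch of how AUC and PE could be computed from a classifier's output. It does not reproduce the study's pipeline or its random forest model. It assumes that PE is the absolute difference between the actual and the predicted mutation score, which the text above does not define, and that a mutant is predicted as killed when its score is at least 0.5. All function names and the example data are hypothetical.

```python
from typing import Sequence


def mutation_score(killed: Sequence[int]) -> float:
    """Fraction of mutants labelled as killed (1 = killed, 0 = survived)."""
    return sum(killed) / len(killed)


def prediction_error(actual: Sequence[int], predicted: Sequence[int]) -> float:
    """Assumed PE definition: |actual mutation score - predicted mutation score|."""
    return abs(mutation_score(actual) - mutation_score(predicted))


def auc(actual: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen killed mutant receives a
    higher score than a randomly chosen surviving mutant (ties count as 0.5).
    """
    killed = [s for a, s in zip(actual, scores) if a == 1]
    survived = [s for a, s in zip(actual, scores) if a == 0]
    if not killed or not survived:
        raise ValueError("AUC needs at least one killed and one surviving mutant")
    wins = sum(
        1.0 if k > s else 0.5 if k == s else 0.0
        for k in killed
        for s in survived
    )
    return wins / (len(killed) * len(survived))


# Hypothetical example: ground truth from actually running the test suite,
# and the classifier's predicted probability that each mutant is killed.
actual = [1, 1, 1, 0, 1, 0, 1, 1]
scores = [0.9, 0.8, 0.55, 0.3, 0.7, 0.6, 0.95, 0.85]

# Assumed decision threshold of 0.5 for the killed/survived label.
predicted = [1 if s >= 0.5 else 0 for s in scores]

example_auc = auc(actual, scores)
example_pe = prediction_error(actual, predicted)
```

Under this assumed definition, PE measures only the error in the aggregate mutation score, so individual misclassifications in opposite directions can cancel out. AUC, by contrast, reflects how well the scores separate killed from surviving mutants and does not depend on the chosen threshold, which is one reason the two measures are reported together.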