Evolving Ranking-Based Failure Proximities for Better Clustering in Fault Isolation
This artifact supplies the replication package and the supplementary material of the paper “Evolving Ranking-Based Failure Proximities for Better Clustering in Fault Isolation”, which has been accepted in the ASE’22 Research Track. In this paper, we present a genetic programming-based framework, along with a sophisticated fitness function, for evolving risk evaluation formulas that more properly represent failures in multi-fault scenarios. Using a small set of programs for training, we obtain a collection of formulas that achieve good results across a larger and more general range of scenarios.
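For readers unfamiliar with genetic programming over formulas, the following Python sketch shows one common way to encode a candidate risk evaluation formula as an expression tree, the kind of individual a GP framework evolves. It is a hypothetical illustration, not code from the SRR-GP repository: the terminal set (ef, ep, nf, np), the operator set, the protected-division convention, and the helper names are all assumptions, and selection, crossover, mutation, and the paper's fitness function are omitted.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple

# Assumed terminal set: spectrum counts for one statement.
# ef/ep: failed/passed tests that execute it; nf/np: failed/passed tests that do not.
TERMINALS = ("ef", "ep", "nf", "np")


def protected_div(a: float, b: float) -> float:
    # Common GP convention (assumed): return 1.0 instead of raising on division by zero.
    return a / b if b != 0 else 1.0


def protected_sqrt(a: float) -> float:
    # Take the root of |a| so every evolved tree stays evaluable.
    return math.sqrt(abs(a))


BINARY_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": protected_div,
}
UNARY_OPS = {"sqrt": protected_sqrt}


@dataclass(frozen=True)
class Node:
    """One node of a candidate formula: a terminal, a constant, or an operator."""
    op: str
    children: Tuple["Node", ...] = ()
    value: float = 0.0  # only meaningful when op == "const"

    def evaluate(self, spectrum: Dict[str, float]) -> float:
        # Recursively compute the suspiciousness score for one statement's spectrum.
        if self.op in TERMINALS:
            return float(spectrum[self.op])
        if self.op == "const":
            return self.value
        if self.op in UNARY_OPS:
            return UNARY_OPS[self.op](self.children[0].evaluate(spectrum))
        left, right = (child.evaluate(spectrum) for child in self.children)
        return BINARY_OPS[self.op](left, right)

    def __str__(self) -> str:
        # Render the tree as a fully parenthesized infix expression.
        if self.op in TERMINALS:
            return self.op
        if self.op == "const":
            return repr(self.value)
        if self.op in UNARY_OPS:
            return f"{self.op}({self.children[0]})"
        return f"({self.children[0]} {self.op} {self.children[1]})"


def leaf(name: str) -> Node:
    return Node(name)


def op_node(op: str, *args: Node) -> Node:
    return Node(op, tuple(args))


# Ochiai, ef / sqrt((ef + nf) * (ef + ep)), written as a tree; it could serve as a seed individual.
OCHIAI = op_node(
    "/",
    leaf("ef"),
    op_node("sqrt", op_node("*", op_node("+", leaf("ef"), leaf("nf")),
                            op_node("+", leaf("ef"), leaf("ep")))),
)

if __name__ == "__main__":
    print(OCHIAI)  # (ef / sqrt(((ef + nf) * (ef + ep))))
    print(OCHIAI.evaluate({"ef": 2, "ep": 2, "nf": 0, "np": 5}))
```

Because a formula is just nested nodes, crossover and mutation reduce to swapping or regenerating subtrees, which is why tree encodings are a typical choice when evolving such formulas.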
We have implemented our approach in Python. To help readers technically understand our evolution framework, replicate our work, and even configure the hyper-parameters for further use, we provide the replication package and the supplementary material. The replication package includes 1) the code of the proposed evolution framework and its usage instructions, 2) the benchmark projects used in our experiments, and 3) the tool or strategy for generating multi-fault programs. The supplementary material includes 1) the expressions and fitness scores of all formulas (in the training), 2) the expressions and fitness scores of the dominant formulas (in the test), and 3) the values of k in the “k ≠ r” scenario.
DOI link: https://doi.org/10.5281/zenodo.7034690
The link to our repository: https://github.com/yisongy/SRR-GP
The abstract of the original paper is as follows: Failures that are not related to a specific fault can reduce the effectiveness of fault localization in multi-fault scenarios. To tackle this challenge, researchers and practitioners typically cluster failures (e.g., failed test cases) into several disjoint groups, with those caused by the same fault grouped together. In such a fault isolation process that requires input in a mathematical form, ranking-based failure proximity (R-proximity) is widely used to model failed test cases. In R-proximity, each failed test case is represented as a suspiciousness ranking list of program statements through a fingerprinting function (i.e., a risk evaluation formula, REF). Although many off-the-shelf REFs have been integrated into R-proximity, they were originally designed for single-fault localization. To the best of our knowledge, no REF has been developed to serve as a fingerprinting function of R-proximity in multi-fault scenarios. To better cluster failures in fault isolation, in this paper, we present a genetic programming-based framework, along with a sophisticated fitness function, for evolving REFs with the goal of more properly representing failures in multi-fault scenarios. Using a small set of programs for training, we obtain a collection of REFs that achieve good results across a larger and more general range of scenarios. The best of them outperforms the state-of-the-art by 50.72% and 47.41% in fault number estimation and clustering effectiveness, respectively. Our framework is highly configurable for further use, and the evolved formulas can be directly applied in future failure representation tasks without any retraining.
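To make the R-proximity idea more concrete, the sketch below turns each failed test case into a suspiciousness ranking list of statements and then compares two such lists. It rests on stated assumptions: Ochiai stands in for the REF (the paper instead evolves the REF), each failed test is scored against all passed tests, and a plain unweighted Kendall tau distance stands in for the proximity measure, which published R-proximity variants may define differently. All function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set


def ochiai(ef: int, ep: int, nf: int) -> float:
    # Assumed REF; any formula over (ef, ep, nf, np) could be plugged in here.
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0


def ranking_for_failure(failed_cov: Set[int], passed_covs: Sequence[Set[int]],
                        n_stmts: int) -> List[int]:
    """Fingerprint one failed test: rank statements, pairing it with all passed tests."""
    scores: Dict[int, float] = {}
    for s in range(n_stmts):
        ef = 1 if s in failed_cov else 0  # only one failed test is considered
        nf = 1 - ef
        ep = sum(1 for cov in passed_covs if s in cov)
        scores[s] = ochiai(ef, ep, nf)
    # Most suspicious first; ties broken by statement id for determinism.
    return sorted(range(n_stmts), key=lambda s: (-scores[s], s))


def kendall_tau_distance(r1: List[int], r2: List[int]) -> float:
    """Fraction of statement pairs that the two rankings order differently."""
    pos1 = {s: i for i, s in enumerate(r1)}
    pos2 = {s: i for i, s in enumerate(r2)}
    pairs = list(combinations(r1, 2))
    discordant = sum(
        1 for a, b in pairs
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )
    return discordant / len(pairs) if pairs else 0.0


if __name__ == "__main__":
    passed = [{0, 1}, {0, 2}]            # statement coverage of the passed tests
    fail_a, fail_b = {0, 1, 3}, {0, 2, 3}
    ra = ranking_for_failure(fail_a, passed, 4)
    rb = ranking_for_failure(fail_b, passed, 4)
    print(ra, rb, kendall_tau_distance(ra, rb))  # [3, 1, 0, 2] [3, 2, 0, 1] 0.5
```

Failures whose rankings are close (small distance) would be candidates for the same cluster; swapping `ochiai` for an evolved formula changes only the fingerprinting step, not the distance or clustering steps.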