FSE 2025
Mon 23 - Fri 27 June 2025 Trondheim, Norway

Due to advancements in graph neural networks, graph learning-based fault localization (GBFL) methods have achieved promising results. However, as these methods are supervised learning paradigms and deep learning is typically data-hungry, they can only be trained on fully labeled large-scale datasets. This is impractical because labeling failed tests is similar to manual fault localization, which is time-consuming and labor-intensive, leading to only a small portion of failed tests that can be labeled within limited budgets. These data labeling limitations would lead to the sub-optimal effectiveness of supervised GBFL techniques. Semi-supervised learning (SSL) provides an effective means of leveraging unlabeled data to improve a model’s performance and address data labeling limitations. However, as these methods are not specifically designed for fault localization, directly utilizing them might lead to sub-optimal effectiveness. In response, we propose a novel semi-supervised GBFL framework, Legato. Legato first leverages the attention mechanism to identify and augment likely fault-unrelated sub-graphs in unlabeled graphs and then quantifies the suspiciousness distribution of unlabeled graphs to estimate pseudo-labels. Through training the model on augmented unlabeled graphs and pseudo-labels, Legato can utilize the unlabeled data to improve the effectiveness of fault localization and address the restrictions in data labeling. Through extensive evaluations against 3 baselines SSL methods, Legato demonstrates superior performance by outperforming all the methods in comparison.