How Useful is Code Change Information for Fault Localization in Continuous Integration?
Continuous integration (CI) is a software development practice in which developers frequently merge and test code under development. Because the continuous nature of CI requires code changes to be atomic, CI settings provide fine-grained information about which parts of the system are being changed. Prior studies have extensively evaluated the performance of spectrum-based fault localization (SBFL) techniques, but traditional SBFL techniques do not take advantage of this change information. In this paper, we conduct an empirical study on the effectiveness of using and integrating code changes and coverage changes for fault localization in CI settings. We conduct our study on seven open source systems with a total of 192 faults. We find that although both types of change information cover a smaller search space than code coverage, the percentage of faulty methods in the search space is 7 times higher for code changes and 14 times higher for coverage changes than for code coverage. We then propose three change-based fault localization techniques and compare them with Ochiai, a commonly used SBFL technique. Our results show that all three change-based techniques outperform Ochiai, with improvements ranging from 7% to 23% in average MAP and from 17% to 24% in average MRR. Moreover, we find that our change-based fault localization techniques can be integrated with Ochiai, achieving up to 53% and 52% improvement over Ochiai in average MAP and MRR, respectively, and locating 41 more faults at Top-1.
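For readers unfamiliar with the baseline and the metrics named above, the following is a minimal sketch of the standard Ochiai suspiciousness score and of the per-fault quantities that MRR and MAP average over. It is illustrative only and is not the implementation used in the study: the function names, the toy coverage spectrum, and the alphabetical tie-breaking are assumptions, and average precision is normalized here by the number of faulty methods, which is one common convention.

```python
import math

def ochiai(ef, ep, nf):
    """Standard Ochiai score: ef / sqrt((ef + nf) * (ef + ep)).

    ef: failing tests that cover the method
    ep: passing tests that cover the method
    nf: failing tests that do not cover the method
    """
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0

def rank_methods(spectrum, total_failed):
    """Rank methods by descending Ochiai score.

    spectrum maps a (hypothetical) method name to (ef, ep).
    Ties are broken alphabetically, an assumption made for determinism.
    """
    scores = {m: ochiai(ef, ep, total_failed - ef)
              for m, (ef, ep) in spectrum.items()}
    return sorted(scores, key=lambda m: (-scores[m], m))

def reciprocal_rank(ranking, faulty):
    """1 / rank of the first faulty method; MRR is its mean over all faults."""
    for i, method in enumerate(ranking, start=1):
        if method in faulty:
            return 1.0 / i
    return 0.0

def average_precision(ranking, faulty):
    """Average precision for one fault; MAP is its mean over all faults."""
    hits, total = 0, 0.0
    for i, method in enumerate(ranking, start=1):
        if method in faulty:
            hits += 1
            total += hits / i
    return total / len(faulty) if faulty else 0.0

# Toy spectrum (made-up numbers): method -> (ef, ep), with 2 failing tests.
spectrum = {"A.parse": (2, 1), "B.render": (1, 4), "C.save": (0, 3)}
ranking = rank_methods(spectrum, total_failed=2)
faulty = {"B.render"}
print(ranking, reciprocal_rank(ranking, faulty), average_precision(ranking, faulty))
```

In this toy case the faulty method is ranked second, so it would not count as located at Top-1. A change-based technique in the spirit of the abstract would restrict or re-weight the candidate methods using code or coverage changes before ranking; how that is done is specific to the paper and is not reproduced here.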