ICSE 2026
Sun 12 - Sat 18 April 2026 Rio de Janeiro, Brazil

In large-scale cloud infrastructure, changes are a primary source of critical service incidents. Rapidly identifying the root-cause change is paramount for minimizing service disruption and ensuring reliability. However, this task is challenging because incident context is often fragmented across diverse, multi-modal data sources, while change data are typically inconsistent and semantically mismatched with fault signals. The task is further complicated by the trade-off between identification speed and accuracy. In addition, the scarcity of labeled data makes traditional supervised methods impractical.

To address these challenges, we present MagmaScope, a hybrid system that combines a lightweight ranking algorithm with an LLM-based reasoning agent to identify root-cause changes. MagmaScope first employs a rapid, coarse-grained ranking to filter thousands of candidate changes based on temporal, dependency, and lexical signals. It then uses a fine-grained ranking module and an agent to perform deep, contextual analysis of the most probable candidates. Two key novelties of our approach are the systematic use of multi-modal data from incident response chats to enrich fault context, and a “Change Object Rewrite” technique that leverages an LLM to standardize inconsistent change descriptions for effective correlation. We evaluated MagmaScope on a dataset of 55 real-world emergency incidents from ByteDance’s cloud infrastructure. The results show that MagmaScope achieves a top@5 accuracy of 74.7% and a top@10 accuracy of 89.8% in identifying the root-cause change, significantly outperforming traditional keyword-search and vector-similarity baselines. MagmaScope has been deployed in production at ByteDance for over a year, demonstrating its effectiveness and reliability in helping on-call engineers accelerate incident resolution in a complex, industrial-scale environment.
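To make the coarse-grained stage more concrete, the following is a minimal Python sketch of how temporal, dependency, and lexical signals might be combined into a single score for filtering candidate changes. It is not MagmaScope's implementation: the abstract does not specify the scoring functions, so the exponential time decay, the one-hop dependency check, the Jaccard token overlap, the weights, and all names (`Change`, `coarse_rank`, the example services) are assumptions chosen purely for illustration.

```python
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Change:
    """Hypothetical change record; field names are illustrative assumptions."""
    change_id: str
    service: str
    description: str
    applied_at: datetime


def temporal_score(change, incident_start, half_life_minutes=60.0):
    """Assumed signal: score halves for every half-life elapsed before the incident."""
    minutes_before = (incident_start - change.applied_at).total_seconds() / 60.0
    if minutes_before < 0:
        return 0.0  # change was applied after the incident began
    return 0.5 ** (minutes_before / half_life_minutes)


def dependency_score(change, affected_service, dependencies):
    """Assumed signal: 1.0 for the affected service, 0.5 for a direct dependency."""
    if change.service == affected_service:
        return 1.0
    if change.service in dependencies.get(affected_service, set()):
        return 0.5
    return 0.0


def _tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lexical_score(change, incident_text):
    """Assumed signal: Jaccard overlap between description and incident tokens."""
    a, b = _tokens(change.description), _tokens(incident_text)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coarse_rank(changes, incident_start, affected_service, incident_text,
                dependencies, top_k=10, weights=(0.4, 0.4, 0.2)):
    """Return the top_k (change, score) pairs by weighted sum of the three signals."""
    w_time, w_dep, w_lex = weights  # illustrative weights, not from the paper
    scored = [
        (c,
         w_time * temporal_score(c, incident_start)
         + w_dep * dependency_score(c, affected_service, dependencies)
         + w_lex * lexical_score(c, incident_text))
        for c in changes
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy usage with made-up data.
incident_start = datetime(2026, 1, 1, 12, 0)
changes = [
    Change("c1", "payments-db",
           "increase connection pool timeout for payments-db",
           datetime(2026, 1, 1, 11, 50)),
    Change("c2", "search", "update ranking model",
           datetime(2026, 1, 1, 11, 55)),
    Change("c3", "checkout", "refactor logging",
           datetime(2026, 1, 1, 6, 0)),
]
dependencies = {"checkout": {"payments-db"}}
ranked = coarse_rank(changes, incident_start, "checkout",
                     "checkout errors: payments-db connection timeout",
                     dependencies, top_k=3)
print([c.change_id for c, _ in ranked])
```

In this toy example the recent change to a direct dependency whose description overlaps with the incident text ranks first. In a pipeline like the one described, only the short list returned by such a filter would be passed on to the more expensive fine-grained ranking and agent analysis.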