As IT environments grow in both size and complexity, observability tools are needed to monitor their health. When anomalous events are detected, alerts are generated and sent as notifications to Site Reliability Engineers (SREs). However, most of these notifications turn out to be false alarms, leading to alert fatigue and inefficiency. Existing approaches for reducing alert noise rely on static policies, which quickly become outdated in dynamic IT environments and are therefore difficult to maintain. In this work, we propose a novel unsupervised approach, Dynamic-X-Y, guided by the well-known moving average envelope statistical method, that learns custom-tailored alert suppression policies from historical alert and event data. At run-time, the learned policies are applied to incoming events and alerts to reduce false alert notifications. We validate our approach on two datasets, one of log anomaly events/alerts and one of metric anomaly events/alerts, and show percentage increases in accuracy over state-of-the-art methods of $7.39\%$ and $35.7\%$, respectively.
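To make the moving average envelope idea concrete, the following is a minimal sketch and not the Dynamic-X-Y algorithm itself, which the abstract does not specify. It assumes that each observation is a numeric event count, that the envelope is a fixed fractional band around a trailing moving average, and that an alert is suppressed when the new observation falls inside that band. The function name `envelope_suppression` and the parameters `window` and `width` are hypothetical, chosen only for illustration.

```python
from collections import deque
from typing import Iterable, List


def envelope_suppression(
    counts: Iterable[float], window: int = 5, width: float = 0.2
) -> List[bool]:
    """Return one flag per observation; True means 'suppress the alert'.

    Hypothetical illustration: the envelope is mean * (1 +/- width), where
    mean is the average of the previous `window` observations.
    """
    history = deque(maxlen=window)  # trailing observations only
    suppress: List[bool] = []
    for value in counts:
        if len(history) < window:
            # Not enough history yet: never suppress (conservative default).
            suppress.append(False)
        else:
            mean = sum(history) / window
            lower = mean * (1.0 - width)
            upper = mean * (1.0 + width)
            # Inside the envelope is treated as normal behavior.
            suppress.append(lower <= value <= upper)
        history.append(value)  # the oldest value is dropped automatically
    return suppress


if __name__ == "__main__":
    counts = [10, 11, 9, 10, 10, 10, 25, 11]
    print(envelope_suppression(counts, window=5, width=0.2))
    # -> [False, False, False, False, False, True, False, True]
```

In this sketch the spike of 25 falls outside the band and would still notify, while values close to the recent average are suppressed. Because the band is recomputed from trailing data at every step, it adapts as the baseline drifts, which is the property a static threshold lacks.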
Pingchuan Ma HKUST, Zhenlan Ji The Hong Kong University of Science and Technology, Peisen Yao Zhejiang University, Shuai Wang The Hong Kong University of Science and Technology, Kui Ren Zhejiang University