Learning Dependencies in Distributed Cloud Applications to Identify and Localize Anomalies
Operation and maintenance of large distributed cloud applications can quickly become unmanageably complex, putting human operators under immense stress when problems occur. Utilizing machine learning for identification and localization of anomalies in such systems supports human experts and enables fast mitigation. However, due to the various interdependencies of system components, anomalies do not only affect their origin but also propagate through the distributed system. Taking this into account, we present Arvalus and its variant D-Arvalus, a neural graph transformation method that models system components as nodes and their dependencies and placement as edges to improve the identification and localization of anomalies. Given a series of metric KPIs, our method predicts the most likely system state - either normal or an anomaly class - and performs localization when an anomaly is detected. During our experiments, we simulate a distributed cloud application deployment and synthetically inject anomalies. The evaluation shows the generally good prediction performance of Arvalus and reveals the advantage of D-Arvalus, which incorporates information about system component dependencies.
Sat 29 May. Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
14:15 - 15:00 | Technical Paper Session #2 | CloudIntelligence 2021, at CloudIntelligence Room | Chair(s): Qingwei Lin (Microsoft Research, Beijing, China)
14:15 | 15m | Paper | Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models | Jasmin Bogatinovski; Harald Ott (TU Berlin); Alexander Acker; Sasho Nedelkoski (TU Berlin); Odej Kao (Technische Universität Berlin)
14:30 | 15m | Paper | Rapid Trend Prediction for Large-Scale Cloud Database KPIs by Clustering | Xiaoling Wang (Northwestern Polytechnical University); Ning Li (School of Computer Science, Northwestern Polytechnical University); Lijun Zhang (Northwestern Polytechnical University); Xiaofang Zhang (Northwestern Polytechnical University); Qiong Zhao (Bank of Communications)
14:45 | 15m | Paper | Learning Dependencies in Distributed Cloud Applications to Identify and Localize Anomalies | Dominik Scheinert (Technische Universität Berlin); Alexander Acker; Lauritz Thamsen (TU Berlin); Morgan Geldenhuys (Technische Universität Berlin); Odej Kao (Technische Universität Berlin)