MicroDiag: Fine-grained Performance Diagnosis for Microservice Systems
Microservice architecture has emerged as a popular pattern for developing large-scale applications because of its flexibility, scalability, and agility. However, the large number of services and their complex dependencies make diagnosing performance issues difficult and time-consuming. We propose MicroDiag, an automated system that localizes the root causes of performance issues in microservice systems at a fine granularity: it not only locates the faulty component but also discovers detailed information about its abnormality. MicroDiag constructs a component dependency graph and performs causal inference on diverse anomaly symptoms to derive a metrics causality graph, which it then uses to infer root causes. Our experimental evaluation on a microservice benchmark running in a Kubernetes cluster shows that MicroDiag localizes root causes accurately, reaching 97% precision for the top 3 most likely root causes and outperforming state-of-the-art methods by at least 31.1%.
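The abstract does not say how the metrics causality graph is turned into a ranked list of root causes, or exactly how "precision of the top 3" is computed. The Python sketch below assumes one plausible approach. It uses a personalized PageRank random walk, swapped in here for illustration and not confirmed as MicroDiag's method, that moves from anomalous effect metrics back toward their causes. It also includes a PR@k helper that counts a fault as a hit when its true root cause appears among the top k candidates. The graph, metric names, anomaly scores, and the functions `rank_root_causes` and `precision_at_k` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical metrics causality graph: each edge is (cause, effect).
# Metric names are illustrative, not taken from MicroDiag.
EDGES: List[Tuple[str, str]] = [
    ("db_cpu", "db_latency"),
    ("db_latency", "frontend_latency"),
    ("cart_mem", "frontend_latency"),
]
# Hypothetical anomaly scores (higher = more anomalous).
SCORES: Dict[str, float] = {
    "frontend_latency": 1.0,
    "db_latency": 0.8,
    "db_cpu": 0.6,
    "cart_mem": 0.1,
}


def rank_root_causes(
    edges: List[Tuple[str, str]],
    anomaly_scores: Dict[str, float],
    alpha: float = 0.85,
    iters: int = 100,
) -> List[str]:
    """Rank metrics with a personalized-PageRank walk from effects to causes."""
    nodes = sorted({n for edge in edges for n in edge} | set(anomaly_scores))
    # Reverse the causal edges so the walk moves from an effect to its causes.
    causes: Dict[str, List[str]] = {n: [] for n in nodes}
    for cause, effect in edges:
        causes[effect].append(cause)
    # Restart (teleport) distribution proportional to anomaly scores.
    total = sum(anomaly_scores.get(n, 0.0) for n in nodes)
    pref = {n: anomaly_scores.get(n, 0.0) / total for n in nodes}
    rank = dict(pref)
    for _ in range(iters):  # power iteration
        nxt = {n: (1 - alpha) * pref[n] for n in nodes}
        for n in nodes:
            # A metric with no known cause keeps its mass (self-loop),
            # so candidate root causes accumulate score.
            targets = causes[n] or [n]
            share = alpha * rank[n] / len(targets)
            for t in targets:
                nxt[t] += share
        rank = nxt
    return sorted(nodes, key=lambda n: rank[n], reverse=True)


def precision_at_k(rankings: List[List[str]], true_causes: List[str], k: int) -> float:
    """Fraction of faults whose true root cause appears in the top-k ranking."""
    hits = sum(1 for ranked, truth in zip(rankings, true_causes) if truth in ranked[:k])
    return hits / len(true_causes)


ranked = rank_root_causes(EDGES, SCORES)
print(ranked)  # ['db_cpu', 'cart_mem', 'db_latency', 'frontend_latency']
print(precision_at_k([ranked], ["db_cpu"], k=3))  # 1.0
```

In this toy graph, `cart_mem` ranks above `db_latency` despite its low anomaly score because metrics with no known cause absorb the walk's mass. A real diagnosis system would likely need weighted edges or further filtering to avoid such artifacts.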
Sat 29 May (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
11:55 - 12:55 | Technical paper session #1, CloudIntelligence 2021, at CloudIntelligence Room | Chair(s): Qingwei Lin (Microsoft Research, Beijing, China)
11:55 | 15m | Paper | PerfEstimator: A Generic and Extensible Performance Estimator for Data Parallel DNN Training | CloudIntelligence 2021 | Chengru Yang (University of Science and Technology of China), Zhehao Li (University of Science and Technology of China), Chaoyi Ruan (University of Science and Technology of China), Guanbin Xu (University of Science and Technology of China), Cheng Li (University of Science and Technology of China), Ruichuan Chen (Nokia Bell Labs), Feng Yan (University of Nevada Reno)
12:10 | 15m | Paper | Kmon: An In-kernel Transparent Monitoring System for Microservice Systems with eBPF | CloudIntelligence 2021 | Tianjun Weng (Sun Yat-Sen University), Wanqi Yang (Sun Yat-Sen University), Guangba Yu (Sun Yat-Sen University), Pengfei Chen (Sun Yat-Sen University), Jieqi Cui (Sun Yat-Sen University), Chuanfu Zhang (Sun Yat-Sen University)
12:25 | 15m | Paper | TraceLingo: Trace representation and learning for performance issue diagnosis in cloud services | CloudIntelligence 2021 | Yong Xu (Microsoft, China), Yaokang Zhu (Microsoft Research Asia), Bo Qiao (Microsoft Research, Beijing, China), Hongshu Che (Microsoft Research, Beijing, China), Pu Zhao (Microsoft Research, Beijing, China), Xu Zhang (Microsoft Research, Beijing, China), Ze Li (Microsoft, USA), Yingnong Dang (Microsoft, USA), Qingwei Lin (Microsoft Research, Beijing, China)
12:40 | 15m | Paper | MicroDiag: Fine-grained Performance Diagnosis for Microservice Systems | CloudIntelligence 2021 | Li Wu (Elastisys AB/Technische Universität Berlin), Johan Tordsson (Elastisys AB), Jasmin Bogatinovski, Erik Elmroth (Elastisys AB/Umea University), Odej Kao (Technische Universität Berlin)