Wed 12 Oct 2022 17:10 - 17:30 at Gold A - Technical Session 20 - Web, Cloud, Networking Chair(s): Karine Even-Mendoza

With the ever-increasing scale and complexity of online systems, incidents are gradually becoming commonplace. Without appropriate handling, they can seriously harm system availability. However, in large-scale online systems, these incidents are usually buried in a slew of issues (i.e., abnormal events that are not necessarily incidents), which makes them difficult to handle. Typically, these issues produce a cascading effect across the system, and proper management of the incidents depends heavily on a thorough analysis of this effect. Therefore, in this paper, we propose a method to automatically analyze the cascading effect of availability issues in online systems and extract the corresponding graph-based issue representations, which incorporate both the issue symptoms and the attributes of the affected services. With the extracted representations, we train and use a graph neural network-based model to perform incident detection. Then, for each detected incident, we leverage the PageRank algorithm with a flexible transition matrix design to locate its root cause. We evaluate our approach using real-world data collected from a very large instant messaging company. The results confirm the effectiveness of our approach. Moreover, our approach has been successfully deployed in the company, where it eases the burden on operators facing a flood of issues and related alert signals.
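
To make the root-cause step more concrete, the following is a minimal sketch of PageRank run over a custom transition matrix on a small service call graph. It is not the authors' implementation: the abstract does not describe the actual transition matrix design, so the weighting rule used here (an edge into a service is weighted by that service's anomaly score) is an assumption, as are the service names, the scores, and the helper names `build_transition` and `pagerank`.

```python
def build_transition(edges, scores, n):
    """Build a row-stochastic matrix from caller -> callee edges.

    Assumption for illustration: the weight of edge i -> j is the
    anomaly score of the callee j, so the walk drifts towards the
    more anomalous downstream services.
    """
    matrix = [[0.0] * n for _ in range(n)]
    for src, dst in edges:
        matrix[src][dst] = scores[dst]
    for i in range(n):
        total = sum(matrix[i])
        if total == 0.0:
            # No outgoing weight: add a self-loop so the row still sums to 1.
            matrix[i][i] = 1.0
        else:
            matrix[i] = [w / total for w in matrix[i]]
    return matrix


def pagerank(transition, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration on a row-stochastic transition matrix."""
    n = len(transition)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Teleportation term, shared uniformly by all nodes.
        new_rank = [(1.0 - damping) / n] * n
        for i in range(n):
            for j in range(n):
                new_rank[j] += damping * rank[i] * transition[i][j]
        if sum(abs(a - b) for a, b in zip(rank, new_rank)) < tol:
            return new_rank
        rank = new_rank
    return rank


# Hypothetical services and anomaly scores (not taken from the paper).
services = ["gateway", "auth", "chat", "db"]
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]  # caller -> callee
scores = [0.2, 0.1, 0.8, 0.9]

ranks = pagerank(build_transition(edges, scores, len(services)))
ranking = sorted(zip(services, ranks), key=lambda pair: pair[1], reverse=True)
for name, value in ranking:
    print(f"{name}: {value:.4f}")
```

In this toy graph the highest-ranked service is the candidate root cause. Changing how `build_transition` weights the edges is where a "flexible" design would plug in, for example by mixing several symptom signals into each weight.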

Wed 12 Oct

Displayed time zone: Eastern Time (US & Canada)

16:00 - 18:00
Technical Session 20 - Web, Cloud, Networking
Journal-first Papers / Late Breaking Results / Research Papers / Tool Demonstrations / Industry Showcase at Gold A
Chair(s): Karine Even-Mendoza Imperial College London
16:00
20m
Paper
Mutation-based Analysis of Queueing Network Performance Models -- Journal First Research
Journal-first Papers
Thomas Laurent Lero & University College Dublin, Paolo Arcaini National Institute of Informatics, Catia Trubiani Gran Sasso Science Institute, Anthony Ventresque University College Dublin & Lero, Ireland
16:20
10m
Demonstration
WebMonitor: https://youtu.be/hqVw0JU3k9c
Tool Demonstrations
Ennio Visconti TU Wien, Christos Tsigkanos University of Bern, Switzerland, Laura Nenzi University of Trieste
16:30
20m
Research paper
Exploiting Epochs and Symmetries in Analysing MPI Programs
Research Papers
Rishabh Ranjan IIT Delhi, Ishita Agrawal IIT Delhi, Subodh Sharma IIT Delhi
16:50
20m
Paper
MLASP: Machine learning assisted capacity planning
Journal-first Papers
Arthur Vitui Concordia University, Tse-Hsun (Peter) Chen Concordia University
17:10
20m
Research paper
Graph based Incident Extraction and Diagnosis in Large-Scale Online Systems (Virtual)
Research Papers
Zilong He Sun Yat-Sen University, Pengfei Chen Sun Yat-Sen University, Yu Luo Tencent Inc., Qiuyu Yan Tencent Inc., Hongyang Chen School of Computer Science and Engineering, Sun Yat-sen University, Guangba Yu Sun Yat-Sen University, Fangyuan Li Tencent Inc.
17:30
10m
Paper
ESAVE: Estimating Server and Virtual Machine Energy (Virtual)
Late Breaking Results
Priyavanshi Pathania Accenture Labs, Rohit Mehra Accenture Labs, Vibhu Saujanya Sharma Accenture Labs, Vikrant Kaulgud Accenture Labs, India, Sanjay Podder Accenture, Adam P. Burden Accenture
17:40
20m
Industry talk
MCDA Framework for Edge-Aware Multi-Cloud Hybrid Architecture Recommendation (Virtual)
Industry Showcase
Manish Ahuja Accenture Labs, Narendranath Sukhavasi Accenture Labs, Swapnajeet Choudhury Accenture Labs, Kaushik Amar Das Accenture Labs, Kapil Singi Accenture, Kuntal Dey Accenture Labs, India, Vikrant Kaulgud Accenture Labs, India