ASE 2021
Mon 15 - Fri 19 November 2021 Australia

This program is tentative and subject to change.

Wed 17 Nov 2021 11:20 - 11:40 at Koala - Large scale systems (any day/band)

As online service systems continue to grow in complexity and volume, how service incidents are managed greatly impacts company revenue and user trust. Because of cascading effects, a cloud failure often comes with an overwhelming number of incidents from dependent services and devices. For efficient incident management, related incidents should be quickly aggregated to narrow down the problem scope. To this end, we propose GRLIA, an incident aggregation framework based on graph representation learning over a graph of cascaded cloud failures. A graph representation is learned for each unique incident in an unsupervised and unified fashion to simultaneously encode the topological and temporal relationships among incidents. The learned representations can therefore be readily employed for online incident aggregation by measuring the distance between them. Furthermore, we leverage fine-grained system monitoring data, i.e., Key Performance Indicators (KPIs), to identify the complete scope of failures’ cascading impact. The proposed framework is evaluated with real-world incident data collected from a large-scale online service system of company $\mathcal{H}$. The experimental results demonstrate that GRLIA is effective and outperforms existing methods. The framework has also been successfully deployed in industrial practice.
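The abstract's idea of aggregating incidents online by measuring the distance between their representations can be illustrated with a small sketch. This is not GRLIA's implementation: the embeddings below are hand-written placeholder vectors rather than learned graph representations, and the cosine distance, the `threshold` value, the first-incident "anchor" rule, and the function names are all assumptions made for illustration.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two equal-length vectors (1 - cosine similarity)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat a zero vector as maximally distant (assumption)
    return 1.0 - dot / (norm_u * norm_v)


def aggregate_online(incidents, threshold=0.3):
    """Group incidents in arrival order.

    incidents: iterable of (incident_id, embedding) pairs.
    Each incident joins the closest existing group whose anchor embedding
    is within `threshold`; otherwise it starts a new group.
    The anchor is simply the first incident of the group (hypothetical rule).
    """
    groups = []
    for incident_id, embedding in incidents:
        best_group, best_dist = None, threshold
        for group in groups:
            dist = cosine_distance(embedding, group["anchor"])
            if dist <= best_dist:
                best_group, best_dist = group, dist
        if best_group is None:
            groups.append({"ids": [incident_id], "anchor": embedding})
        else:
            best_group["ids"].append(incident_id)
    return [group["ids"] for group in groups]


# Toy, made-up embeddings: two incidents per underlying failure.
toy_incidents = [
    ("inc-1", [1.0, 0.0]),
    ("inc-2", [0.9, 0.1]),
    ("inc-3", [0.0, 1.0]),
    ("inc-4", [0.1, 0.95]),
]
print(aggregate_online(toy_incidents))
# → [['inc-1', 'inc-2'], ['inc-3', 'inc-4']]
```

Comparing against a single anchor per group keeps each arrival cheap to process; a real system might instead compare against a group centroid or every member, which the abstract does not specify.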

Wed 17 Nov

Displayed time zone: Hobart

11:00 - 12:00
Large scale systems (any day/band)
Journal-first Papers / Research Papers / Industry Showcase at Koala
11:00
20m
Talk
Groot: An Event-graph-based Approach for Root Cause Analysis in Industrial Settings
Research Papers
Hanzhang Wang eBay, Zhengkai Wu University of Illinois at Urbana-Champaign, Huai Jiang eBay, USA, Yichao Huang eBay, Jiamu Wang eBay, Selcuk Kopru eBay, Tao Xie Peking University
11:20
20m
Talk
Graph-based Incident Aggregation for Large-Scale Online Service Systems
Research Papers
Zhuangbin Chen Chinese University of Hong Kong, China, Yuxin Su The Chinese University of Hong Kong, Jinyang Liu, Hongyu Zhang University of Newcastle, Xuemin Wen Huawei Technologies, Xiao Ling Huawei Technologies, Yongqiang Yang Huawei Technologies, Michael Lyu The Chinese University of Hong Kong
11:40
10m
Talk
Lessons learned from hyper-parameter tuning for microservice candidate identification
Industry Showcase
Rahul Yedida North Carolina State University, Rahul Krishna IBM Research, Anup K. Kalia IBM Research, Tim Menzies North Carolina State University, Jin Xiao IBM Research, Maja Vukovic IBM Research
11:50
10m
Talk
Signal-based properties of cyber-physical systems: Taxonomy and logic-based characterization
Journal-first Papers
Chaima Boufaied University of Luxembourg, Maris Jukss, Domenico Bianculli University of Luxembourg, Lionel Briand University of Luxembourg; University of Ottawa, Yago Isasi Parache LuxSpace