Vicious cycles are a major threat to the reliability of distributed software systems. They arise when an event in the system's execution degrades the system and the degradation, in turn, causes more such events. Because they are self-reinforcing, vicious cycles often result in large-scale cloud outages that are hard to recover from.
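The self-reinforcing pattern can be illustrated with a toy model. The sketch below is not taken from the paper: the function `simulate` and all of its parameters (`capacity`, `base_load`, `spike`, `retry_factor`) are hypothetical, and the model assumes that every request beyond capacity fails and spawns `retry_factor` retries in the next step.

```python
def simulate(capacity, base_load, spike, retry_factor, steps):
    """Toy feedback loop (illustrative assumption, not the paper's model).

    A one-time spike at step 0 pushes load above capacity. Requests beyond
    capacity fail, and each failure produces `retry_factor` retries that are
    added to the load of the next step.
    """
    history = []
    retries = 0.0
    for step in range(steps):
        load = base_load + retries + (spike if step == 0 else 0.0)
        failures = max(0.0, load - capacity)  # degradation caused by the load
        retries = retry_factor * failures     # degradation causes more load
        history.append(load)
    return history


# Aggressive retries: the one-time spike never drains and load keeps growing.
print(simulate(capacity=100, base_load=90, spike=30, retry_factor=2.0, steps=5))
# Mild retries: the same spike is absorbed and load returns to the baseline.
print(simulate(capacity=100, base_load=90, spike=30, retry_factor=0.5, steps=5))
```

In this model the only difference between recovery and runaway growth is how strongly the degradation feeds back into new load, which is what makes such cycles hard to recover from once they start.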
This paper formally defines the vicious cycle and conducts the first in-depth study of 33 real-world vicious cycles in 13 widely-used open-source distributed software systems, shedding light on their root causes, triggering conditions, and fixing strategies, with over a dozen concrete implications for combating them. Our findings show that the majority of the vicious cycles are caused by incorrect error handlers, where the handlers do not obtain enough information to distinguish between 1) an error induced by incoming requests and 2) an error induced by an unexpected interference from another error handler.
This paper further performs a feasibility study by 1) building a monitoring tool that prevents one type of vicious cycle by collecting information to make a more informed decision in error handling, and 2) investigating the effectiveness of one commonly suggested practice – injecting exponential backoff – to prevent vicious cycles induced by unconstrained retry.
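For readers unfamiliar with the practice, the following is a minimal sketch of what injecting exponential backoff into a retry loop typically looks like. It is not the paper's implementation and makes no claim about how effective the practice is, which is the question the paper investigates. The names `backoff_delays` and `retry_with_backoff`, the default parameters, and the use of random jitter are all assumptions made for illustration.

```python
import random
import time


def backoff_delays(base=0.1, factor=2.0, cap=10.0, attempts=5):
    """Return the upper bound of the wait after each attempt: base * factor**i, capped."""
    return [min(cap, base * factor ** i) for i in range(attempts)]


def retry_with_backoff(operation, attempts=5, base=0.1, factor=2.0, cap=10.0,
                       sleep=time.sleep):
    """Call `operation` until it succeeds or `attempts` calls have failed.

    Between failed attempts, wait a random time between 0 and the current
    exponential bound (jitter), so that many clients do not retry in lockstep.
    """
    delays = backoff_delays(base, factor, cap, attempts)
    for attempt, delay in enumerate(delays, start=1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise  # retry budget exhausted: surface the last error
            sleep(random.uniform(0, delay))


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky():
        # Hypothetical operation that fails twice before succeeding.
        calls["count"] += 1
        if calls["count"] < 3:
            raise ConnectionError("transient failure")
        return "ok"

    print(retry_with_backoff(flaky, base=0.01))
```

Capping the delay and bounding the number of attempts are design choices of this sketch: the cap keeps the worst-case wait predictable, and the attempt limit ensures the retry loop itself cannot run unconstrained.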
Tue 12 Sep. Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
10:30 - 12:00

| Time | Duration | Type | Title | Track | Authors |
|---|---|---|---|---|---|
| 10:30 | 12m | Talk | Twin Graph-based Anomaly Detection via Attentive Multi-Modal Learning for Microservice System | Research Papers | Jun Huang (Anhui University of Technology), Yang Yang (Anhui University of Technology), Hang Yu (Ant Group), Jianguo Li (Ant Group), Xiao Zheng (Anhui University of Technology) |
| 10:42 | 12m | Talk | Dynamic Graph Neural Networks-based Alert Link Prediction for Online Service Systems | Research Papers | Yiru Chen (Fudan University), Chenxi Zhang (Fudan University), Zhen Dong (Fudan University, China), Dingyu Yang (Alibaba Group), Xin Peng (Fudan University), Jiayu Ou (Alibaba Group), Hong Yang (Fudan University), Zheshun Wu (Alibaba Group), Xiaojun Qu (Alibaba Group), Wei Li (Alibaba Group) |
| 10:54 | 12m | Talk | A Model-based Mode-Switching-Framework based on Security Vulnerability Scores | Journal-first Papers | Michael Riegler (Johannes Kepler University Linz), Johannes Sametinger (Johannes Kepler University Linz), Michael Vierhauser (University of Innsbruck), Manuel Wimmer (JKU Linz) |
| 11:06 | 12m | Talk | Maat: Performance Metric Anomaly Anticipation for Cloud Services with Conditional Diffusion | Research Papers | Cheryl Lee (The Chinese University of Hong Kong), Tianyi Yang (The Chinese University of Hong Kong), Zhuangbin Chen (School of Software Engineering, Sun Yat-sen University), Yuxin Su (Sun Yat-sen University), Michael Lyu (The Chinese University of Hong Kong) |
| 11:18 | 12m | Talk (recorded) | Vicious Cycles in Distributed Software Systems | Research Papers | Shangshu Qian (Purdue University), Wen Fan (Purdue University), Lin Tan (Purdue University), Yongle Zhang (Purdue University) |
| 11:30 | 12m | Talk (recorded) | Scene-Driven Exploration and GUI Modeling for Android Apps | Research Papers | Xiangyu Zhang, Lingling Fan (Nankai University), Sen Chen (Tianjin University), Yucheng Su (Alibaba Group), Boyuan Li (Nankai University) |