How Incidental are the Incidents? Characterizing and Prioritizing Incidents for Large-Scale Online Service Systems (Experience)
[experience paper] Although tremendous efforts have been devoted to the quality assurance of online service systems, in reality these systems still encounter many incidents (i.e., unplanned interruptions and outages), which can decrease user satisfaction or cause economic loss. To better understand the characteristics of incidents and improve the incident management process, we perform the first large-scale empirical analysis of incidents collected from 18 real-world online service systems in a multinational technology company M. Surprisingly, we find that although a large number of incidents can occur over a short period of time, many of them do not actually matter, i.e., engineers will not fix them with a high priority after manually diagnosing their root cause. We call these incidents incidental incidents. Our qualitative and quantitative analyses show that incidental incidents are significant in terms of both number and cost. Therefore, it is important to prioritize incidents by identifying incidental incidents in advance to optimize incident management efforts. In particular, we propose an approach, called DeepIP (Deep learning based Incident Prioritization), for prioritizing incidents based on a large amount of historical incident data. More specifically, we design an attention-based CNN (Convolutional Neural Network) model that learns to identify incidental incidents. We then prioritize all incidents by ranking the predicted probabilities of incidents being incidental. We evaluate the performance of DeepIP using real-world incident data. The experimental results show that DeepIP effectively prioritizes incidents by identifying incidental incidents and significantly outperforms all the compared approaches. For example, DeepIP achieves an AUC of 0.808, while the best compared approach achieves only 0.624 on average. We also share our experience and lessons learned from practice.
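The ranking and evaluation steps mentioned in the abstract can be illustrated with a minimal sketch. It does not reproduce DeepIP's attention-based CNN: the incident IDs, probabilities, and labels below are made-up values, the function names `prioritize` and `auc` are hypothetical, and the ascending sort order (least likely incidental first) is an assumption, since the abstract only says that incidents are ranked by predicted probability.

```python
from typing import List, Tuple


def prioritize(predictions: List[Tuple[str, float]]) -> List[str]:
    """Order incident IDs so those least likely to be incidental come first.

    Each item is (incident_id, predicted probability of being incidental).
    Ascending order is an assumption made for this sketch.
    """
    return [incident_id for incident_id, _ in sorted(predictions, key=lambda x: x[1])]


def auc(labels: List[int], scores: List[float]) -> float:
    """Pairwise AUC: chance that a random incidental incident (label 1)
    receives a higher score than a random non-incidental one (label 0).
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model outputs and ground-truth labels (1 = incidental).
predictions = [("INC-1", 0.91), ("INC-2", 0.12), ("INC-3", 0.55), ("INC-4", 0.30)]
labels = [1, 0, 0, 1]

queue = prioritize(predictions)
score = auc(labels, [p for _, p in predictions])
print(queue)
print(score)
```

The pairwise definition of AUC is quadratic in the number of incidents, which is fine for illustration; a library routine would normally be used on real data.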
Wed 23 Sep | Displayed time zone: (UTC) Coordinated Universal Time
00:00 - 01:00 | Incidents and Bug Reports | Research Papers at Platypus | Chair(s): Yepang Liu (Southern University of Science and Technology)
00:00 | 20m | Talk | How Incidental are the Incidents? Characterizing and Prioritizing Incidents for Large-Scale Online Service Systems (Experience) | Research Papers | Junjie Chen (Tianjin University, China); Shu Zhang (Microsoft Research, Beijing); Xiaoting He (Microsoft); Qingwei Lin (Microsoft Research, China); Hongyu Zhang (University of Newcastle, Australia); Dan Hao (Peking University, China); Yu Kang (Microsoft Research, China); Feng Gao (Microsoft, China); Zhangwei Xu (Microsoft, China); Yingnong Dang (Microsoft, USA); Dongmei Zhang (Microsoft Research, China)
00:20 | 20m | Talk | Stay Professional and Efficient: Automatically Generate Titles for Your Bug Reports | Research Papers | Songqiang Chen (School of Computer Science, Wuhan University); Xiaoyuan Xie (School of Computer Science, Wuhan University, China); Bangguo Yin (School of Computer Science, Wuhan University); Yuanxiang Ji (School of Computer Science, Wuhan University); Lin Chen (Nanjing University); Baowen Xu (State Key Laboratory for Novel Software Technology, Nanjing University)
00:40 | 20m | Talk | Owl Eyes: Spotting UI Display Issues via Visual Understanding | Research Papers | Zhe Liu (Laboratory for Internet Software Technologies, Institute of Software Chinese Academy of Sciences, University of Chinese Academy of Sciences); Chunyang Chen (Monash University, Australia); Junjie Wang (Institute of Software, Chinese Academy of Sciences); Yuekai Huang (Institute of Software, Chinese Academy of Sciences); Jun Hu (Institute of Software, Chinese Academy of Sciences); Qing Wang (Institute of Software, Chinese Academy of Sciences)