Log Anomaly to Resolution: AI Based Proactive Incident Remediation
According to the 2020 SRE report, 80% of SREs work on post-mortem analysis of incidents due to a lack of provided information, and 16% of toil comes from investigating false positives/negatives. As a cloud service provider, we want to proactively identify signals that can help reduce outages and/or reduce the mean time to resolution. By leveraging AI for Operations (AIOps), this work proposes a novel methodology for proactive identification of log anomalies and their resolutions by sifting through the log lines. Typically, the information needed to retrieve resolutions corresponding to logs is spread across multiple heterogeneous corpora that exist in silos, such as historical ticket data, historical log data, and symptom resolutions available in product documentation. In this paper, we focus on augmented dataset preparation from heterogeneous corpora, metadata selection and prediction, and finally, using these elements at run-time to retrieve contextual resolutions for signals triggered via logs. For early evaluation, we used logs from a production middleware application server, predicted log anomalies and their resolutions, and conducted a qualitative evaluation with subject matter experts; the metadata prediction is 78.57% accurate, and the retrieval accuracy of resolutions is 65.7%.
Wed 17 Nov (displayed time zone: Hobart)
19:00 - 20:00
|Race Detection for Event-Driven Node.js Applications|
Xiaoning Chang Institute of Software, Chinese Academy of Sciences, Wensheng Dou Institute of Software at Chinese Academy of Sciences; University of Chinese Academy of Sciences, Jun Wei Institute of Software at Chinese Academy of Sciences; University of Chinese Academy of Sciences, Tao Huang Institute of Software Chinese Academy of Sciences, Jinhui Xie Tencent Inc., Yuetang Deng Tencent, Jianbo Yang Tencent Inc., Jiaheng Yang Tencent Inc.
|Log-based Anomaly Detection Without Log Parsing|
Research Papers
|Log Anomaly to Resolution: AI Based Proactive Incident Remediation|
|HyperGI: Automated Detection and Repair of Information Flow Leakage|
Ibrahim Mesecan Iowa State University, Daniel Blackwell University College London, David Clark University College London, Myra Cohen Iowa State University, Justyna Petke University College London