Walk the Talk: Is Your Log-based Software Reliability Maintenance System Really Reliable?
This program is tentative and subject to change.
Log-based software reliability maintenance systems are crucial for sustaining a stable customer experience. However, existing deep learning-based methods are a black box to service providers: providers cannot see how these methods detect anomalies, which hinders trust and deployment in real production environments. To address this issue, this paper defines a trustworthiness metric, diagnostic faithfulness, for models to gain service providers’ trust, based on surveys of SREs at a major cloud provider. We design two evaluation tasks: attention-based root cause localization and event perturbation. Empirical studies demonstrate that existing methods perform poorly in diagnostic faithfulness. Consequently, we propose FaithLog, a faithful log-based anomaly detection system, which achieves faithfulness through a carefully designed causality-guided attention mechanism and adversarial consistency learning. Evaluation results on two public datasets and one industrial dataset demonstrate that the proposed method achieves state-of-the-art performance in diagnostic faithfulness.
Mon 17 Nov (displayed time zone: Seoul)
16:00 - 16:50
16:00 10m Talk | LogPilot: Intent-aware and Scalable Alert Diagnosis for Large-scale Online Service Systems | Industry Showcase | Zhihan Jiang The Chinese University of Hong Kong, Jinyang Liu ByteDance, Yichen LI ByteDance, Haiyu Huang CUHK, Xiao He Bytedance, Tieying Zhang ByteDance, Jianjun Chen Bytedance, Yi Li Nanyang Technological University, Rui Shi Bytedance, Michael Lyu The Chinese University of Hong Kong
16:10 10m Talk | Walk the Talk: Is Your Log-based Software Reliability Maintenance System Really Reliable? | NIER Track | Minghua He Peking University, Tong Jia Institute for Artificial Intelligence, Peking University, Beijing, China, Chiming Duan Peking University, Pei Xiao Peking University, Lingzhe Zhang Peking University, China, Kangjin Wang Alibaba Group, Yifan Wu Peking University, Ying Li School of Software and Microelectronics, Peking University, Beijing, China, Gang Huang Peking University
16:20 10m Talk | Automated Proactive Logging Quality Improvement for Large-Scale Codebases | Industry Showcase | Yichen LI ByteDance, Jinyang Liu ByteDance, Junsong Pu School of Software Engineering, Sun Yat-sen University, Zhihan Jiang The Chinese University of Hong Kong, Zhuangbin Chen Sun Yat-sen University, Xiao He Bytedance, Tieying Zhang ByteDance, Jianjun Chen Bytedance, Yi Li Nanyang Technological University, Rui Shi Bytedance, Michael Lyu The Chinese University of Hong Kong
16:30 10m Talk | LogSage: An LLM-Based Framework for CI/CD Failure Detection and Remediation with Industrial Validation | Industry Showcase | Juntao Luo ByteDance, Weiyuan Xu East China Normal University, ByteDance, Tao Huang ByteDance, Kaixin Sui ByteDance, Jie Geng ByteDance, Qijun Ma ByteDance, Isami Akasaka ByteDance, Xiaoxue Shi ByteDance, Jing Tang ByteDance, Peng Cai East China Normal University
16:40 10m Talk | From Technical Excellence to Practical Adoption: Lessons Learned Building an ML-Enhanced Trace Analysis Tool | Industry Showcase | Kaveh Shahedi Polytechnique Montréal, Matthew Khouzam Ericsson AB, Heng Li Polytechnique Montréal, Maxime Lamothe Polytechnique Montreal, Foutse Khomh Polytechnique Montréal