Xpert: Empowering Incident Management with Query Recommendations via Large Language Models (ICSE 2024 - Research Track)

Who

Yuxuan Jiang, Chaoyun Zhang, Shilin He, Zhihao Yang, Minghua Ma, Si Qin, Yu Kang, Yingnong Dang, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang

Track

ICSE 2024 Research Track

Time Zone

The program is currently displayed in (GMT+01:00) Lisbon.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Thu 18 Apr 2024 11:00 - 11:15 at Pequeno Auditório - LLM, NN and other AI technologies 3 Chair(s): Tushar Sharma

Abstract

Large-scale cloud systems play a pivotal role in modern IT infrastructure, but incidents occurring within these systems can lead to service disruptions and adversely affect user experience. To resolve such incidents swiftly, on-call engineers depend on crafting domain-specific language (DSL) queries to analyze telemetry data, yet writing these queries can be challenging and time-consuming. This paper presents a thorough empirical study of the use of queries written in XQL, a DSL employed for incident management in a large-scale cloud management system at CompanyX. The findings underscore the importance and viability of XQL query recommendation for enhancing incident management.

Building on these insights, we introduce Xpert, an end-to-end machine learning framework that automates the XQL recommendation process. By leveraging historical incident data and large language models, Xpert generates customized XQL queries tailored to new incidents. Furthermore, Xpert incorporates a novel performance metric called Xcore, which enables a thorough evaluation of query quality from three comprehensive perspectives. We conduct extensive evaluations of Xpert, demonstrating its effectiveness in offline settings. Notably, we deploy Xpert in the real production environment of a large-scale incident management system at CompanyX, validating its efficiency in supporting incident management. To the best of our knowledge, this paper represents the first empirical study of its kind, and Xpert stands as a pioneering XQL recommendation framework designed for incident management.

Yuxuan Jiang

University of Michigan Ann-Arbor

United States

Chaoyun Zhang

Microsoft

China

Shilin He

Microsoft Research

Zhihao Yang

Peking University

Minghua Ma

Microsoft Research

United States

Si Qin

Microsoft Research

China

Yu Kang

Microsoft Research

China

Yingnong Dang

Microsoft Azure

United States

Saravan Rajmohan

Microsoft 365

Qingwei Lin

Microsoft

China

Dongmei Zhang

Microsoft Research

China

Session Program

Thu 18 Apr
Displayed time zone: Lisbon

11:00 - 12:30	LLM, NN and other AI technologies 3 (New Ideas and Emerging Results / Research Track / Software Engineering Education and Training / Software Engineering in Practice) at Pequeno Auditório. Chair(s): Tushar Sharma, Dalhousie University

11:00 15m Talk		Xpert: Empowering Incident Management with Query Recommendations via Large Language Models Research Track Yuxuan Jiang University of Michigan Ann-Arbor, Chaoyun Zhang Microsoft, Shilin He Microsoft Research, Zhihao Yang Peking University, Minghua Ma Microsoft Research, Si Qin Microsoft Research, Yu Kang Microsoft Research, Yingnong Dang Microsoft Azure, Saravan Rajmohan Microsoft 365, Qingwei Lin Microsoft, Dongmei Zhang Microsoft Research
11:15 15m Talk		Tensor-Aware Energy Accounting Research Track Timur Babakol SUNY Binghamton, USA, Yu David Liu SUNY Binghamton
11:30 15m Talk		LLM4PLC: Harnessing Large Language Models for Verifiable Programming of PLCs in Industrial Control Systems Software Engineering in Practice Mohamad Fakih University of California, Irvine, Rahul Dharmaji University of California, Irvine, Yasamin Moghaddas University of California, Irvine, Gustavo Quiros Siemens Technology, Tosin Ogundare Siemens Technology, Mohammad Al Faruque UCI
11:45 15m Talk		Resolving Code Review Comments with Machine Learning Software Engineering in Practice Alexander Frömmgen Google, Jacob Austin Google, Peter Choy Google, Nimesh Ghelani Google, Lera Kharatyan Google, Gabriela Surita Google, Elena Khrapko Google, Pascal Lamblin Google, Pierre-Antoine Manzagol Google, Marcus Revaj Google, Maxim Tabachnyk Google, Danny Tarlow Google, Kevin Villela Google, Dan Zheng Google DeepMind, Satish Chandra Google, Inc, Petros Maniatis Google DeepMind
12:00 15m Talk		LLMs Still Can't Avoid Instanceof: An investigation Into GPT-3.5, GPT-4 and Bard's Capacity to Handle Object-Oriented Programming Assignments Software Engineering Education and Training Bruno Pereira Cipriano Lusófona University, COPELABS, Pedro Alves Lusófona University, COPELABS
12:15 7m Talk		Leveraging Large Language Models to Improve REST API Testing New Ideas and Emerging Results Myeongsoo Kim Georgia Institute of Technology, Tyler Stennett Georgia Institute of Technology, Dhruv Shah Georgia Institute of Technology, Saurabh Sinha IBM Research, Alessandro Orso Georgia Institute of Technology
12:22 7m Talk		LogExpert: Log-based Recommended Resolutions Generation using Large Language Model New Ideas and Emerging Results Jiabo Wang Beijing University of Posts and Telecommunications, Guojun Chu Beijing University of Posts and Telecommunications, Jingyu Wang, Haifeng Sun Beijing University of Posts and Telecommunications, Qi Qi, Yuanyi Wang Beijing University of Posts and Telecommunications, Ji Qi China Mobile (Suzhou) Software Technology Co., Ltd., Jianxin Liao Beijing University of Posts and Telecommunications