FSE 2026
Sun 5 - Thu 9 July 2026 Montreal, Canada

This program is tentative and subject to change.

Wed 8 Jul 2026 10:30 - 10:50 at MB 2.435 - Cloud

Large-scale cloud systems underpin modern computing, hosting diverse components to deliver critical services worldwide. A single fault, such as an outage or misconfiguration, can simultaneously disrupt thousands of users. Such large-scale faults, referred to as batch failures, are characterized by many affected instances across the same subject within a short time window, typically stemming from a shared root cause. Handling these failures efficiently requires anomaly localization, but existing approaches offer insufficient support to engineers, making the process time-consuming and cognitively demanding. To address this, we propose Aloha, a human-in-the-loop agent framework for anomaly localization based on contrast analysis. Aloha operationalizes the entire batch failure handling pipeline, providing scenario- and data-aware guidance along with interpretable root-cause patterns for engineers. Pilots on real-world batch failure cases in Microsoft’s cloud show that Aloha streamlines data handling, supports contrast-based anomaly localization, and makes the process more practical and accessible, offering a promising step toward human-centered, scalable failure management in large-scale cloud systems.

Wed 8 Jul

Displayed time zone: Eastern Time (US & Canada)

10:30 - 12:30
10:30
20m
Talk
Aloha: Localizing Batch Failures in Large-scale Cloud Systems via Contrast Analysis and Human-in-the-Loop Agent
Industry Papers
Shenglin Zhang Nankai University, Yujia Wu Nankai University, Jinghuan Ren Nankai University, College of Software, Yongqian Sun Nankai University, Wenwei Gu Nankai University, Chaoyun Zhang Microsoft, Liqun Li Microsoft Research, Qingwei Lin Microsoft, Dongmei Zhang Microsoft, Saravanakumar Rajmohan Microsoft 365, Chetan Bansal Microsoft Research, Minghua Ma Microsoft
10:50
20m
Talk
Attention Enhanced Entity Recommendation for Intelligent Monitoring in Cloud Systems
Industry Papers
Fiza Husain Independent, Anson Bastos Microsoft, Anjaly Parayil Microsoft, Ayush Choure Independent, Chetan Bansal Microsoft Research, Rujia Wang Microsoft, Saravanakumar Rajmohan Microsoft 365
11:10
20m
Talk
An Agentic Framework for Triaging Incidents in Production Cloud Infrastructure
Industry Papers
Yuhan Yao Microsoft, Yuxuan Jiang University of Michigan Ann-Arbor, Minghua Ma Microsoft, Madhura Vaidya Microsoft, Jieren Deng Microsoft, Yigong Hu Boston University, Chetan Bansal Microsoft Research, Ze Li Microsoft Azure, Murali Chintalapati Microsoft Azure
11:30
20m
Talk
TSGuard: Automated User-Centric Incident Diagnosis for AI Workloads in the Cloud
Research Papers
Yitao Yang The Chinese University of Hong Kong, Yangtao Deng The Chinese University of Hong Kong, Yifan Xiong Microsoft Research, Baochun Li University of Toronto, Hong Xu The Chinese University of Hong Kong, Peng Cheng Microsoft Research Asia
11:50
20m
Talk
Exploring the impact of cloud computing on software architecture for sustainability: A practitioners' perspective
Journal-First Paper
Sahar Ahmadisakha University of Groningen, Vasilios Andrikopoulos University of Groningen
12:10
20m
Talk
AccessRefinery: Fast Mining Concise Access Control Intents on Public Cloud
Research Papers
Ning Kang Xi'an Jiaotong University, Peng Zhang Xi'an Jiaotong University, Jianyuan Zhang Xi'an Jiaotong University, Hao Li Xi'an Jiaotong University, Dan Wang Xi'an Jiaotong University, Zhenrong Gu Xi'an Jiaotong University, Weibo Lin Huawei Cloud, Shibiao Jiang Huawei Cloud, Zhu He Huawei Cloud, Xu Du Huawei Cloud, Longfei Chen Huawei Cloud, Jun Li Huawei, Xiaohong Guan Xi'an Jiaotong University