ICSE 2025
Sat 26 April - Sun 4 May 2025 Ottawa, Ontario, Canada
Sat 3 May 2025 11:50 - 12:10 at 210 - Session2 Chair(s): Jian Zhang

Cloud environments are increasingly managed by Infrastructure-as-Code (IaC) platforms (e.g., Terraform), which allow developers to define their desired infrastructure as a configuration program that describes cloud resources and their dependencies. This shields developers from low-level operations for creating and maintaining resources, since they are automatically performed by IaC platforms when compiling and deploying the configuration. However, while IaC platforms are rigorously tested for initial deployments, they exhibit myriad errors for runtime updates, e.g., adding/removing resources and dependencies. IaC updates are common because cloud infrastructures are long-lived but user requirements fluctuate over time. Unfortunately, our experience shows that updates often introduce subtle yet impactful bugs. The update logic in IaC frameworks is hard to test due to the vast and evolving search space, which includes diverse infrastructure setups and a wide range of provided resources with new ones frequently added. We introduce TerraFault, an automated, efficient, LLM-guided system for discovering update bugs, and report our findings with an initial prototype. TerraFault incorporates various optimizations to navigate the large search space efficiently and employs techniques to accelerate the testing process. Our prototype has successfully identified bugs even in simple IaC updates, showing early promise in systematically identifying update bugs in today’s IaC frameworks to increase their reliability.

Sat 3 May

Displayed time zone: Eastern Time (US & Canada)

11:00 - 12:30
Session2 AIOps at 210
Chair(s): Jian Zhang Microsoft
11:00
30m
Talk
Invited Talk2: Scaling Intelligence: AIOps for Complex Systems
AIOps
Andriy Miranskyy Toronto Metropolitan University (formerly Ryerson University)
11:30
20m
Talk
Towards Using LLMs for Distributed Trace Comparison (Abstract)
AIOps
Vaastav Anand, Pedro Las-Casas Microsoft, Rodrigo Fonseca Microsoft Research, Antoine Kaufmann MPI-SWS
11:50
20m
Talk
Automated Bug Discovery in Cloud Infrastructure-as-Code Updates with LLM Agents
AIOps
Yiming Xiang University of Michigan, Zhenning Yang University of Michigan, Jingjia Peng University of Michigan, Hermann Bauer University of Michigan, Patrick Tser Jern Kon University of Michigan, Yiming Qiu University of Michigan, Ang Chen University of Michigan
12:10
20m
Talk
Breaking the Cycle of Recurring Failures: Applying Generative AI to Root Cause Analysis in Legacy Banking Systems
AIOps
Siyuan Jin Hong Kong University of Science and Technology, Zhendong Bei HSBC, Bichao Chen HSBC, Yong Xia HSBC