Automated Bug Discovery in Cloud Infrastructure-as-Code Updates with LLM Agents
Cloud environments are increasingly managed by Infrastructure-as-Code (IaC) platforms (e.g., Terraform), which allow developers to define their desired infrastructure as a configuration program that describes cloud resources and their dependencies. This shields developers from the low-level operations of creating and maintaining resources, since IaC platforms perform these operations automatically when compiling and deploying the configuration. However, while IaC platforms are rigorously tested for initial deployments, they exhibit myriad errors in runtime updates, e.g., adding or removing resources and dependencies. IaC updates are common because cloud infrastructures are long-lived while user requirements fluctuate over time. Unfortunately, our experience shows that updates often introduce subtle yet impactful bugs. The update logic in IaC frameworks is hard to test because of its vast and evolving search space, which spans diverse infrastructure setups and a wide range of available resource types, with new ones added frequently. We introduce TerraFault, an automated, efficient, LLM-guided system for discovering update bugs, and report our findings with an initial prototype. TerraFault incorporates various optimizations to navigate the large search space efficiently and employs techniques to accelerate the testing process. Our prototype has already identified bugs even in simple IaC updates, showing early promise for systematically finding update bugs in today's IaC frameworks and increasing their reliability.
Sat 3 May
Displayed time zone: Eastern Time (US & Canada)
11:00 - 12:30
11:00 (30 min) Talk: Invited Talk2: Scaling Intelligence: AIOps for Complex Systems [AIOps]
Andriy Miranskyy (Toronto Metropolitan University, formerly Ryerson University)
11:30 (20 min) Talk: Towards Using LLMs for Distributed Trace Comparison (Abstract) [AIOps]
Vaastav Anand, Pedro Las-Casas (Microsoft), Rodrigo Fonseca (Microsoft Research), Antoine Kaufmann (MPI-SWS)
11:50 (20 min) Talk: Automated Bug Discovery in Cloud Infrastructure-as-Code Updates with LLM Agents [AIOps]
Yiming Xiang, Zhenning Yang, Jingjia Peng, Hermann Bauer, Patrick Tser Jern Kon, Yiming Qiu, Ang Chen (University of Michigan)
12:10 (20 min) Talk: Breaking the Cycle of Recurring Failures: Applying Generative AI to Root Cause Analysis in Legacy Banking Systems [AIOps]
Siyuan Jin (Hong Kong University of Science and Technology), Zhendong Bei (HSBC), Bichao Chen (HSBC), Yong Xia (HSBC)