TerraFormer: Automated Infrastructure-as-Code with LLMs Fine-Tuned via Policy-Guided Verifier Feedback
Distinguished Paper Award
Automating Infrastructure-as-Code (IaC) is challenging, and large language models (LLMs) often produce incorrect configurations from natural language. We present TerraFormer, a neuro-symbolic framework for IaC generation and mutation that combines supervised fine-tuning with verifier-guided reinforcement learning, using formal tools to provide feedback on syntax, deployability, and policy compliance. We curate two large, high-quality datasets, TF-Gen (152k instances) and TF-Mutn (52k instances), via multi-stage verification and iterative LLM self-correction. Evaluations against 17 state-of-the-art LLMs, including models 50 times larger such as Sonnet 3.7, DeepSeek-R1, and GPT-4.1, show that TerraFormer improves correctness over its base LLM by 15.94% on IaC-Eval, 11.65% on TF-Gen (Test), and 19.60% on TF-Mutn (Test). It outperforms larger models on both TF-Gen (Test) and TF-Mutn (Test), ranks third on IaC-Eval, and achieves the top best-practices and security compliance scores.
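The abstract names three kinds of verifier feedback (syntax, deployability, policy compliance) but does not say how they are combined into a training signal. The following is a minimal sketch of one plausible arrangement, not the authors' implementation: verifiers run in order, the first failure stops the chain, and the reward is the fraction of stages passed. The names `staged_reward`, `Feedback`, and the three `toy_*` checks are hypothetical; the toy checks are simple string tests standing in for real tools.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A verifier takes a candidate configuration and returns (passed, message).
Verifier = Callable[[str], Tuple[bool, str]]


@dataclass
class Feedback:
    reward: float      # fraction of stages passed, in [0, 1]
    failed_stage: str  # "" when every stage passed
    message: str       # verifier message for the failed stage


def staged_reward(config: str, stages: List[Tuple[str, Verifier]]) -> Feedback:
    """Run verifiers in order and stop at the first failure (assumed scheme)."""
    if not stages:
        raise ValueError("at least one verifier stage is required")
    passed = 0
    for name, check in stages:
        ok, msg = check(config)
        if not ok:
            return Feedback(passed / len(stages), name, msg)
        passed += 1
    return Feedback(1.0, "", "")


# Hypothetical stand-ins for real verifiers; each is a toy string check.
def toy_syntax(config: str) -> Tuple[bool, str]:
    balanced = config.count("{") == config.count("}") and "{" in config
    return balanced, "" if balanced else "unbalanced braces"


def toy_deployable(config: str) -> Tuple[bool, str]:
    ok = "resource" in config
    return ok, "" if ok else "no resource block found"


def toy_policy(config: str) -> Tuple[bool, str]:
    ok = "0.0.0.0/0" not in config
    return ok, "" if ok else "ingress open to the whole internet"


STAGES: List[Tuple[str, Verifier]] = [
    ("syntax", toy_syntax),
    ("deployability", toy_deployable),
    ("policy", toy_policy),
]
```

In this sketch, a configuration that parses and deploys but violates a policy would receive partial credit, and the failing stage's message could be fed back to the model as a correction hint. Whether TerraFormer uses partial credit, binary rewards, or another shaping is not stated in the abstract.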
Thu 16 Apr
Displayed time zone: Brasilia, Distrito Federal, Brazil
11:00 - 12:30 | AI for Software Engineering 12 | Research Track / SE In Practice (SEIP) at Europa II | Chair(s): Peter Rigby (Concordia University; Meta)
11:00 | 15m | Talk | Dually Hierarchical Drift Adaptation for Online Configuration Performance Learning | Research Track | Zezhen Xiang (University of Electronic Science and Technology of China), Jingzhi Gong (King's College London), Tao Chen (University of Birmingham)
11:15 | 15m | Talk | 3D Software Synthesis Driven by Constraint-Expressive Intermediate Representation | Research Track | Shuqing Li (The Chinese University of Hong Kong), Anson Y. Lam (The Chinese University of Hong Kong), Yun Peng (The Chinese University of Hong Kong), Wenxuan Wang (Hong Kong University of Science and Technology), Michael Lyu (The Chinese University of Hong Kong)
11:30 | 15m | Talk | PromiseTune: Unveiling Causally Promising and Explainable Configuration Tuning | Research Track | Pengzhou Chen (University of Electronic Science and Technology of China), Tao Chen (University of Birmingham)
11:45 | 15m | Talk | From Seed to Scope: Reasoning to Identify Change Impact Sets | Research Track
12:00 | 15m | Talk | TerraFormer: Automated Infrastructure-as-Code with LLMs Fine-Tuned via Policy-Guided Verifier Feedback | SE In Practice (SEIP) | Prithwish Jana (Georgia Institute of Technology), Sam Davidson (Amazon Web Services), Bhavana Bhasker (Amazon Web Services), Andrey Kan (Amazon Web Services), Anoop Deoras (Amazon Web Services), Laurent Callot (AWS AI Labs)
12:15 | 15m | Talk | From Code Changes to Quality Gains: An Empirical Study in Python ML Systems with PyQu | Research Track | Mohamed Almukhtar (University of Michigan-Flint), Anwar Ghammam (University of Michigan-Dearborn), Marouane Kessentini (Grand Valley State University), Hua Ming (University of Michigan-Flint)