Unfulfilled Promises: LLM-Based Detection of OS Compatibility Issues in Infrastructure as Code
This program is tentative and subject to change.
Modern infrastructures rely on Infrastructure as Code (IaC) systems to keep complex deployments consistent, reproducible, and scalable in production. The reliability of these infrastructures, however, depends on the correctness of their building blocks: reusable components (modules), each of which performs a dedicated task, such as installing a package, managing an OS user, or configuring a service, and reconciles the managed resource's state with the desired specification. A central promise of these components is portability: a specification written once should correctly manage the targeted resource on every OS the IaC component supports. When this property is violated, defects can propagate across entire infrastructures, causing outages, security vulnerabilities, and costly misconfigurations.
In this work, we introduce crOSsible, the first automated framework for cross-OS testing of IaC modules. crOSsible leverages large language models (LLMs) to synthesize and repair integration tests from structured module documentation, and executes them across 13 versions of 8 major Linux distributions. While our techniques are generally applicable to different IaC systems, we instantiate and evaluate them on Ansible, the most widely used IaC framework for managing individual servers. Evaluation across 259 popular Ansible modules demonstrates both effectiveness and real-world impact. In just 12 hours of testing, crOSsible uncovered 36 previously unknown bugs, including 22 portability violations. In total, 27 issues have been confirmed by maintainers, with 11 already fixed. The discovered issues range from crashes to dangerous soundness defects where modules reported success despite leaving systems misconfigured. Beyond bug discovery, crOSsible improved the code coverage of Ansible modules by 12.3% on average, systematically exercising OS-specific code paths that existing tests missed.
Wed 8 Jul
Displayed time zone: Eastern Time (US & Canada)
14:00 - 15:30
Time | Duration | Type | Title | Track | Authors
14:00 | 20m | Talk | Detecting Code-Comment Inconsistencies in Smart Contracts by Combining LLM and Program Analysis | Research Papers | Jiashuo Zhang (Peking University, China); Jiachi Chen (Sun Yat-sen University); Ting Zhang (Peking University); Yue Li (Peking University); Daoyuan Wu (Lingnan University); Yanlin Wang (Sun Yat-sen University); Jianbo Gao (Peking University); Ting Chen (University of Electronic Science and Technology of China); Zhong Chen
14:20 | 20m | Talk | Unfulfilled Promises: LLM-Based Detection of OS Compatibility Issues in Infrastructure as Code | Research Papers | Georgios-Petros Drosos (ETH Zurich); Georgios Alexopoulos (University of Athens); Thodoris Sotiropoulos (ETH Zurich); Dimitris Mitropoulos (University of Athens); Zhendong Su (ETH Zurich)
14:40 | 10m | Talk | DePro: Understanding the Role of LLMs in Debugging Competitive Programming Code | Ideas, Visions and Reflections | Nabiha Parvez (Military Institute of Science And Technology); Tanvin Pallab (Military Institute of Science And Technology); Mia Mohammad Imran (Missouri University of Science and Technology); Tarannum Shaila Zaman (University of Maryland Baltimore County)
14:50 | 20m | Talk | A Large-Scale Empirical Evaluation of LLMs for Automated Self-Admitted Technical Debt Repayment | Journal-First Paper | Mohammad Sadegh Sheikhaei (Queen's University); Yuan Tian (Queen's University, Kingston, Ontario); Shaowei Wang (University of Manitoba); Bowen Xu (North Carolina State University)
15:10 | 20m | Talk | On the synchronization between Hugging Face pre-trained language models and their upstream GitHub repository | Journal-First Paper | Adekunle Ajibode (Queen's University); Abdul Ali Bangash (Lahore University of Management Sciences); Oussama Ben Sghaier (Queen's University); Bram Adams (Queen's University); Ahmed E. Hassan (Queen's University)