SCAM 2025
Sun 7 - Fri 12 September 2025, Auckland, New Zealand
co-located with ICSME 2025
Mon 8 Sep 2025 14:15 - 14:37 at OGGB5 260-051 - LLMs Chair(s): Jens Dietrich

Infrastructure as Code (IaC) is an emerging paradigm to automate the configuration of cloud infrastructures. Infrastructure code often processes secret information, such as passwords or private keys. Mishandling such secrets can lead to information disclosure vulnerabilities, yet existing efforts to detect them rely on pattern matching of parameter and variable names, causing false positives and negatives due to suboptimal string patterns. This paper aims to address these limitations by assessing the effectiveness of traditional Machine Learning (ML) and transformer-based Language Model (LM) classifiers in predicting sensitive module parameters in Ansible, one of the most popular IaC tools. We collect a dataset of over 160,000 Ansible module parameters and their documentation, containing more than 16,000 parameters that expect secret data. Then, we train several ML algorithms and find that Random Forest performs best, achieving 93.5% precision but limited recall (72.7%). In parallel, we evaluate multiple pretrained zero-shot language models, which achieve a recall of up to 90.4% at the expense of lower precision (at most 88.5%). We subsequently fine-tune the language models, resulting in nearly perfect precision (99.8%) and recall (99.8%) on the ground-truth dataset. We compare the best-performing ML and LM classifiers to two baselines that use string patterns. We find that the ML classifier achieves performance comparable to the two baselines, while the fine-tuned LM outperforms all approaches. A qualitative comparison reveals that the approaches are complementary to the baselines, motivating future work that uses prediction models to reduce false positives in reports generated by inexpensive baselines. However, we also find that the fine-tuned LM misses several secrets due to noise in the dataset, highlighting the importance of fine-tuning on a high-quality ground truth.
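The name-based pattern matching that the abstract criticizes can be sketched in a few lines. This is a hypothetical illustration, not the baselines evaluated in the paper: the regex below is an assumed pattern set, chosen only to show why such detectors produce both false negatives and false positives.

```python
import re

# Hypothetical name patterns in the spirit of string-matching secret detectors.
# Real tools (and the paper's baselines) use their own, more elaborate pattern sets.
SECRET_NAME_PATTERN = re.compile(
    r"password|passwd|secret|token|api_?key|private_?key",
    re.IGNORECASE,
)

def looks_sensitive(param_name: str) -> bool:
    """Flag an Ansible module parameter as sensitive based on its name alone."""
    return bool(SECRET_NAME_PATTERN.search(param_name))

# True positive: a parameter that clearly carries a secret value.
print(looks_sensitive("login_password"))   # True
# False negative: secret data behind a name the patterns do not cover.
print(looks_sensitive("credential"))       # False
# False positive: matches "password" but holds a policy value such as
# "always"/"on_create" (cf. the user module's update_password), not a secret.
print(looks_sensitive("update_password"))  # True
```

Because the decision depends only on the parameter's name, any fixed pattern list over- and under-approximates the set of secret-expecting parameters; the paper's classifiers instead learn from the parameters' documentation.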

Presentation (SCAM25_coen_2.pdf), 5.43 MiB

Mon 8 Sep

Displayed time zone: Auckland, Wellington

13:30 - 15:00
LLMs - Research Track at OGGB5 260-051
Chair(s): Jens Dietrich Victoria University of Wellington
13:30
22m
Research paper
Exploring the Potential of Large Language Models in Fine-Grained Review Comment Classification
Research Track
Linh Nguyen The University of Melbourne, Chunhua Liu The University of Melbourne, Hong Yi Lin The University of Melbourne, Patanamon Thongtanunam The University of Melbourne
Pre-print
13:52
22m
Research paper
Language-Agnostic Generation of Header Comments using Large Language Models
Research Track
Nathanael Yao Queen's University, Juergen Dingel Queen's University, Ali Tizghadam TELUS, Ibrahim Amer Queen's University
14:15
22m
Research paper
Smelling Secrets: Leveraging Machine Learning and Language Models for Sensitive Parameter Detection in Ansible Security Analysis
Research Track
Ruben Opdebeeck Vrije Universiteit Brussel, Valeria Pontillo Gran Sasso Science Institute, Camilo Velázquez-Rodríguez Vrije Universiteit Brussel, Wolfgang De Meuter Vrije Universiteit Brussel, Coen De Roover Vrije Universiteit Brussel
Pre-print File Attached
14:37
22m
Research paper
Testing the Untestable? An Empirical Study on the Testing Process of LLM-Powered Software Systems
Research Track
Cleyton V. C. de Magalhaes CESAR School, Italo Santos University of Hawai‘i at Mānoa, Brody Stuart-Verner University of Calgary, Ronnie de Souza Santos University of Calgary
Pre-print