SCAM 2025
Sun 7 - Fri 12 September 2025, Auckland, New Zealand
co-located with ICSME 2025
Mon 8 Sep 2025 14:15 - 14:37 at OGGB5 260-051 - LLMs Chair(s): Jens Dietrich

Infrastructure as Code (IaC) is an emerging paradigm to automate the configuration of cloud infrastructures. Infrastructure code often processes secret information, such as passwords or private keys. Mishandling such secrets can lead to information disclosure vulnerabilities, yet existing efforts to detect them rely on pattern matching of parameter and variable names, causing false positives and negatives due to suboptimal string patterns. This paper aims to address these limitations by assessing the effectiveness of traditional Machine Learning (ML) and transformer-based Language Model (LM) classifiers in predicting sensitive module parameters in Ansible, one of the most popular IaC tools. We collect a dataset of over 160,000 Ansible module parameters and their documentation, containing more than 16,000 parameters that expect secret data. Then, we train several ML algorithms and find that Random Forest performs best, achieving 93.5% precision but limited recall (72.7%). In parallel, we evaluate multiple pretrained zero-shot language models, which achieve a recall of up to 90.4% at the expense of lower precision (at most 88.5%). We subsequently fine-tune the language models, resulting in nearly perfect precision (99.8%) and recall (99.8%) on the ground-truth dataset. We compare the best-performing ML and LM classifiers to two baselines that use string patterns. We find that the ML classifier achieves performance comparable to the two baselines, while the fine-tuned LM outperforms all approaches. A qualitative comparison reveals that the approaches are complementary to the baselines, motivating future work that uses prediction models to reduce false positives in reports generated by inexpensive baselines. However, we also find that the fine-tuned LM misses several secrets due to noise in the dataset, highlighting the importance of fine-tuning on a high-quality ground truth.
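The name-based pattern matching that the abstract criticizes can be sketched in a few lines. This is a hypothetical illustration, not the baselines evaluated in the paper: the regex below is an assumed pattern set, chosen only to show why such detectors produce both false negatives and false positives.

```python
import re

# Hypothetical name patterns in the spirit of string-matching secret detectors.
# Real tools (and the paper's baselines) use their own, more elaborate pattern sets.
SECRET_NAME_PATTERN = re.compile(
    r"password|passwd|secret|token|api_?key|private_?key",
    re.IGNORECASE,
)

def looks_sensitive(param_name: str) -> bool:
    """Flag an Ansible module parameter as sensitive based on its name alone."""
    return bool(SECRET_NAME_PATTERN.search(param_name))

# True positive: a parameter that clearly carries a secret value.
print(looks_sensitive("login_password"))   # True
# False negative: secret data behind a name the patterns do not cover.
print(looks_sensitive("credential"))       # False
# False positive: matches "password" but holds a policy value such as
# "always"/"on_create" (cf. the user module's update_password), not a secret.
print(looks_sensitive("update_password"))  # True
```

Because the decision depends only on the parameter's name, any fixed pattern list over- and under-approximates the set of secret-expecting parameters; the paper's classifiers instead learn from the parameters' documentation.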

Presentation (SCAM25_coen_2.pdf), 5.43 MiB

Mon 8 Sep

Displayed time zone: Auckland, Wellington

13:30 - 15:00
LLMs - Research Track at OGGB5 260-051
Chair(s): Jens Dietrich Victoria University of Wellington
13:30
22m
Research paper
Exploring the Potential of Large Language Models in Fine-Grained Review Comment Classification
Research Track
Linh Nguyen The University of Melbourne, Chunhua Liu The University of Melbourne, Hong Yi Lin The University of Melbourne, Patanamon Thongtanunam The University of Melbourne
Pre-print
13:52
22m
Research paper
Language-Agnostic Generation of Header Comments using Large Language Models
Research Track
Nathanael Yao Queen's University, Juergen Dingel Queen's University, Ali Tizghadam TELUS, Ibrahim Amer Queen's University
14:15
22m
Research paper
Smelling Secrets: Leveraging Machine Learning and Language Models for Sensitive Parameter Detection in Ansible Security Analysis
Research Track
Ruben Opdebeeck Vrije Universiteit Brussel, Valeria Pontillo Gran Sasso Science Institute, Camilo Velázquez-Rodríguez Vrije Universiteit Brussel, Wolfgang De Meuter Vrije Universiteit Brussel, Coen De Roover Vrije Universiteit Brussel
Pre-print File Attached
14:37
22m
Research paper
Testing the Untestable? An Empirical Study on the Testing Process of LLM-Powered Software Systems
Research Track
Cleyton V. C. de Magalhaes CESAR School, Italo Santos University of Hawai‘i at Mānoa, Brody Stuart-Verner University of Calgary, Ronnie de Souza Santos University of Calgary
Pre-print