Designing and Optimizing Alignment Datasets for IoT Security: A Synergistic Approach with Static Analysis Insights
Large Language Models (LLMs) show great promise for automating critical IoT security tasks, yet they often fail to address high-stakes vulnerabilities without domain-focused datasets. In this paper, we present a structured methodology to design and optimize IoT-specific alignment datasets informed by static analysis insights, thereby bridging the gap between generic language models and specialized IoT security requirements. Our approach integrates findings from IoT firmware analysis tools (e.g., FACT and Binwalk) with authoritative vulnerability repositories (MITRE CVE, CWE, CAPEC) to construct three key dataset types: (1) Base Datasets, capturing essential IoT vulnerabilities and configurations; (2) Classification Datasets, discerning IoT from non-IoT prompts; and (3) Alignment Datasets, employing Contrastive Preference Optimization (CPO), Direct Preference Optimization (DPO), and Kahneman-Tversky Optimization (KTO) for IoT-specific fine-tuning. We further incorporate secure-by-design principles and bias mitigation strategies, ranging from device-type diversity to synthetic data augmentation, to ensure fair, high-fidelity representations of IoT security scenarios. Experimental results demonstrate that our alignment datasets improve LLM responsiveness and correctness for vulnerabilities discovered via offline static analysis, including outdated libraries, hard-coded credentials, and insecure default services. Notably, Kahneman-Tversky Optimization achieves 97% alignment accuracy, reflecting the impact of clear binary classifications in high-stakes security tasks. This work underscores the significance of dual-system integration (static analysis plus LLM alignment) for proactive IoT defense. By foregrounding domain-specific vulnerabilities in carefully curated datasets, we enable LLMs to generate more actionable, context-aware security recommendations, thus advancing state-of-the-art IoT protections in both research and industry deployments.
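To make the distinction between the paired (DPO, and typically CPO) and binary (KTO) alignment formats concrete, the following is a minimal sketch of how one static-analysis finding might be turned into training records. It is not the authors' schema or pipeline: the `Finding` class, its fields, the helper functions, and the record keys (`prompt`, `chosen`, `rejected`, `completion`, `label`) are all assumptions chosen for illustration, loosely following a convention common in preference-tuning tooling.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """A hypothetical static-analysis finding (illustrative fields only)."""
    device_type: str  # e.g. "IP camera"
    issue: str        # e.g. "hard-coded credentials"
    evidence: str     # where the analyzer reported the issue
    cwe_id: str       # weakness identifier from the CWE catalogue


def make_prompt(f: Finding) -> str:
    """Render the finding as a question an LLM could be asked."""
    return (
        f"Firmware for a {f.device_type} shows {f.issue} "
        f"({f.cwe_id}) in {f.evidence}. What should be done?"
    )


def to_dpo_record(f: Finding, good: str, bad: str) -> dict:
    """Paired preference: one prompt with a preferred and a dispreferred answer."""
    return {"prompt": make_prompt(f), "chosen": good, "rejected": bad}


def to_kto_records(f: Finding, good: str, bad: str) -> list[dict]:
    """Unpaired binary feedback: each completion is labelled desirable or not."""
    prompt = make_prompt(f)
    return [
        {"prompt": prompt, "completion": good, "label": True},
        {"prompt": prompt, "completion": bad, "label": False},
    ]


# Assumed example values; CWE-798 is "Use of Hard-coded Credentials".
finding = Finding("IP camera", "hard-coded credentials",
                  "a startup script", "CWE-798")
good_answer = "Remove the embedded password and provision unique credentials per device."
bad_answer = "This is fine as long as the device sits behind a router."

dpo_record = to_dpo_record(finding, good_answer, bad_answer)
kto_records = to_kto_records(finding, good_answer, bad_answer)
```

Under these assumptions, the paired form needs a matched good and bad answer for every prompt, whereas the binary form only needs a desirable or undesirable label per completion, which is the kind of clear binary classification the abstract associates with KTO.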
Thu 26 Jun. Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
16:00 - 18:00
16:00 15m Talk | Leveraging LLM Enhanced Commit Messages to Improve Machine Learning Based Test Case Prioritization | PROMISE 2025 | Yara Q Mahmoud, Ontario Tech University; Akramul Azim, Ontario Tech University; Ramiro Liscano, Ontario Tech University; Kevin Smith, International Business Machines Corporation (IBM); Yee-Kang Chang, International Business Machines Corporation (IBM); Gkerta Seferi, International Business Machines Corporation (IBM); Qasim Tauseef, International Business Machines Corporation (IBM)
16:16 14m Talk | Designing and Optimizing Alignment Datasets for IoT Security: A Synergistic Approach with Static Analysis Insights | PROMISE 2025
16:31 14m Talk | Efficient Adaptation of Large Language Models for Smart Contract Vulnerability Detection | PROMISE 2025 | Fadul Sikder, Department of Computer Science and Engineering, The University of Texas at Arlington; Jeff Yu Lei, University of Texas at Arlington; Yuede Ji, Department of Computer Science and Engineering, The University of Texas at Arlington
16:46 14m Talk | A Combined Approach to Performance Regression Testing Resource Usage Reduction | PROMISE 2025 | Milad Abdullah, Charles University; David Georg Reichelt, Lancaster University Leipzig, Leipzig, Germany; Vojtech Horky, Charles University; Lubomír Bulej, Charles University; Tomas Bures, Charles University, Czech Republic; Petr Tuma, Charles University
17:01 14m Talk | Security Bug Report Prediction Within and Across Projects: A Comparative Study of BERT and Random Forest | PROMISE 2025 | Farnaz Soltaniani, TU Clausthal; Mohammad Ghafari, TU Clausthal; Mohammed Sayagh, ETS Montreal, University of Quebec
17:16 9m Talk | Towards Build Optimization Using Digital Twins | PROMISE 2025 | Henri Aïdasso, École de technologie supérieure (ÉTS); Francis Bordeleau, École de Technologie Supérieure (ETS); Ali Tizghadam, TELUS
17:26 4m Day closing | Closing | PROMISE 2025