CAIN 2025
Sun 27 - Mon 28 April 2025 Ottawa, Ontario, Canada
co-located with ICSE 2025
Mon 28 Apr 2025 10:15 - 10:18 at 208 - Lightning talks Chair(s): Scott Barnett

Large Language Models (LLMs) are powerful tools used in applications such as conversational AI and code generation. However, significant robustness concerns arise when LLMs are deployed in production, including hallucinations, prompt injection attacks, harmful content generation, and difficulties in maintaining accurate domain-specific content moderation. Guardrails aim to mitigate these challenges by aligning LLM outputs with desired behaviours without modifying the underlying models. Nvidia NeMo Guardrails, for instance, relies on specifying acceptable and unacceptable behaviours. However, it is challenging to anticipate and address the potential issues of LLMs in advance when creating these guardrails, and maintaining and refining them often requires manual updates from software engineers. We introduce LLM-Guards, specialised machine learning (ML) models trained to function as protective guards. Additionally, we present an automation pipeline for training and continually fine-tuning these guards using reinforcement learning from human feedback (RLHF). We evaluated several small LLMs, including Llama-3, Mistral, and Gemma, as LLM-Guards for challenges such as content moderation and detecting off-topic queries, and compared their performance against NeMo Guardrails. The proposed Llama-3 LLM-Guard outperformed NeMo Guardrails in detecting off-topic queries, achieving an accuracy of 98.7% compared to 81%. Furthermore, the LLM-Guard detected 97.86% of harmful queries, surpassing NeMo Guardrails by 19.86%.
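The abstract contrasts specification-based guardrails with trained LLM-Guards. The sketch below illustrates, at a high level, how a small instruction-tuned model could be used as such a guard at inference time, screening user queries before they reach the main application LLM. The model name, example domain, prompt wording, and label parsing are illustrative assumptions, not the authors' actual training or inference setup.

```python
# Minimal sketch of an "LLM-Guard"-style check: a small instruction-tuned model
# labels each incoming query before it is passed to the main application LLM.
# Assumptions (not from the paper): the guard model, the banking-assistant
# domain, the prompt wording, and the label set below are all illustrative.
from transformers import pipeline

# Assumed guard model; meta-llama checkpoints are gated and require HF access.
guard = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct")

GUARD_PROMPT = (
    "You are a moderation guard for a banking assistant.\n"
    "Label the user query as ON_TOPIC, OFF_TOPIC, or HARMFUL.\n"
    "Query: {query}\n"
    "Label:"
)

def check_query(query: str) -> str:
    """Return the guard's label for a query; anything not ON_TOPIC is blocked upstream."""
    out = guard(GUARD_PROMPT.format(query=query), max_new_tokens=5, do_sample=False)
    text = out[0]["generated_text"]
    tail = text.split("Label:")[-1].strip()
    return tail.split()[0] if tail else "UNKNOWN"

if __name__ == "__main__":
    print(check_query("How do I reset my online banking password?"))
    print(check_query("Write me a poem about pirates."))
```

In this framing, the guard is a separate, cheap classification step whose behaviour is learned from data (and refined via RLHF in the paper's pipeline), rather than enumerated up front as rail specifications.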

Mon 28 Apr

Displayed time zone: Eastern Time (US & Canada)

10:00 - 10:30
Lightning talks / Posters at 208
Chair(s): Scott Barnett Deakin University, Australia
10:00
3m
Poster
All You Need is an AI Platform: A Proposal for a Complete Reference Architecture
Posters
Benjamin Weigell University of Augsburg, Fabian Stieler University of Augsburg, Bernhard Bauer University of Augsburg
10:03
3m
Poster
Evaluating Reinforcement Learning Safety and Trustworthiness in Cyber-Physical Systems
Posters
Katherine R. Dearstyne University of Notre Dame, Pedro Alarcon Granadeno University of Notre Dame, Theodore Chambers University of Notre Dame, Jane Cleland-Huang University of Notre Dame
10:06
3m
Poster
Finding Trojan Triggers in Code LLMs: An Occlusion-based Human-in-the-loop Approach
Posters
Aftab Hussain Texas A&M University, College Station, Rafiqul Rabin UL Research Institutes, Toufique Ahmed IBM Research, Amin Alipour University of Houston, Bowen Xu North Carolina State University, Stephen Huang University of Houston
Pre-print
10:09
3m
Poster
Navigating the Shift: Architectural Transformations and Emerging Verification Demands in AI-Enabled Cyber-Physical Systems
Posters
Hadiza Yusuf University of Michigan - Dearborn, Khouloud Gaaloul University of Michigan - Dearborn
10:12
3m
Poster
Random Perturbation Attacks on LLMs for Code Generation
Posters
Qiulu Peng Carnegie Mellon University, Chi Zhang, Ravi Mangal Colorado State University, Corina S. Păsăreanu Carnegie Mellon University; NASA Ames, Limin Jia Carnegie Mellon University
10:15
3m
Poster
Safeguarding LLM-Applications: Specify or Train?
Posters
Hala Abdelkader Applied Artificial Intelligence Institute, Deakin University, Mohamed Abdelrazek Deakin University, Australia, Sankhya Singh Deakin University, Irini Logothetis Applied Artificial Intelligence Institute, Deakin University, Priya Rani RMIT University, Rajesh Vasa Deakin University, Australia, Jean-Guy Schneider Monash University
10:18
3m
Poster
Task decomposition and RAG as Design Patterns for LLM-based Systems
Posters