AdaptiveGuard: Towards Adaptive Runtime Safety for LLM-Powered Software
This program is tentative and subject to change.
Guardrails are critical for the safe deployment of Large Language Model (LLM)-powered software. Unlike traditional rule-based systems, whose limited, predefined input-output spaces inherently constrain unsafe behavior, LLMs enable open-ended, intelligent interactions, which opens the door to jailbreak attacks through user inputs. Guardrails serve as a protective layer, filtering unsafe prompts before they reach the LLM. However, prior research shows that jailbreak attacks can still succeed over 70% of the time, even against advanced models like GPT-4o. While guardrails such as LlamaGuard report up to 95% accuracy, our preliminary analysis shows that their performance can drop sharply, to as low as 12%, when confronted with unseen attacks. This highlights a growing software engineering challenge: how can we build a post-deployment guardrail that adapts dynamically to emerging threats? To address this, we propose AdaptiveGuard, an adaptive guardrail that detects novel jailbreak attacks as out-of-distribution (OOD) inputs and learns to defend against them through a continual learning framework. In our empirical evaluation, AdaptiveGuard achieves 96% OOD detection accuracy, adapts to new attacks in just two update steps, and retains over 85% F1-score on in-distribution data after adaptation, outperforming the baselines. These results demonstrate that AdaptiveGuard is a guardrail capable of evolving in response to emerging jailbreak strategies after deployment. We release AdaptiveGuard and the studied datasets at https://github.com/awsm-research/AdaptiveGuard to support further research.
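To illustrate the idea of an OOD-gated guardrail with few-step continual updates, the following minimal sketch shows one way such a loop could look. It is an assumption-laden illustration, not the paper's implementation: the class names, the energy-score threshold, the toy hashing feature extractor, and the replay-based update are all placeholders for the components described in the abstract.

```python
# Minimal, illustrative sketch of an OOD-gated guardrail with continual updates.
# All names, thresholds, and the toy feature extractor are assumptions for
# illustration only; they are NOT AdaptiveGuard's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def featurize(prompt: str, dim: int = 512) -> torch.Tensor:
    """Toy hashing bag-of-words features (stand-in for an LLM encoder)."""
    v = torch.zeros(dim)
    for tok in prompt.lower().split():
        v[hash(tok) % dim] += 1.0
    return v


class GuardrailClassifier(nn.Module):
    """Binary safe/unsafe prompt classifier."""
    def __init__(self, dim: int = 512, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def forward(self, x):
        return self.net(x)


def energy_score(logits: torch.Tensor) -> torch.Tensor:
    """Energy-based OOD score: lower energy suggests in-distribution input."""
    return -torch.logsumexp(logits, dim=-1)


class AdaptiveGuardrail:
    def __init__(self, ood_threshold: float = 0.0, lr: float = 1e-3):
        self.model = GuardrailClassifier()
        self.opt = torch.optim.Adam(self.model.parameters(), lr=lr)
        self.ood_threshold = ood_threshold  # would be calibrated on in-distribution data
        self.replay = []                    # (features, label) pairs to limit forgetting

    def check(self, prompt: str):
        """Return (verdict, is_ood) for an incoming prompt."""
        x = featurize(prompt).unsqueeze(0)
        with torch.no_grad():
            logits = self.model(x)
            is_ood = energy_score(logits).item() > self.ood_threshold
            verdict = "unsafe" if (is_ood or logits.argmax(-1).item() == 1) else "safe"
        return verdict, is_ood

    def adapt(self, prompts, labels, steps: int = 2):
        """Few-step update on newly labeled (e.g., flagged OOD jailbreak) prompts,
        mixed with replayed in-distribution examples to retain prior performance."""
        new = [(featurize(p), torch.tensor(y)) for p, y in zip(prompts, labels)]
        batch = new + self.replay[-len(new):]
        xs = torch.stack([x for x, _ in batch])
        ys = torch.stack([y for _, y in batch])
        for _ in range(steps):
            self.opt.zero_grad()
            loss = F.cross_entropy(self.model(xs), ys)
            loss.backward()
            self.opt.step()
        self.replay.extend(new)
```

In this sketch, a prompt flagged as OOD is treated conservatively (blocked) and, once labeled, can trigger a small number of gradient updates; mixing in replayed in-distribution examples is one common way to preserve accuracy on previously seen data.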
Wed 19 Nov (displayed time zone: Seoul)
16:00 - 17:00
16:00 (10m) Talk | The Gold Digger in the Dark Forest: Industrial-Scale MEV Analysis in Ethereum | Industry Showcase | Ningyu He (Hong Kong Polytechnic University), Tianyang Chi (Beijing University of Posts and Telecommunications), Xiaohui Hu (Huazhong University of Science and Technology), Haoyu Wang (Huazhong University of Science and Technology)
16:10 (10m) Talk | RPG: Linux Kernel Fuzzing Guided by Distribution-Specific Runtime Parameter Interfaces | Industry Showcase | Yuhan Chen (Central South University), Yuheng Shen (Tsinghua University), Guoyu Yin (Central South University), Fan Ding (Central South University), Runzhe Wang (Alibaba Group), Tao Ma (Alibaba Group), Xiaohai Shi (Alibaba Group), Qiang Fu (Central South University), Ying Fu (Tsinghua University), Heyuan Shi (Central South University)
16:20 (10m) Talk | Securing Self-Managed Third-Party Libraries | Industry Showcase | Xin Zhou (Nanjing University), Jinwei Xu (Nanjing University), He Zhang (Nanjing University), Yanjing Yang (Nanjing University), Lanxin Yang (Nanjing University), Bohan Liu (Nanjing University), Hongshan Tang (JD.com, Inc.)
16:30 (10m) Talk | STaint: Detecting Second-Order Vulnerabilities in PHP Applications with LLM-Assisted Bi-Directional Static Taint Analysis | NIER Track | Yuchen Ji (ShanghaiTech University), Hongchen Cao (ShanghaiTech University), Jingzhu He (ShanghaiTech University)
16:40 (10m) Talk | AdaptiveGuard: Towards Adaptive Runtime Safety for LLM-Powered Software | Industry Showcase | Rui Yang (Monash University and Transurban), Michael Fu (The University of Melbourne), Kla Tantithamthavorn (Monash University and Atlassian), Chetan Arora (Monash University), Gunel Gulmammadova (Transurban), Joey Chua (Transurban)
16:50 (10m) Talk | CONFUSETAINT: Exploiting Vulnerabilities to Bypass Dynamic Taint Analysis | NIER Track