ICSE 2025
Sat 26 April - Sun 4 May 2025 Ottawa, Ontario, Canada
Wed 30 Apr 2025 16:00 - 16:15 at 210 - SE for AI with Security Chair(s): Lina Marsso

Large language models (LLMs) have revolutionized artificial intelligence, but their increasing deployment across critical domains has raised concerns about their abnormal behaviors when faced with malicious attacks. Such vulnerability alerts the widespread inadequacy of pre-release testing. In this paper, we conduct a comprehensive empirical study to evaluate the effectiveness of traditional coverage criteria in identifying such inadequacies, exemplified by the significant security concern of jailbreak attacks. Our study begins with a clustering analysis of the hidden states of LLMs, revealing that the embedded characteristics effectively distinguish between different query types. We then systematically evaluate the performance of these criteria across three key dimensions: criterion level, layer level, and token level.

Our research uncovers significant differences in the sets of neurons covered when LLMs process normal versus jailbreak queries, aligning with our clustering experiments. Leveraging these findings, we propose three practical applications of coverage criteria in the context of LLM security testing. Specifically, we develop a real-time jailbreak detection mechanism that achieves high accuracy (93.61% on average) in classifying queries as normal or jailbreak. Furthermore, we explore the use of coverage levels to prioritize test cases, improving testing efficiency by focusing on high-risk interactions and removing redundant tests. Lastly, we introduce a coverage-guided approach for generating jailbreak attack examples, enabling systematic refinement of prompts to uncover new vulnerabilities. This study improves our understanding of LLM security testing, enhances their safety, and provides a foundation for developing more robust AI applications.

Wed 30 Apr

Displayed time zone: Eastern Time (US & Canada) change

16:00 - 17:30
SE for AI with SecurityResearch Track at 210
Chair(s): Lina Marsso École Polytechnique de Montréal
16:00
15m
Talk
Understanding the Effectiveness of Coverage Criteria for Large Language Models: A Special Angle from Jailbreak AttacksSecuritySE for AIArtifact-Available
Research Track
shide zhou Huazhong University of Science and Technology, Li Tianlin NTU, Kailong Wang Huazhong University of Science and Technology, Yihao Huang NTU, Ling Shi Nanyang Technological University, Yang Liu Nanyang Technological University, Haoyu Wang Huazhong University of Science and Technology
16:15
15m
Talk
Diversity Drives Fairness: Ensemble of Higher Order Mutants for Intersectional Fairness of Machine Learning SoftwareSecuritySE for AI
Research Track
Zhenpeng Chen Nanyang Technological University, Xinyue Li Peking University, Jie M. Zhang King's College London, Federica Sarro University College London, Yang Liu Nanyang Technological University
Pre-print
16:30
15m
Talk
HIFI: Explaining and Mitigating Algorithmic Bias through the Lens of Game-Theoretic InteractionsSecuritySE for AIArtifact-Available
Research Track
Lingfeng Zhang East China Normal University, Zhaohui Wang Software Engineering Institute, East China Normal University, Yueling Zhang East China Normal University, Min Zhang East China Normal University, Jiangtao Wang Software Engineering Institute, East China Normal University
16:45
15m
Talk
Towards More Trustworthy Deep Code Models by Enabling Out-of-Distribution DetectionSecuritySE for AI
Research Track
Yanfu Yan William & Mary, Viet Duong William & Mary, Huajie Shao College of William & Mary, Denys Poshyvanyk William & Mary
17:00
15m
Talk
FairSense: Long-Term Fairness Analysis of ML-Enabled SystemsSecuritySE for AIArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Research Track
Yining She Carnegie Mellon University, Sumon Biswas Carnegie Mellon University, Christian Kästner Carnegie Mellon University, Eunsuk Kang Carnegie Mellon University