S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models
Generative large language models (LLMs) have revolutionized natural language processing with their transformative and emergent capabilities. However, recent evidence indicates that LLMs can produce harmful content that violates social norms, raising significant concerns about the safety and ethical ramifications of deploying these advanced models. It is therefore imperative to perform a rigorous and comprehensive safety evaluation of LLMs before deployment. Despite this need, and owing to the vastness of the LLM generation space, the field still lacks both a unified, standardized risk taxonomy that systematically reflects LLM content safety and automated assessment techniques that can explore potential risks efficiently.
To bridge this gap, we propose \textbf{S-Eval}, a novel LLM-based automated \textbf{S}afety \textbf{Eval}uation framework with a newly defined comprehensive risk taxonomy. S-Eval incorporates two key components: an expert testing LLM $\mathcal{M}_t$ and a novel safety critique LLM $\mathcal{M}_c$. The expert testing LLM $\mathcal{M}_t$ automatically generates test cases in accordance with the proposed risk taxonomy (comprising 8 risk dimensions and a total of 102 subdivided risks). The safety critique LLM $\mathcal{M}_c$ provides quantitative and explainable safety evaluations for better risk awareness of LLMs. S-Eval differs from prior work in three significant ways: (i) \textit{efficient} – we construct a multi-dimensional and open-ended benchmark comprising 220,000 test cases across the 102 risks using $\mathcal{M}_t$, and conduct safety evaluations of 21 influential LLMs via $\mathcal{M}_c$ on our benchmark; the entire process is fully automated and requires no human involvement. (ii) \textit{effective} – extensive validations show that S-Eval facilitates a more thorough assessment and better perception of potential LLM risks, and that $\mathcal{M}_c$ not only accurately quantifies the risks of LLMs but also provides explainable and in-depth insight into their safety, surpassing comparable models such as LLaMA-Guard-2. (iii) \textit{adaptive} – thanks to its LLM-based architecture, S-Eval can be flexibly configured and adapted to the rapid evolution of LLMs and the accompanying new safety threats, test generation methods, and safety critique methods. We further study the impact of hyper-parameters and language environments on model safety, which may suggest promising directions for future research. S-Eval has been deployed at our industrial partner for the automated safety evaluation of multiple LLMs serving millions of users, demonstrating its effectiveness in real-world scenarios.
Fri 27 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
14:00 - 15:30 | AI Testing (Research Papers / Tool Demonstrations) at Cosmos 3A. Chair(s): Cuiyun Gao (Harbin Institute of Technology)

14:00 (25m, Talk) | AudioTest: Prioritizing Audio Test Cases. Research Papers. Yinghua Li (University of Luxembourg), Xueqi Dang (University of Luxembourg, SnT), Wendkuuni Arzouma Marc Christian OUEDRAOGO (University of Luxembourg), Jacques Klein (University of Luxembourg), Tegawendé F. Bissyandé (University of Luxembourg)

14:25 (25m, Talk) | S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models. Research Papers. Xiaohan Yuan (Zhejiang University), Jinfeng Li (Alibaba Group), Dongxia Wang (Zhejiang University), Yuefeng Chen (Alibaba Group), Xiaofeng Mao (Alibaba Group), Longtao Huang (Alibaba Group), Jialuo Chen (Zhejiang University), Hui Xue (Alibaba Group), Xiaoxia Liu (Zhejiang University), Wenhai Wang (Zhejiang University), Kui Ren (Zhejiang University), Jingyi Wang (Zhejiang University)

14:50 (25m, Talk) | Improving Deep Learning Framework Testing with Model-Level Metamorphic Testing. Research Papers. Yanzhou Mu, Juan Zhai (University of Massachusetts at Amherst), Chunrong Fang (Nanjing University), Xiang Chen (Nantong University), Zhixiang Cao (Xi'an Jiaotong University), Peiran Yang (Nanjing University), Kexin Zhao (Nanjing University), An Guo (Nanjing University), Zhenyu Chen (Nanjing University)

15:15 (15m, Demonstration) | ASTRAL: A Tool for the Automated Safety Testing of Large Language Models. Tool Demonstrations. Miriam Ugarte (Mondragon University), Pablo Valle (Mondragon University), José Antonio Parejo Maestre (Universidad de Sevilla), Sergio Segura (SCORE Lab, I3US Institute, Universidad de Sevilla, Seville, Spain), Aitor Arrieta (Mondragon University)
Cosmos 3A is the first room in the Cosmos 3 wing.
When facing the main Cosmos Hall, the entrance to the Cosmos 3 wing is on the left, close to the stairs. The area is accessed through a large door marked “3”, which will stay open during the event.