S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models
Generative large language models (LLMs) have revolutionized natural language processing with their transformative and emergent capabilities. However, recent evidence indicates that LLMs can produce harmful content that violates social norms, raising significant concerns about the safety and ethical ramifications of deploying these advanced models. It is therefore imperative to perform a rigorous and comprehensive safety evaluation of LLMs before deployment. Despite this need, and owing to the vastness of the LLM generation space, the field still lacks both a unified, standardized risk taxonomy that systematically reflects LLM content safety and automated assessment techniques that can explore potential risks efficiently.
To bridge this gap, we propose \textbf{S-Eval}, a novel LLM-based automated \textbf{S}afety \textbf{Eval}uation framework with a newly defined comprehensive risk taxonomy. S-Eval incorporates two key components: an expert testing LLM $\mathcal{M}_t$ and a novel safety critique LLM $\mathcal{M}_c$. The expert testing LLM $\mathcal{M}_t$ automatically generates test cases in accordance with the proposed risk taxonomy (comprising 8 risk dimensions and a total of 102 subdivided risks). The safety critique LLM $\mathcal{M}_c$ provides quantitative and explainable safety evaluations for better risk awareness of LLMs. S-Eval differs from prior work in three significant ways: (i) \textit{efficient} – we construct a multi-dimensional and open-ended benchmark comprising 220,000 test cases across the 102 risks using $\mathcal{M}_t$, and conduct safety evaluations of 21 influential LLMs via $\mathcal{M}_c$ on our benchmark; the entire process is fully automated and requires no human involvement. (ii) \textit{effective} – extensive validations show that S-Eval facilitates a more thorough assessment and better perception of potential LLM risks, and that $\mathcal{M}_c$ not only accurately quantifies the risks of LLMs but also provides explainable and in-depth insight into their safety, surpassing comparable models such as LLaMA-Guard-2. (iii) \textit{adaptive} – thanks to its LLM-based architecture, S-Eval can be flexibly configured and adapted to the rapid evolution of LLMs and the accompanying new safety threats, test generation methods, and safety critique methods. We further study the impact of hyper-parameters and language environments on model safety, which may suggest promising directions for future research. S-Eval has been deployed at our industrial partner for the automated safety evaluation of multiple LLMs serving millions of users, demonstrating its effectiveness in real-world scenarios.
Fri 27 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
14:00 - 15:30 | AI Testing (Research Papers / Tool Demonstrations) at Cosmos 3A. Chair(s): Cuiyun Gao (Harbin Institute of Technology)

14:00 (25m, Talk) | AudioTest: Prioritizing Audio Test Cases. Research Papers. Yinghua Li (University of Luxembourg), Xueqi Dang (University of Luxembourg, SnT), Wendkuuni Arzouma Marc Christian OUEDRAOGO (University of Luxembourg), Jacques Klein (University of Luxembourg), Tegawendé F. Bissyandé (University of Luxembourg)

14:25 (25m, Talk) | S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models. Research Papers. Xiaohan Yuan (Zhejiang University), Jinfeng Li (Alibaba Group), Dongxia Wang (Zhejiang University), Yuefeng Chen (Alibaba Group), Xiaofeng Mao (Alibaba Group), Longtao Huang (Alibaba Group), Jialuo Chen (Zhejiang University), Hui Xue (Alibaba Group), Xiaoxia Liu (Zhejiang University), Wenhai Wang (Zhejiang University), Kui Ren (Zhejiang University), Jingyi Wang (Zhejiang University)

14:50 (25m, Talk) | Improving Deep Learning Framework Testing with Model-Level Metamorphic Testing. Research Papers. Yanzhou Mu, Juan Zhai (University of Massachusetts at Amherst), Chunrong Fang (Nanjing University), Xiang Chen (Nantong University), Zhixiang Cao (Xi'an Jiaotong University), Peiran Yang (Nanjing University), Kexin Zhao (Nanjing University), An Guo (Nanjing University), Zhenyu Chen (Nanjing University)

15:15 (15m, Demonstration) | ASTRAL: A Tool for the Automated Safety Testing of Large Language Models. Tool Demonstrations. Miriam Ugarte (Mondragon University), Pablo Valle (Mondragon University), José Antonio Parejo Maestre (Universidad de Sevilla), Sergio Segura (SCORE Lab, I3US Institute, Universidad de Sevilla, Seville, Spain), Aitor Arrieta (Mondragon University)
Cosmos 3A is the first room in the Cosmos 3 wing.
When facing the main Cosmos Hall, the entrance to the Cosmos 3 wing is on the left, close to the stairs. The area is accessed through a large door marked “3”, which will stay open during the event.