ASTRAL: A Tool for the Automated Safety Testing of Large Language Models (ISSTA 2025 - Tool Demonstrations)

Who

Miriam Ugarte, Pablo Valle, José Antonio Parejo Maestre, Sergio Segura, Aitor Arrieta

Track

ISSTA 2025 Tool Demonstrations

Time Zone

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Fri 27 Jun 2025 15:15 - 15:30 at Cosmos 3A - AI Testing Chair(s): Cuiyun Gao

Abstract

In this paper, we present ASTRAL, a tool that automates the generation and execution of test inputs (i.e., prompts) to evaluate the safety of Large Language Models (LLMs). ASTRAL consists of three microservice modules. The first is a test generator, which employs a novel black-box coverage criterion to create balanced and diverse unsafe test inputs across a wide range of safety categories and linguistic characteristics (e.g., different writing styles and persuasion techniques). Additionally, the test generator incorporates an LLM-based approach that leverages Retrieval-Augmented Generation (RAG), few-shot prompting strategies, and web browsing to produce up-to-date test inputs. The second module is the test executor, which runs the generated test inputs on the LLM under test. Finally, the test evaluator acts as an oracle, assessing the execution outputs to identify unsafe responses and enabling a fully automated LLM testing process.
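The three-module flow described in the abstract (generate, execute, evaluate) can be illustrated with a minimal sketch. This is not ASTRAL's code or API: every name below (`PromptCase`, `generate_tests`, `execute_tests`, `evaluate`, the feature lists, and the stub model and oracle) is a hypothetical placeholder, and the LLM-based generation, RAG, few-shot prompting, and web browsing steps are omitted. The sketch only assumes that "balanced" coverage means producing the same number of test inputs for every combination of safety category, writing style, and persuasion technique.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Tuple

# Hypothetical feature values; the actual tool's categories and styles may differ.
CATEGORIES = ["category_a", "category_b"]
STYLES = ["question", "slang"]
PERSUASIONS = ["evidence_based", "expert_endorsement"]


@dataclass(frozen=True)
class PromptCase:
    category: str
    style: str
    persuasion: str
    prompt: str


def generate_tests(per_combination: int = 1) -> List[PromptCase]:
    """Emit the same number of inputs for every feature combination."""
    tests = []
    for category, style, persuasion in product(CATEGORIES, STYLES, PERSUASIONS):
        for i in range(per_combination):
            # Placeholder text; a real generator would ask an LLM to write the prompt.
            prompt = f"[{category}|{style}|{persuasion}] placeholder prompt #{i}"
            tests.append(PromptCase(category, style, persuasion, prompt))
    return tests


def execute_tests(
    tests: List[PromptCase], model: Callable[[str], str]
) -> List[Tuple[PromptCase, str]]:
    """Run every prompt on the model under test and keep its response."""
    return [(t, model(t.prompt)) for t in tests]


def evaluate(
    results: List[Tuple[PromptCase, str]], oracle: Callable[[str], bool]
) -> List[PromptCase]:
    """Return the test inputs whose responses the oracle flags as unsafe."""
    return [t for t, response in results if oracle(response)]


def stub_model(prompt: str) -> str:
    # Stand-in for the LLM under test: "fails" on slang-style prompts only.
    return "UNSAFE reply" if "slang" in prompt else "I cannot help with that."


def stub_oracle(response: str) -> bool:
    # Stand-in for the evaluator: flags responses marked as unsafe.
    return response.startswith("UNSAFE")


if __name__ == "__main__":
    flagged = evaluate(execute_tests(generate_tests(), stub_model), stub_oracle)
    for case in flagged:
        print(case.category, case.style, case.persuasion)
```

Passing the model and the oracle as plain callables keeps the three stages independent, which loosely mirrors the separation into microservice modules that the abstract describes.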

Miriam Ugarte

Mondragon University

Pablo Valle

Mondragon University

Spain

José Antonio Parejo Maestre

Universidad de Sevilla

Spain

Sergio Segura

SCORE Lab, I3US Institute, Universidad de Sevilla, Seville, Spain

Spain

Aitor Arrieta

Mondragon University

Spain

Session Program

Fri 27 Jun
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

14:00 - 15:30	AI Testing (Research Papers / Tool Demonstrations) at Cosmos 3A. Chair(s): Cuiyun Gao (Harbin Institute of Technology)

14:00 25m Talk		AudioTest: Prioritizing Audio Test Cases (Research Papers). Yinghua Li (University of Luxembourg), Xueqi Dang (University of Luxembourg, SnT), Wendkuuni Arzouma Marc Christian OUEDRAOGO (University of Luxembourg), Jacques Klein (University of Luxembourg), Tegawendé F. Bissyandé (University of Luxembourg)
14:25 25m Talk		S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models (Research Papers). Xiaohan Yuan (Zhejiang University), Jinfeng Li (Alibaba Group), Dongxia Wang (Zhejiang University), Yuefeng Chen (Alibaba Group), Xiaofeng Mao (Alibaba Group), Longtao Huang (Alibaba Group), Jialuo Chen (Zhejiang University), Hui Xue (Alibaba Group), Xiaoxia Liu (Zhejiang University), Wenhai Wang (Zhejiang University), Kui Ren (Zhejiang University), Jingyi Wang (Zhejiang University)
14:50 25m Talk		Improving Deep Learning Framework Testing with Model-Level Metamorphic Testing (Research Papers). Yanzhou Mu, Juan Zhai (University of Massachusetts at Amherst), Chunrong Fang (Nanjing University), Xiang Chen (Nantong University), Zhixiang Cao (Xi'an Jiaotong University), Peiran Yang (Nanjing University), Kexin Zhao (Nanjing University), An Guo (Nanjing University), Zhenyu Chen (Nanjing University)
15:15 15m Demonstration		ASTRAL: A Tool for the Automated Safety Testing of Large Language Models (Tool Demonstrations). Miriam Ugarte (Mondragon University), Pablo Valle (Mondragon University), José Antonio Parejo Maestre (Universidad de Sevilla), Sergio Segura (SCORE Lab, I3US Institute, Universidad de Sevilla, Seville, Spain), Aitor Arrieta (Mondragon University)

Information for Participants

Fri 27 Jun 2025 14:00 - 15:30 at Cosmos 3A - AI Testing Chair(s): Cuiyun Gao

Info for room Cosmos 3A:

Cosmos 3A is the first room in the Cosmos 3 wing.

When facing the main Cosmos Hall, access to the Cosmos 3 wing is on the left, close to the stairs. The area is accessed through a large door with the number “3”, which will stay open during the event.