AST 2025
Sat 26 April - Sun 4 May 2025 Ottawa, Ontario, Canada
co-located with ICSE 2025

This program is tentative and subject to change.

Tue 29 Apr 2025 15:00 - 15:30 at 211 - Session 5: Testing of LLMs

Large language models (LLMs) can perform a variety of tasks given an input prompt that describes the task. In an attempt to enhance the performance and capabilities of LLMs, recent research has augmented LLMs with external tools, such as Python functions, REST APIs, and other deep learning models. While much of the research on tool-augmented LLMs (TaLLMs) has focused on improving their capabilities, research on understanding and characterizing the kinds of failures that can occur in these systems is lacking. To address this gap, this paper proposes a taxonomy of failures in TaLLMs and their root causes, details an analysis of the failures that occur in two published TaLLMs (Gorilla and Chameleon), and provides recommendations for fault localization and repair of TaLLMs.
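To make the paper's subject concrete, here is a minimal sketch of the host-side dispatch step in a tool-augmented LLM, where an LLM-emitted tool call is parsed and executed. All tool and function names are hypothetical illustrations, not taken from the paper; the comments mark points where the kinds of failures the paper studies can surface.

```python
import json

def get_weather(city: str) -> str:
    """External tool (stub): returns a canned weather report."""
    return f"Sunny in {city}"

# Registry of tools the LLM is allowed to call (hypothetical example).
TOOLS = {"get_weather": get_weather}

def run_tool_call(llm_output: str) -> str:
    """Execute a tool call emitted by the LLM as JSON, e.g.
    '{"tool": "get_weather", "args": {"city": "Ottawa"}}'."""
    try:
        call = json.loads(llm_output)        # failure: malformed JSON from the LLM
        tool = TOOLS[call["tool"]]           # failure: hallucinated/unknown tool name
        return tool(**call["args"])          # failure: wrong or missing arguments
    except (json.JSONDecodeError, KeyError, TypeError) as err:
        return f"Tool-call failure: {err}"   # a natural fault-localization point

print(run_tool_call('{"tool": "get_weather", "args": {"city": "Ottawa"}}'))
```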

Tue 29 Apr

Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30
Session 5: Testing of LLMs (AST 2025) at 211

14:00 (30m, Full-paper)
Adaptive Probabilistic Operational Testing for Large Language Models Evaluation
Ali Asgari (TU Delft), Antonio Guerriero (Università di Napoli Federico II), Roberto Pietrantuono (Università di Napoli Federico II), Stefano Russo (Università di Napoli Federico II)

14:30 (30m, Full-paper)
ASTRAL: Automated Safety Testing of Large Language Models
Miriam Ugarte (Mondragon University), Pablo Valle (Mondragon University), José Antonio Parejo Maestre (University of Seville), Sergio Segura (University of Seville), Aitor Arrieta (Mondragon University)
Pre-print

15:00 (30m, Full-paper)
A Taxonomy of Failures in Tool-Augmented LLMs
Cailin Winston (University of Washington), René Just (University of Washington)