ICSE 2025
Sat 26 April - Sun 4 May 2025 Ottawa, Ontario, Canada

We present a comparative analysis of open-source tools that scan conversational large language models (LLMs) for vulnerabilities, referred to here as scanners. As LLMs become integral to a wide range of applications, they also present new attack surfaces, exposed to security risks such as information leakage and jailbreak attacks. AI red-teaming, adapted from traditional cybersecurity practice, is recognized by governments and companies as essential, with particular emphasis on the challenge of continuously evolving threats. Our study evaluates prominent, cutting-edge scanners that address this challenge by automating red-teaming processes: Garak, Giskard, PyRIT, and CyberSecEval. We detail the distinctive features and practical use of each scanner, outline the unifying principles of their design, and perform quantitative evaluations to compare them. These evaluations uncover significant reliability issues in detecting successful attacks, highlighting a fundamental gap for future development. In addition, we contribute a foundational labeled dataset as an initial step toward bridging this gap. Based on these findings, we offer suggestions for future regulation and standardization, as well as strategic recommendations to help organizations select a scanner, considering customizability, test-suite comprehensiveness, and industry-specific use cases.