SecureMind: A Framework for Benchmarking Large Language Models in Memory Bug Detection and Repair (ISMM 2025 - International Symposium on Memory Management)

Who

Huanting Wang, Dejice Jacob, David Kelly, Yehia Elkhatib, Jeremy Singer, Zheng Wang

Track

ISMM 2025

Time Zone

The program is currently displayed in (GMT+09:00) Seoul.

When

Tue 17 Jun 2025 11:40 - 12:00 at Lilac - Session 2: 1040-1200 [Workloads] Chair(s): Erez Petrank

Abstract

Large language models (LLMs) hold great promise for automating software vulnerability detection and repair, but ensuring their correctness remains a challenge. While recent work has developed benchmarks for evaluating LLMs in bug detection and repair, existing studies rely on hand-crafted datasets that quickly become outdated. Moreover, systematic evaluation of advanced reasoning-based LLMs using chain-of-thought prompting for software security is lacking.
We introduce SecureMind, an open-source framework for evaluating LLMs in vulnerability detection and repair, focusing on memory-related vulnerabilities. SecureMind provides a user-friendly Python interface for defining test plans, which automates data retrieval, preparation, and benchmarking across a wide range of metrics.
Using SecureMind, we assess 10 representative LLMs, including 7 state-of-the-art reasoning models, on 16K test samples spanning 8 Common Weakness Enumeration (CWE) types related to memory safety violations. Our findings highlight the strengths and limitations of current LLMs in handling memory-related vulnerabilities.
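
As a rough illustration of what benchmarking detection across CWE types involves, the sketch below scores a binary vulnerability detector per CWE type using precision, recall and F1. It is not SecureMind's interface, which this page does not show: `Sample`, `evaluate`, `toy_detector` and the sample snippets are hypothetical names and data invented for the example, the string-matching detector merely stands in for an LLM call, and the two CWE IDs (CWE-416 use after free, CWE-787 out-of-bounds write) are illustrative and not necessarily among the eight types studied.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Sample:
    """One labelled test case (hypothetical structure, not SecureMind's)."""
    cwe: str          # CWE identifier, e.g. "CWE-416"
    code: str         # source snippet shown to the detector
    vulnerable: bool  # ground-truth label


def evaluate(samples: List[Sample],
             detector: Callable[[str], bool]) -> Dict[str, Dict[str, float]]:
    """Return precision, recall and F1 per CWE type for a binary detector."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for s in samples:
        c = counts[s.cwe]  # ensures every CWE appears in the report
        predicted = detector(s.code)
        if predicted and s.vulnerable:
            c["tp"] += 1
        elif predicted and not s.vulnerable:
            c["fp"] += 1
        elif not predicted and s.vulnerable:
            c["fn"] += 1
    report = {}
    for cwe, c in counts.items():
        flagged = c["tp"] + c["fp"]
        actual = c["tp"] + c["fn"]
        precision = c["tp"] / flagged if flagged else 0.0
        recall = c["tp"] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        report[cwe] = {"precision": precision, "recall": recall, "f1": f1}
    return report


def toy_detector(code: str) -> bool:
    """Stand-in for an LLM: flags any snippet that calls free()."""
    return "free(" in code


SAMPLES = [
    Sample("CWE-416", "free(p); use(p);", True),
    Sample("CWE-416", "free(p); p = NULL;", False),
    Sample("CWE-787", "buf[n] = 0;", True),
    Sample("CWE-787", "if (i < n) buf[i] = 0;", False),
]

report = evaluate(SAMPLES, toy_detector)
for cwe, metrics in sorted(report.items()):
    print(cwe, metrics)
```

Reporting the metrics per CWE type rather than as one aggregate makes it visible when a detector does well on one class of memory bug and misses another entirely, as the toy detector does here.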

Link to Preprint

https://www.dcs.gla.ac.uk/~jsinger/pdfs/ismm25.pdf

DOI

https://doi.org/10.1145/3735950.3735954

Huanting Wang, University of Leeds, United Kingdom
Dejice Jacob, University of Glasgow, United Kingdom
David Kelly, University of Glasgow, United Kingdom
Yehia Elkhatib, University of Glasgow, United Kingdom
Jeremy Singer, University of Glasgow, United Kingdom
Zheng Wang, University of Leeds, United Kingdom

Session Program

Tue 17 Jun
Displayed time zone: Seoul

10:40 - 12:00	Session 2: 1040-1200 [Workloads], ISMM 2025, at Lilac. Chair(s): Erez Petrank (Technion)

10:40	20m	Talk	Reconsidering Garbage Collection in Julia: A Practitioner Report. Luis Eduardo de Souza Amorim (Australian National University), Yi Lin (Australian National University), Stephen M. Blackburn (Google; Australian National University), Diogo Netto (RelationalAI), Gabriel Baraldi (JuliaHub), Nathan Daly (RelationalAI), Tony Hosking (Australian National University), Kiran Pamnany (RelationalAI), Oscar Smith (JuliaHub)
11:00	20m	Talk	Reworking Memory Management in CRuby: A Practitioner Report. Kunshan Wang (Australian National University), Stephen M. Blackburn (Google; Australian National University), Peter Zhu (Shopify), Matthew Valentine-House (Shopify)
11:20	20m	Talk	Lifetime Dispersion and Generational GC: An Intellectual Abstract (remote). Stephen Dolan (Jane Street)
11:40	20m	Talk	SecureMind: A Framework for Benchmarking Large Language Models in Memory Bug Detection and Repair. Huanting Wang (University of Leeds), Dejice Jacob (University of Glasgow), David Kelly (University of Glasgow), Yehia Elkhatib (University of Glasgow), Jeremy Singer (University of Glasgow), Zheng Wang (University of Leeds)