ISMM 2025
Tue 17 Jun 2025 Seoul, South Korea
co-located with PLDI 2025
Tue 17 Jun 2025 11:40 - 12:00 at Lilac - Session 2: 1040-1200 [Workloads] Chair(s): Erez Petrank

Large language models (LLMs) hold great promise for automating software vulnerability detection and repair, but ensuring their correctness remains a challenge. While recent work has developed benchmarks for evaluating LLMs in bug detection and repair, existing studies rely on hand-crafted datasets that quickly become outdated. Moreover, systematic evaluation of advanced reasoning-based LLMs using chain-of-thought prompting for software security is lacking.
We introduce SecureMind, an open-source framework for evaluating LLMs in vulnerability detection and repair, focusing on memory-related vulnerabilities. SecureMind provides a user-friendly Python interface for defining test plans, which automates data retrieval, preparation, and benchmarking across a wide range of metrics.
Using SecureMind, we assess 10 representative LLMs, including 7 state-of-the-art reasoning models, on 16K test samples spanning 8 Common Weakness Enumeration (CWE) types related to memory safety violations. Our findings highlight the strengths and limitations of current LLMs in handling memory-related vulnerabilities.

Tue 17 Jun

Displayed time zone: Seoul

10:40 - 12:00
Session 2: 1040-1200 [Workloads], ISMM 2025, at Lilac
Chair(s): Erez Petrank (Technion)
10:40 (20m) Talk
Reconsidering Garbage Collection in Julia: A Practitioner Report
ISMM 2025
Luis Eduardo de Souza Amorim (Australian National University), Yi Lin (Australian National University), Stephen M. Blackburn (Google; Australian National University), Diogo Netto (RelationalAI), Gabriel Baraldi (JuliaHub), Nathan Daly (RelationalAI), Tony Hosking (Australian National University), Kiran Pamnany (RelationalAI), Oscar Smith (JuliaHub)
11:00 (20m) Talk
Reworking Memory Management in CRuby: A Practitioner Report
ISMM 2025
Kunshan Wang (Australian National University), Stephen M. Blackburn (Google; Australian National University), Peter Zhu (Shopify), Matthew Valentine-House (Shopify)
11:20 (20m) Talk
Lifetime Dispersion and Generational GC: An Intellectual Abstract (Remote)
ISMM 2025
Stephen Dolan (Jane Street)
11:40 (20m) Talk
SecureMind: A Framework for Benchmarking Large Language Models in Memory Bug Detection and Repair
ISMM 2025
Huanting Wang (University of Leeds), Dejice Jacob (University of Glasgow), David Kelly (University of Glasgow), Yehia Elkhatib (University of Glasgow), Jeremy Singer (University of Glasgow), Zheng Wang (University of Leeds)