ASE 2025
Sun 16 - Thu 20 November 2025 Seoul, South Korea

This program is tentative and subject to change.

Tue 18 Nov 2025 16:10 - 16:20 at Grand Hall 2 - Security 6
Wed 19 Nov 2025 16:10 - 16:20 at Vista - Security 7

Existing research has demonstrated promising results when applying large language models (LLMs) to detect security vulnerabilities in source code. However, these studies have been evaluated exclusively on benchmarks from open-source systems, using publicly known vulnerabilities that are likely part of the LLMs’ training data. This raises concerns that reported performance metrics may be inflated by data contamination, providing a misleading view of the models’ actual capabilities.

In this paper, we quantify this effect with a case study that evaluates five frontier LLMs on two carefully curated datasets: CWE-Bench-Java (an open-source dataset) and TS-Vuls (a closed-source commercial codebase). To provide a second angle, we also split CWE-Bench-Java by CVE record date to explore temporal contamination based on the LLMs’ knowledge cutoff dates.

Our results reveal that the average F1 score drops by approximately 20 percentage points from the open-source to the closed-source dataset. Precision likewise drops from 56% to 34% on average, a difference that is statistically significant (p < 0.05) for four of the five models. This decline is consistent across all tested LLMs and metrics. In contrast, the results for the temporal split on open-source data are inconclusive, suggesting that filtering by knowledge cutoff may reduce, but does not eliminate, contamination effects.
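The abstract does not state which significance test the authors applied. As an illustration only, the kind of precision comparison described above can be sketched with a two-proportion z-test on hypothetical confusion counts (the counts below are invented for the example and are not from the paper):

```python
from math import sqrt, erfc

def precision(tp: int, fp: int) -> float:
    """Fraction of flagged findings that are true vulnerabilities."""
    return tp / (tp + fp)

def f1(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p = precision(tp, fp)
    r = tp / (tp + fn)
    return 2 * p * r / (p + r)

def two_proportion_p(k1: int, n1: int, k2: int, n2: int) -> float:
    """Two-sided p-value for the difference between two proportions
    (pooled z-test under a normal approximation)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return erfc(abs(z) / sqrt(2))

# Hypothetical example: 56 of 100 flags correct on the open-source
# dataset vs. 34 of 100 on the closed-source dataset.
p_value = two_proportion_p(56, 100, 34, 100)
print(f"p = {p_value:.4f}")  # well below 0.05 for these counts
```

With 100 flagged findings per dataset, a precision drop from 56% to 34% is significant at the 0.05 level under this test; the actual paper may use a different test and different sample sizes.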

Although our study is based on a single closed-source system and is therefore not generalizable, these findings provide the first empirical evidence that evaluating LLM-based vulnerability detection on open-source benchmarks may lead to overly optimistic results. This motivates extending closed-source datasets in future LLM evaluations.


Tue 18 Nov

Displayed time zone: Seoul

16:00 - 17:00
16:00
10m
Talk
Measuring Software Resilience Using Socially Aware Truck Factor Estimation
NIER Track
Alexis Butler Royal Holloway University of London, Dan O'Keeffe Royal Holloway, University of London, Santanu Dash University of Surrey
16:10
10m
Talk
Should We Evaluate LLM Based Security Analysis Approaches on Open Source Systems?
Industry Showcase
Kohei Dozono Technical University of Munich, Jonas Engesser Technical University of Munich, Benjamin Hummel CQSE GmbH, Alexander Pretschner TU Munich, Tobias Roehm CQSE GmbH
16:20
10m
Talk
DALEQ - Explainable Equivalence for Java Bytecode
Industry Showcase
Jens Dietrich Victoria University of Wellington, Behnaz Hassanshahi Oracle
16:30
10m
Talk
A Secure Mocking Approach towards Software Supply Chain Security
NIER Track
Daisuke Yamaguchi NTT, Inc., Shinobu Saito NTT, Inc., Takuya Iwatsuka NTT, Nariyoshi Chida NTT, Inc, Tachio Terauchi Waseda University
16:40
10m
Talk
TRON: Fuzzing Linux Network Stack via Protocol-System Call Payload Synthesis
Industry Showcase
Qiang Zhang Hunan University, Yifei Chu Tsinghua University, Yuheng Shen Tsinghua University, Jianzhong Liu Tsinghua University, Heyuan Shi Central South University, Yu Jiang Tsinghua University, Wanli Chang College of Computer Science and Electronic Engineering, Hunan University
16:50
10m
Talk
Industry Practice of LLM-Assisted Protocol Fuzzing for Commercial Communication Modules
Industry Showcase
Qiang Fu Central South University, Changjian Liu Central South University, Yuan Ding China Mobile IoT, Chao Fan China Mobile IoT, Yulai Fu, Yuhan Chen Central South University, Ying Fu Tsinghua University, Ronghua Shi Central South University, Fuchen Ma Tsinghua University, Heyuan Shi Central South University

Wed 19 Nov

Displayed time zone: Seoul

16:00 - 17:00
16:00
10m
Talk
Measuring Software Resilience Using Socially Aware Truck Factor Estimation
NIER Track
Alexis Butler Royal Holloway University of London, Dan O'Keeffe Royal Holloway, University of London, Santanu Dash University of Surrey
16:10
10m
Talk
Should We Evaluate LLM Based Security Analysis Approaches on Open Source Systems?
Industry Showcase
Kohei Dozono Technical University of Munich, Jonas Engesser Technical University of Munich, Benjamin Hummel CQSE GmbH, Alexander Pretschner TU Munich, Tobias Roehm CQSE GmbH
16:20
10m
Talk
DALEQ - Explainable Equivalence for Java Bytecode
Industry Showcase
Jens Dietrich Victoria University of Wellington, Behnaz Hassanshahi Oracle
16:30
10m
Talk
A Secure Mocking Approach towards Software Supply Chain Security
NIER Track
Daisuke Yamaguchi NTT, Inc., Shinobu Saito NTT, Inc., Takuya Iwatsuka NTT, Nariyoshi Chida NTT, Inc, Tachio Terauchi Waseda University
16:40
10m
Talk
TRON: Fuzzing Linux Network Stack via Protocol-System Call Payload Synthesis
Industry Showcase
Qiang Zhang Hunan University, Yifei Chu Tsinghua University, Yuheng Shen Tsinghua University, Jianzhong Liu Tsinghua University, Heyuan Shi Central South University, Yu Jiang Tsinghua University, Wanli Chang College of Computer Science and Electronic Engineering, Hunan University
16:50
10m
Talk
Industry Practice of LLM-Assisted Protocol Fuzzing for Commercial Communication Modules
Industry Showcase
Qiang Fu Central South University, Changjian Liu Central South University, Yuan Ding China Mobile IoT, Chao Fan China Mobile IoT, Yulai Fu, Yuhan Chen Central South University, Ying Fu Tsinghua University, Ronghua Shi Central South University, Fuchen Ma Tsinghua University, Heyuan Shi Central South University