Revisiting the Performance of Deep Learning-Based Vulnerability Detection on Realistic Datasets
This program is tentative and subject to change.
The impact of software vulnerabilities on everyday software systems is concerning. Although deep learning-based models have been proposed for vulnerability detection, their reliability remains a significant concern. While prior evaluations of such models report impressive recall/F1 scores of up to 99%, we find that these models underperform in practical scenarios, particularly when evaluated on entire codebases rather than only on the fixing commits. In this paper, we introduce a comprehensive dataset (Real-Vul) designed to accurately represent real-world scenarios for evaluating vulnerability detection models. We evaluate the DeepWukong, LineVul, ReVeal, and IVDetect vulnerability detection approaches and observe a surprisingly large drop in performance, with precision declining by up to 95 percentage points and F1 scores dropping by up to 91 percentage points. A closer inspection reveals substantial overlap in the embeddings that the models generate for vulnerable and uncertain samples (non-vulnerable, or vulnerability not yet reported), which likely explains the large increase in the number and rate of false positives. Additionally, we observe that model performance fluctuates with vulnerability characteristics (e.g., vulnerability type and severity). For example, the studied models achieve F1 scores that are 26 percentage points higher when vulnerabilities relate to information leaks or code injection than when they relate to path resolution or predictable return values. Our results highlight the substantial performance gap that must be bridged before deep learning-based vulnerability detection is ready for deployment in practical settings. We dive deeper into why models underperform in realistic settings, and our investigation reveals overfitting as a key issue. We address this by introducing an augmentation technique that can improve performance by up to 30%.
We contribute (a) an approach to creating a dataset that future research can use to improve the practicality of model evaluation; (b) Real-Vul, a comprehensive dataset that adheres to this approach; and (c) empirical evidence that deep learning-based models struggle to perform in a real-world setting.
Fri 2 May (displayed time zone: Eastern Time (US & Canada))
16:00 - 17:30
16:00 (15m) Talk | ROSA: Finding Backdoors with Fuzzing | Research Track | Dimitri Kokkonis (Université Paris-Saclay, CEA, List), Michaël Marcozzi (Université Paris-Saclay, CEA, List), Emilien Decoux (Université Paris-Saclay, CEA List), Stefano Zacchiroli (Télécom Paris, Polytechnic Institute of Paris) | Pre-print, Media Attached
16:15 (15m) Talk | Analyzing the Feasibility of Adopting Google's Nonce-Based CSP Solutions on Websites | Research Track | Mengxia Ren (Colorado School of Mines), Anhao Xiang (Colorado School of Mines), Chuan Yue (Colorado School of Mines)
16:30 (15m) Talk | Early Detection of Performance Regressions by Bridging Local Performance Data and Architectural Models (Award Winner) | Research Track | Lizhi Liao (Memorial University of Newfoundland), Simon Eismann (University of Würzburg), Heng Li (Polytechnique Montréal), Cor-Paul Bezemer (University of Alberta), Diego Costa (Concordia University, Canada), André van Hoorn (University of Hamburg, Germany), Weiyi Shang (University of Waterloo)
16:45 (15m) Talk | Revisiting the Performance of Deep Learning-Based Vulnerability Detection on Realistic Datasets | Journal-first Papers | Partha Chakraborty (University of Waterloo), Krishna Kanth Arumugam (University of Waterloo), Mahmoud Alfadel (University of Calgary), Mei Nagappan (University of Waterloo), Shane McIntosh (University of Waterloo)
17:00 (15m) Talk | Sunflower: Enhancing Linux Kernel Fuzzing via Exploit-Driven Seed Generation | SE In Practice (SEIP) | Qiang Zhang (Hunan University), Yuheng Shen (Tsinghua University), Jianzhong Liu (Tsinghua University), Yiru Xu (Tsinghua University), Heyuan Shi (Central South University), Yu Jiang (Tsinghua University), Wanli Chang (College of Computer Science and Electronic Engineering, Hunan University)
17:15 (15m) Talk | Practical Object-Level Sanitizer With Aggregated Memory Access and Custom Allocator | Research Track | Xiaolei Wang (National University of Defense Technology), Ruilin Li (National University of Defense Technology), Bin Zhang (National University of Defense Technology), Chao Feng (National University of Defense Technology), Chaojing Tang (National University of Defense Technology)