GVI: Guided Vulnerability Imagination for Boosting Deep Vulnerability Detectors (ICSE 2025 - Research Track)

Who

Heng Yong, Zhong Li, Minxue Pan, Tian Zhang, Jianhua Zhao, Xuandong Li

Track

ICSE 2025 Research Track

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Fri 2 May 2025 16:00 - 16:15 at 213 - AI for Security 3 Chair(s): Tien N. Nguyen

Abstract

The use of deep learning to achieve automated software vulnerability detection has been a longstanding interest within the software security community. These deep vulnerability detectors are mostly trained in a supervised manner, which heavily relies on large-scale, high-quality vulnerability datasets. However, the vulnerability datasets used to train deep vulnerability detectors frequently exhibit class imbalance due to the inherent nature of vulnerability data, where vulnerable cases are significantly rarer than non-vulnerable cases. This imbalance adversely affects the effectiveness of these detectors. A promising solution to address the class imbalance problem is to artificially generate vulnerable samples to enhance vulnerability datasets, yet existing vulnerability generation techniques are not satisfactory due to their inadequate representation of real-world vulnerabilities or their reliance on large-scale vulnerable samples for training the generation model.

This paper proposes GVI, a novel approach aimed at generating vulnerable samples to boost deep vulnerability detectors. GVI takes inspiration from human learning with imagination and proposes exploring LLMs to imagine and create new, informative vulnerable samples from given seed vulnerabilities. Specifically, we design a Chain-of-Thought inspired prompt in GVI that instructs the LLMs to first analyze the seed to retrieve attributes related to vulnerabilities and then generate a set of vulnerabilities based on the seed’s attributes. Our extensive experiments on three vulnerability datasets (i.e., Devign, ReVeal, and BigVul) and across three deep vulnerability detectors (i.e., Devign, ReVeal, and LineVul) demonstrate that the vulnerable samples generated by GVI are not only more accurate but also more effective in enhancing the performance of deep vulnerability detectors.
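
The two-step prompt described above (first analyze the seed, then generate from its attributes) can be sketched roughly as follows. This is a minimal illustration only: the function name `build_imagination_prompt`, the parameter `num_samples`, and the prompt wording are hypothetical and are not taken from the paper, and the sketch only assembles the prompt text without calling any LLM.

```python
def build_imagination_prompt(seed_code: str, num_samples: int = 3) -> str:
    """Assemble a two-step, Chain-of-Thought style prompt from one seed.

    The wording is an assumption for illustration, not GVI's actual prompt.
    """
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    lines = [
        # Step 1: ask the model to reason about the seed before generating.
        "Step 1: Analyze the seed function and list the attributes "
        "related to its vulnerability.",
        # Step 2: ask for new samples conditioned on those attributes.
        f"Step 2: Using those attributes, write {num_samples} new "
        "vulnerable functions.",
        "",
        "Seed function:",
        seed_code.strip(),
    ]
    return "\n".join(lines)


# Hypothetical seed: a C function with an unchecked strcpy into a small buffer.
seed = "void copy(char *src) { char buf[8]; strcpy(buf, src); }"
prompt = build_imagination_prompt(seed, num_samples=2)
print(prompt)
```

In a setup like this, the returned text would be sent to an LLM and the generated functions added to the vulnerable class of the training set; how GVI itself filters or validates the outputs is not covered by this sketch.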

Heng Yong, Nanjing University
Zhong Li, China
Minxue Pan, Nanjing University, China
Tian Zhang, Nanjing University, China
Jianhua Zhao, Nanjing University, China
Xuandong Li, Nanjing University, China

Session Program

Fri 2 May
Displayed time zone: Eastern Time (US & Canada)

16:00 - 17:30	AI for Security 3	Research Track / New Ideas and Emerging Results (NIER) at 213	Chair(s): Tien N. Nguyen (University of Texas at Dallas)

16:00 15m Talk	GVI: Guided Vulnerability Imagination for Boosting Deep Vulnerability Detectors
	Security Research Track
	Heng Yong (Nanjing University), Zhong Li, Minxue Pan (Nanjing University), Tian Zhang (Nanjing University), Jianhua Zhao (Nanjing University, China), Xuandong Li (Nanjing University)
16:15 15m Talk	Decoding Secret Memorization in Code LLMs Through Token-Level Characterization
	Security Research Track
	Yuqing Nie (Beijing University of Posts and Telecommunications), Chong Wang (Nanyang Technological University), Kailong Wang (Huazhong University of Science and Technology), Guoai Xu (Harbin Institute of Technology, Shenzhen), Guosheng Xu (Key Laboratory of Trustworthy Distributed Computing and Service (MoE), Beijing University of Posts and Telecommunications), Haoyu Wang (Huazhong University of Science and Technology)
16:30 15m Talk	Are We Learning the Right Features? A Framework for Evaluating DL-Based Software Vulnerability Detection Solutions
	Security Research Track
	Satyaki Das (University of Southern California), Syeda Tasnim Fabiha (University of Southern California), Saad Shafiq (University of Southern California), Nenad Medvidović (University of Southern California)
16:45 15m Talk	Boosting Static Resource Leak Detection via LLM-based Resource-Oriented Intention Inference
	Security Research Track
	Chong Wang (Nanyang Technological University), Jianan Liu (Fudan University), Xin Peng (Fudan University), Yang Liu (Nanyang Technological University), Yiling Lou (Fudan University)
17:00 15m Talk	Weakly-supervised Log-based Anomaly Detection with Inexact Labels via Multi-instance Learning
	Security Research Track
	Minghua He (Peking University), Tong Jia (Institute for Artificial Intelligence, Peking University, Beijing, China), Chiming Duan (Peking University), Huaqian Cai (Peking University), Ying Li (School of Software and Microelectronics, Peking University, Beijing, China), Gang Huang (Peking University)
17:15 7m Talk	Towards Early Warning and Migration of High-Risk Dormant Open-Source Software Dependencies
	Security New Ideas and Emerging Results (NIER)
	Zijie Huang (Shanghai Key Laboratory of Computer Software Testing and Evaluation), Lizhi Cai (Shanghai Key Laboratory of Computer Software Testing & Evaluating, Shanghai Software Center), Xuan Mao (Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, China), Kang Yang (Shanghai Key Laboratory of Computer Software Testing and Evaluating, Shanghai Development Center of Computer Software Technology)