VulGen: Realistic Vulnerability Generation Via Pattern Mining and Deep Learning (ICSE 2023 - Technical Track)

Who

Yu Nong, Yuzhe Ou, Michael Pradel, Feng Chen, Haipeng Cai

Track

ICSE 2023 Technical Track

Time Zone

The program is currently displayed in (GMT+10:00) Hobart.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 17 May 2023, 17:00 - 17:15, at Meeting Room 103 (session: SE for security 1). Chair(s): Abhik Roychoudhury

Abstract

Building new, powerful data-driven defenses against prevalent software vulnerabilities requires sizable, high-quality vulnerability datasets, as does large-scale benchmarking of existing defense solutions. Automatic data generation would promisingly meet the need, yet little work has aimed to generate the much-needed quality vulnerable samples. Meanwhile, existing similar and adaptable techniques suffer from critical limitations for that purpose. In this paper, we present VulGen, the first injection-based vulnerability-generation technique that is not limited to a particular class of vulnerabilities. VulGen combines the strengths of deterministic (pattern-based) and probabilistic (deep-learning/DL-based) program transformation approaches while mutually overcoming their respective weaknesses. This is achieved through close collaboration between pattern mining/application and DL-based injection localization, which separates the concerns of how and where to inject. By leveraging large, pretrained programming language modeling and only learning locations, VulGen mitigates its own need for quality vulnerability data (for training the localization model). Extensive evaluations show that VulGen significantly outperforms a state-of-the-art (SOTA) pattern-based peer technique as well as both transformer- and GNN-based approaches in terms of the percentages of generated samples that are vulnerable and those also exactly matching the ground truth (by 38.0–430.1% and 16.3–158.2%, respectively). The VulGen-generated samples led to substantial performance improvements for two SOTA DL-based vulnerability detectors (by up to 31.8% in F1), close to those brought by the ground-truth real-world samples and much higher than those brought by the same number of existing synthetic samples.

Link to Preprint

http://chapering.github.io/pubs/icse23yu.pdf

Yu Nong

Washington State University

Yuzhe Ou

University of Texas at Dallas

United States

Michael Pradel

University of Stuttgart

Germany

Feng Chen

University of Texas at Dallas

United States

Haipeng Cai

Washington State University

United States

Session Program

Wed 17 May
Displayed time zone: Hobart

15:45 - 17:15	SE for security 1 (Technical Track / SEET - Software Engineering Education and Training / Journal-First Papers / SEIS - Software Engineering in Society) at Meeting Room 103. Chair(s): Abhik Roychoudhury (National University of Singapore)

15:45 15m Talk	TAINTMINI: Detecting Flow of Sensitive Data in Mini-Programs with Static Taint Analysis (Technical Track)
	Chao Wang; Ronny Ko (The Ohio State University); Yue Zhang (The Ohio State University); Yuqing Yang (The Ohio State University); Zhiqiang Lin (The Ohio State University)
16:00 15m Talk	AChecker: Statically Detecting Smart Contract Access Control Vulnerabilities (Technical Track)
	Asem Ghaleb (University of British Columbia); Julia Rubin (University of British Columbia, Canada); Karthik Pattabiraman (University of British Columbia)
16:15 15m Talk	Fine-grained Commit-level Vulnerability Type Prediction By CWE Tree Structure (Technical Track)
	Shengyi Pan (Zhejiang University); Lingfeng Bao (Zhejiang University); Xin Xia (Huawei); David Lo (Singapore Management University); Shanping Li (Zhejiang University)
16:30 15m Paper	Security Thinking in Online Freelance Software Development (SEIS - Software Engineering in Society)
	Irum Rauf (The Open University, UK); Marian Petre (School of Computing and Communications, The Open University, UK); Thein Tun (School of Computing and Communications, The Open University, UK; Simply Business, UK); Tamara Lopez (The Open University); Bashar Nuseibeh (The Open University, UK; Lero, University of Limerick, Ireland)
16:45 7m Talk	Open Science in Software Engineering: A Study on Deep Learning-Based Vulnerability Detection (Journal-First Papers)
	Yu Nong (Washington State University); Rainy Sharma (Washington State University); Wahab Hamou-Lhadj (Concordia University, Montreal, Canada); Xiapu Luo (The Hong Kong Polytechnic University); Haipeng Cai (Washington State University)
16:52 8m Talk	Training for Security: Planning the Use of a SAT in the Development Pipeline of Web Apps (SEET - Software Engineering Education and Training)
	Sabato Nocera (University of Salerno); Simone Romano (University of Salerno); Rita Francese (University of Salerno); Giuseppe Scanniello (University of Salerno)
17:00 15m Talk	VulGen: Realistic Vulnerability Generation Via Pattern Mining and Deep Learning (Technical Track)
	Yu Nong (Washington State University); Yuzhe Ou (University of Texas at Dallas); Michael Pradel (University of Stuttgart); Feng Chen (University of Texas at Dallas); Haipeng Cai (Washington State University)