Learning from What We Know: How to Perform Vulnerability Prediction using Noisy Historical Data (ICSE 2023 - Journal-First Papers)

Who

Aayush Garg, Renzo Degiovanni, Matthieu Jimenez, Maxime Cordy, Mike Papadakis, Yves Le Traon

Track

ICSE 2023 Journal-First Papers

Time Zone

The program is currently displayed in (GMT+10:00) Hobart.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Fri 19 May 2023 15:00 - 15:07 at Meeting Room 106 - Vulnerability detection Chair(s): Cuiyun Gao

Abstract

Vulnerability prediction refers to the problem of identifying system components that are most likely to be vulnerable. Typically, this problem is tackled by training binary classifiers on historical data. Unfortunately, recent research has shown that such approaches underperform for two reasons: a) the imbalanced nature of the problem, and b) the inherently noisy historical data, i.e., most vulnerabilities are discovered much later than they are introduced. This misleads classifiers, which learn to recognize actually vulnerable components as non-vulnerable. To tackle these issues, we propose TROVON, a technique that learns from known vulnerable components rather than from vulnerable and non-vulnerable components, as is typically done. We do this by contrasting the known vulnerable components with their respective fixed components. This way, TROVON manages to learn from the things we know, i.e., vulnerabilities, hence reducing the effects of noisy and unbalanced data. We evaluate TROVON by comparing it with existing techniques on three security-critical open source systems, i.e., Linux Kernel, OpenSSL, and Wireshark, with historical vulnerabilities that have been reported in the National Vulnerability Database (NVD). Our evaluation demonstrates that the prediction capability of TROVON significantly outperforms existing vulnerability prediction techniques such as Software Metrics, Imports, Function Calls, Text Mining, Devign, LSTM, and LSTM-RF, with an improvement of 40.84% in Matthews Correlation Coefficient (MCC) score under Clean Training Data Settings, and an improvement of 35.52% under Realistic Training Data Settings.
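
The abstract reports results as MCC scores on an imbalanced problem. As a rough illustration of why MCC is informative in that setting, the sketch below computes MCC and plain accuracy from confusion-matrix counts. It is not taken from the paper or from the TROVON repository: the function names and all counts are hypothetical, chosen only to show how a classifier that never flags a component can look accurate while having no predictive value.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of components classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical data set: 1000 components, of which 20 are vulnerable.
# Classifier A labels every component as non-vulnerable.
always_negative = dict(tp=0, tn=980, fp=0, fn=20)
# Classifier B finds half of the vulnerable components, with some false alarms.
modest = dict(tp=10, tn=960, fp=20, fn=10)

for name, counts in [("always_negative", always_negative), ("modest", modest)]:
    print(name, "accuracy:", round(accuracy(**counts), 3), "MCC:", round(mcc(**counts), 3))
```

With these made-up counts, classifier A has the higher accuracy but an MCC of 0, while classifier B has a lower accuracy and a clearly positive MCC.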

Link to Publication

https://link.springer.com/article/10.1007/s10664-022-10197-4

Link to Preprint

https://github.com/garghub/TROVON

Authorizer Link

https://dl.acm.org/doi/abs/10.1007/s10664-022-10197-4

DOI

https://doi.org/10.1007/s10664-022-10197-4

Aayush Garg

University of Luxembourg, Luxembourg

Luxembourg

Renzo Degiovanni

SnT, University of Luxembourg

Luxembourg

Matthieu Jimenez

SnT, University of Luxembourg

Luxembourg

Maxime Cordy

University of Luxembourg, Luxembourg

Mike Papadakis

University of Luxembourg, Luxembourg

Luxembourg

Yves Le Traon

University of Luxembourg, Luxembourg

Luxembourg

Session Program

Fri 19 May
Displayed time zone: Hobart

13:45 - 15:15	Vulnerability detection (Technical Track / Journal-First Papers) at Meeting Room 106. Chair(s): Cuiyun Gao (Harbin Institute of Technology)

13:45 15m Talk	An Empirical Study of Deep Learning Models for Vulnerability Detection. Technical Track. Benjamin Steenhoek (Iowa State University), Md Mahbubur Rahman (Iowa State University), Richard Jiles (Iowa State University), Wei Le (Iowa State University)
14:00 15m Talk	DeepVD: Toward Class-Separation Features for Neural Network Vulnerability Detection. Technical Track. Wenbo Wang (New Jersey Institute of Technology), Tien N. Nguyen (University of Texas at Dallas), Shaohua Wang (New Jersey Institute of Technology), Yi Li (New Jersey Institute of Technology), Jiyuan Zhang (University of Illinois Urbana-Champaign), Aashish Yadavally (The University of Texas at Dallas)
14:15 15m Talk	Enhancing Deep Learning-based Vulnerability Detection by Building Behavior Graph Model. Technical Track. Bin Yuan (Huazhong University of Science and Technology), Yifan Lu (Huazhong University of Science and Technology), Yilin Fang (Huazhong University of Science and Technology), Yueming Wu (Nanyang Technological University), Deqing Zou (Huazhong University of Science and Technology), Zhen Li (Huazhong University of Science and Technology), Zhi Li (Huazhong University of Science and Technology), Hai Jin (Huazhong University of Science and Technology)
14:30 15m Talk	Vulnerability Detection with Graph Simplification and Enhanced Graph Representation Learning. Technical Track. Xin-Cheng Wen (Harbin Institute of Technology), Yupan (Harbin Institute of Technology), Cuiyun Gao (Harbin Institute of Technology), Hongyu Zhang (The University of Newcastle), Jie M. Zhang (King's College London), Qing Liao (Harbin Institute of Technology)
14:45 15m Talk	Does data sampling improve deep learning-based vulnerability detection? Yeas! and Nays! Technical Track. Xu Yang (University of Manitoba), Shaowei Wang (University of Manitoba), Yi Li (New Jersey Institute of Technology), Shaohua Wang (New Jersey Institute of Technology)
15:00 7m Talk	Learning from What We Know: How to Perform Vulnerability Prediction using Noisy Historical Data. Journal-First Papers. Aayush Garg (University of Luxembourg, Luxembourg), Renzo Degiovanni (SnT, University of Luxembourg), Matthieu Jimenez (SnT, University of Luxembourg), Maxime Cordy (University of Luxembourg, Luxembourg), Mike Papadakis (University of Luxembourg, Luxembourg), Yves Le Traon (University of Luxembourg, Luxembourg)
15:07 7m Talk	Do I really need all this work to find vulnerabilities? An empirical case study comparing vulnerability detection techniques on a Java application. Journal-First Papers. Sarah Elder (North Carolina State University), Nusrat Zahan (North Carolina State University), Rui Shu (North Carolina State University), Valeri Kozarev (North Carolina State University), Tim Menzies (North Carolina State University), Laurie Williams (North Carolina State University)