Learning from What We Know: How to Perform Vulnerability Prediction using Noisy Historical Data
Vulnerability prediction refers to the problem of identifying the system components that are most likely to be vulnerable. Typically, this problem is tackled by training binary classifiers on historical data. Unfortunately, recent research has shown that such approaches underperform for two reasons: a) the imbalanced nature of the problem, and b) the inherently noisy historical data, i.e., most vulnerabilities are discovered much later than they are introduced. This noise misleads classifiers, as they learn to recognize components that are actually vulnerable as non-vulnerable. To tackle these issues, we propose TROVON, a technique that learns from known vulnerable components rather than from vulnerable and non-vulnerable components, as is typically done. We do this by contrasting the known vulnerable components with their respective fixed components. This way, TROVON learns from the things we know, i.e., vulnerabilities, hence reducing the effects of noisy and unbalanced data. We evaluate TROVON by comparing it with existing techniques on three security-critical open source systems, i.e., Linux Kernel, OpenSSL, and Wireshark, with historical vulnerabilities that have been reported in the National Vulnerability Database (NVD). Our evaluation demonstrates that TROVON significantly outperforms existing vulnerability prediction techniques such as Software Metrics, Imports, Function Calls, Text Mining, Devign, LSTM, and LSTM-RF, with an improvement of 40.84% in Matthews Correlation Coefficient (MCC) score under Clean Training Data Settings, and an improvement of 35.52% under Realistic Training Data Settings.
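As a rough illustration of why MCC suits an imbalanced problem like this one, the sketch below computes MCC and plain accuracy from confusion-matrix counts. The counts are hypothetical and are not taken from the paper's evaluation; the helper names `mcc` and `accuracy` are likewise assumptions made for this example.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of components classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical data set: 1000 components, of which 20 are vulnerable.
# Classifier A labels every component as non-vulnerable.
always_negative = dict(tp=0, tn=980, fp=0, fn=20)
# Classifier B finds 10 of the 20 vulnerable components, with 10 false alarms.
finds_half = dict(tp=10, tn=970, fp=10, fn=10)

for name, counts in [("always_negative", always_negative), ("finds_half", finds_half)]:
    print(f"{name}: accuracy={accuracy(**counts):.2f}, MCC={mcc(**counts):.2f}")
```

Both hypothetical classifiers reach the same accuracy, yet only the second one identifies any vulnerable component, and only MCC tells the two apart.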
Fri 19 May (displayed time zone: Hobart)
13:45 - 15:15 | Vulnerability detection | Technical Track / Journal-First Papers at Meeting Room 106 | Chair(s): Cuiyun Gao (Harbin Institute of Technology)
13:45 | 15m | Talk | An Empirical Study of Deep Learning Models for Vulnerability Detection | Technical Track | Benjamin Steenhoek (Iowa State University), Md Mahbubur Rahman (Iowa State University), Richard Jiles (Iowa State University), Wei Le (Iowa State University)
14:00 | 15m | Talk | DeepVD: Toward Class-Separation Features for Neural Network Vulnerability Detection | Technical Track | Wenbo Wang (New Jersey Institute of Technology), Tien N. Nguyen (University of Texas at Dallas), Shaohua Wang (New Jersey Institute of Technology), Yi Li (New Jersey Institute of Technology), Jiyuan Zhang (University of Illinois Urbana-Champaign), Aashish Yadavally (The University of Texas at Dallas)
14:15 | 15m | Talk | Enhancing Deep Learning-based Vulnerability Detection by Building Behavior Graph Model | Technical Track | Bin Yuan (Huazhong University of Science and Technology), Yifan Lu (Huazhong University of Science and Technology), Yilin Fang (Huazhong University of Science and Technology), Yueming Wu (Nanyang Technological University), Deqing Zou (Huazhong University of Science and Technology), Zhen Li (Huazhong University of Science and Technology), Zhi Li (Huazhong University of Science and Technology), Hai Jin (Huazhong University of Science and Technology)
14:30 | 15m | Talk | Vulnerability Detection with Graph Simplification and Enhanced Graph Representation Learning | Technical Track | Xin-Cheng Wen (Harbin Institute of Technology), Yupan (Harbin Institute of Technology), Cuiyun Gao (Harbin Institute of Technology), Hongyu Zhang (The University of Newcastle), Jie M. Zhang (King's College London), Qing Liao (Harbin Institute of Technology)
14:45 | 15m | Talk | Does data sampling improve deep learning-based vulnerability detection? Yeas! and Nays! | Technical Track | Xu Yang (University of Manitoba), Shaowei Wang (University of Manitoba), Yi Li (New Jersey Institute of Technology), Shaohua Wang (New Jersey Institute of Technology)
15:00 | 7m | Talk | Learning from What We Know: How to Perform Vulnerability Prediction using Noisy Historical Data | Journal-First Papers | Aayush Garg (University of Luxembourg, Luxembourg), Renzo Degiovanni (SnT, University of Luxembourg), Matthieu Jimenez (SnT, University of Luxembourg), Maxime Cordy (University of Luxembourg, Luxembourg), Mike Papadakis (University of Luxembourg, Luxembourg), Yves Le Traon (University of Luxembourg, Luxembourg)
15:07 | 7m | Talk | Do I really need all this work to find vulnerabilities? An empirical case study comparing vulnerability detection techniques on a Java application | Journal-First Papers | Sarah Elder (North Carolina State University), Nusrat Zahan (North Carolina State University), Rui Shu (North Carolina State University), Valeri Kozarev (North Carolina State University), Tim Menzies (North Carolina State University), Laurie Williams (North Carolina State University)