Toward Improved Deep Learning-based Vulnerability Detection (ICSE 2024 - Research Track)

Who

Adriana Sejfia, Satyaki Das, Saad Shafiq, Nenad Medvidović

Track

ICSE 2024 Research Track

When

Wed 17 Apr 2024 14:45 - 15:00 at Grande Auditório - Security 1 Chair(s): Letizia Jaccheri

Abstract

Deep learning (DL)-based techniques have been a common thread in several recent studies on vulnerability detection. The rise of large, publicly available datasets of vulnerabilities has fueled the learning process behind these techniques. While these datasets help the DL-based vulnerability detectors, they also constrain these detectors’ predictive abilities. Vulnerabilities in these datasets have to be represented in a certain way, e.g., as the code lines, functions, or Program Dependence Graph (PDG) slices within which the vulnerabilities exist. We refer to this representation as a base unit. This means the detectors learn how base units can be vulnerable and then predict whether other base units are vulnerable. We hypothesized that this focus on individual base units harms the ability of the detectors to properly detect vulnerabilities that span multiple base units (MBU vulnerabilities). For such vulnerabilities, a correct detection occurs only when all constituent base units are detected as vulnerable. Verifying how these detectors perform in detecting all parts of a vulnerability is important to establish their effectiveness for downstream tasks. To check our hypothesis, we ran a study focusing on three prominent DL-based detectors: ReVeal, DeepWuKong, and LineVul. Our study shows that the datasets of all three detectors contain MBU vulnerabilities. Further, we observed a significant drop in accuracy when detecting these types of vulnerabilities. We present our study and a framework that can help DL-based detectors move toward proper inclusion of MBU vulnerabilities.
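
The difference between judging base units individually and judging whole vulnerabilities can be illustrated with a small sketch. This is not the authors' framework or evaluation code. The function name `detection_rates`, the tuple layout, and the identifiers `vuln-A`, `vuln-B`, `f1`, `g1`, `g2`, `g3` are all hypothetical, and the sketch assumes every listed base unit is truly vulnerable. It only shows how a detector can look good per base unit while missing an MBU vulnerability as a whole.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (vulnerability id, base unit id, detector flagged it?)
Prediction = Tuple[str, str, bool]


def detection_rates(predictions: Iterable[Prediction]) -> Dict[str, float]:
    """Compare per-base-unit recall with per-vulnerability recall.

    Assumes every base unit passed in is truly vulnerable. A vulnerability
    counts as detected only if all of its base units were flagged.
    """
    flags_by_vuln = defaultdict(list)
    for vuln_id, _unit_id, flagged in predictions:
        flags_by_vuln[vuln_id].append(flagged)

    total_units = sum(len(flags) for flags in flags_by_vuln.values())
    if total_units == 0:
        raise ValueError("no predictions given")

    flagged_units = sum(sum(flags) for flags in flags_by_vuln.values())
    fully_detected = sum(all(flags) for flags in flags_by_vuln.values())
    return {
        "unit_recall": flagged_units / total_units,
        "vulnerability_recall": fully_detected / len(flags_by_vuln),
    }


# One single-unit vulnerability and one MBU vulnerability with a missed unit.
example = [
    ("vuln-A", "f1", True),
    ("vuln-B", "g1", True),
    ("vuln-B", "g2", False),
    ("vuln-B", "g3", True),
]
rates = detection_rates(example)
print(rates)
```

In this made-up example, three of the four base units are flagged, yet only one of the two vulnerabilities is detected in full, because `vuln-B` has one unflagged unit.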

Link to Preprint

https://arxiv.org/abs/2403.03024

Adriana Sejfia

University of Edinburgh

United Kingdom

Satyaki Das

University of Southern California

Saad Shafiq

University of Southern California

United States

Nenad Medvidović

University of Southern California

United States

Session Program

Wed 17 Apr
Displayed time zone: (GMT+01:00) Lisbon

14:00 - 15:30	Security 1 (Research Track / Journal-first Papers) at Grande Auditório
Chair(s): Letizia Jaccheri, Norwegian University of Science and Technology (NTNU)

14:00 15m Talk	Marco: A Stochastic Asynchronous Concolic Explorer (Research Track)
	Jie Hu (University of California Riverside), Yue Duan (Singapore Management University), Heng Yin (UC Riverside)
14:15 15m Talk	Smart Contract and DeFi Security Tools: Do They Meet the Needs of Practitioners? (Research Track)
	Stefanos Chaliasos (Imperial College London), Marcos Antonios Charalambous (Imperial College London), Liyi Zhou (Imperial College London), Rafaila Galanopoulou (University of Athens), Arthur Gervais (Imperial College London), Dimitris Mitropoulos (University of Athens), Ben Livshits (Imperial College London)
14:30 15m Talk	DocFlow: Extracting Taint Specifications from Software Documentation (Research Track)
	Marcos Tileria (Royal Holloway, University of London), Jorge Blasco (Universidad Politécnica de Madrid), Santanu Dash (University of Surrey)
14:45 15m Talk	Toward Improved Deep Learning-based Vulnerability Detection (Research Track)
	Adriana Sejfia (University of Edinburgh), Satyaki Das (University of Southern California), Saad Shafiq (University of Southern California), Nenad Medvidović (University of Southern California)
15:00 15m Talk	Attention! Your Copied Data is Under Monitoring: A Systematic Study of Clipboard Usage in Android Apps (Research Track)
	Yongliang Chen (City University of Hong Kong), Ruoqin Tang (City University of Hong Kong), Chaoshun Zuo (Ohio State University), Xiaokuan Zhang (George Mason University), Lei Xue (Sun Yat-Sen University), Xiapu Luo (The Hong Kong Polytechnic University), Qingchuan Zhao (City University of Hong Kong)
15:15 7m Talk	Evolution of Automated Weakness Detection in Ethereum Bytecode: a Comprehensive Study (Journal-first Papers)
	Monika di Angelo (TU Wien), Thomas Durieux (TU Delft), João F. Ferreira (INESC-ID and IST, University of Lisbon), Gernot Salzer (TU Wien)