Deep learning (DL) based techniques have been a common thread in several recent studies of vulnerability detection. The rise of large, publicly available vulnerability datasets has fueled the learning process behind these techniques. While these datasets help DL-based vulnerability detectors, they also constrain the detectors’ predictive abilities. Vulnerabilities in these datasets have to be represented in a certain way, e.g., as the code lines, functions, or Program Dependence Graph (PDG) slices within which the vulnerabilities exist. We refer to this representation as a base unit. The detectors therefore learn how individual base units can be vulnerable and then predict whether other base units are vulnerable. We hypothesized that this focus on individual base units harms the detectors’ ability to properly detect vulnerabilities that span multiple base units (MBU vulnerabilities). For such vulnerabilities, a detection is correct only when all of the constituent base units are detected as vulnerable. Verifying how well these detectors find all parts of a vulnerability is important for establishing their effectiveness in downstream tasks. To test our hypothesis, we ran a study of three prominent DL-based detectors: ReVeal, DeepWuKong, and LineVul. Our study shows that the datasets of all three detectors contain MBU vulnerabilities. Further, we observed significant drops in accuracy when detecting these types of vulnerabilities. We present our study and a framework that can help DL-based detectors properly account for MBU vulnerabilities.
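The correctness criterion stated in the abstract (an MBU vulnerability counts as detected only when every one of its base units is flagged) can be illustrated with a minimal sketch. The function name, the dictionary layout, and the sample identifiers below are hypothetical; they are not taken from the paper's framework or from ReVeal, DeepWuKong, or LineVul.

```python
from collections import defaultdict


def detection_rates(predictions, unit_to_vuln):
    """Compare per-base-unit and per-vulnerability detection rates.

    predictions:  dict mapping a base-unit id to True if the detector
                  flagged that unit as vulnerable (hypothetical layout).
    unit_to_vuln: dict mapping each truly vulnerable base-unit id to the
                  id of the vulnerability it belongs to (hypothetical layout).
    Returns (unit_rate, vulnerability_rate).
    """
    if not unit_to_vuln:
        return 0.0, 0.0

    # Group the vulnerable base units by the vulnerability they belong to.
    units_by_vuln = defaultdict(list)
    for unit, vuln in unit_to_vuln.items():
        units_by_vuln[vuln].append(unit)

    # Unit-level view: fraction of vulnerable base units that were flagged.
    flagged = sum(1 for unit in unit_to_vuln if predictions.get(unit, False))
    unit_rate = flagged / len(unit_to_vuln)

    # Vulnerability-level view: a vulnerability is detected only if
    # ALL of its base units were flagged.
    fully_detected = sum(
        1
        for units in units_by_vuln.values()
        if all(predictions.get(unit, False) for unit in units)
    )
    vuln_rate = fully_detected / len(units_by_vuln)
    return unit_rate, vuln_rate


# Hypothetical data: V1 spans three base units (an MBU vulnerability),
# V2 is confined to a single base unit.
unit_to_vuln = {"f1": "V1", "f2": "V1", "f3": "V1", "g1": "V2"}
predictions = {"f1": True, "f2": True, "f3": False, "g1": True}

unit_rate, vuln_rate = detection_rates(predictions, unit_to_vuln)
print(unit_rate, vuln_rate)
```

In this made-up example three of the four vulnerable base units are flagged, yet only one of the two vulnerabilities is fully detected, because one base unit of V1 was missed. This is the kind of gap between unit-level and vulnerability-level results that the abstract describes.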
Wed 17 Apr (displayed time zone: Lisbon)
14:00 - 15:30 | Security 1 | Research Track / Journal-first Papers | Grande Auditório | Chair(s): Letizia Jaccheri (Norwegian University of Science and Technology (NTNU))
14:00 | 15m | Talk | Marco: A Stochastic Asynchronous Concolic Explorer | Research Track | Jie Hu (University of California Riverside), Yue Duan (Singapore Management University), Heng Yin (UC Riverside)
14:15 | 15m | Talk | Smart Contract and DeFi Security Tools: Do They Meet the Needs of Practitioners? | Research Track | Stefanos Chaliasos (Imperial College London), Marcos Antonios Charalambous (Imperial College London), Liyi Zhou (Imperial College London), Rafaila Galanopoulou (University of Athens), Arthur Gervais (Imperial College London), Dimitris Mitropoulos (University of Athens), Ben Livshits (Imperial College London)
14:30 | 15m | Talk | DocFlow: Extracting Taint Specifications from Software Documentation | Research Track | Marcos Tileria (Royal Holloway, University of London), Jorge Blasco (Universidad Politécnica de Madrid), Santanu Dash (University of Surrey)
14:45 | 15m | Talk | Toward Improved Deep Learning-based Vulnerability Detection | Research Track | Adriana Sejfia (University of Edinburgh), Satyaki Das (University of Southern California), Saad Shafiq (University of Southern California), Nenad Medvidović (University of Southern California)
15:00 | 15m | Talk | Attention! Your Copied Data is Under Monitoring: A Systematic Study of Clipboard Usage in Android Apps | Research Track | Yongliang Chen (City University of Hong Kong), Ruoqin Tang (City University of Hong Kong), Chaoshun Zuo (Ohio State University), Xiaokuan Zhang (George Mason University), Lei Xue (Sun Yat-Sen University), Xiapu Luo (The Hong Kong Polytechnic University), Qingchuan Zhao (City University of Hong Kong)
15:15 | 7m | Talk | Evolution of Automated Weakness Detection in Ethereum Bytecode: a Comprehensive Study | Journal-first Papers | Monika di Angelo (TU Wien), Thomas Durieux (TU Delft), João F. Ferreira (INESC-ID and IST, University of Lisbon), Gernot Salzer (TU Wien)