Vulnerability Detection with Code Language Models: How Far Are We?Security
In the context of the rising interest in code language models (code LMs) and vulnerability detection, we study the effectiveness of code LMs for detecting vulnerabilities. Our analysis reveals significant shortcomings in existing vulnerability datasets, including poor data quality, low label accuracy, and high duplication rates, leading to unreliable model performance in realistic vulnerability detection scenarios. Additionally, the evaluation methods used with these datasets are not representative of real-world vulnerability detection.
To address these challenges, we introduce PrimeVul, a new dataset for training and evaluating code LMs for vulnerability detection. PrimeVul incorporates a novel set of data labeling techniques that achieve comparable label accuracy to human-verified benchmarks while significantly expanding the dataset. It also implements a rigorous data de-duplication and chronological data splitting strategy to mitigate data leakage issues, alongside introducing more realistic evaluation metrics and settings. This comprehensive approach aims to provide a more accurate assessment of code LMs’ performance in real-world conditions.
Evaluating code LMs on PrimeVul reveals that existing benchmarks significantly overestimate the performance of these models. For instance, a state-of-the-art 7B model scored 68.26% F1 on BigVul but only 3.09% F1 on PrimeVul. Attempts to improve performance through advanced training techniques and larger models like GPT-3.5 and GPT-4 were unsuccessful, with results akin to random guessing in the most stringent settings. These findings underscore the considerable gap between current capabilities and the practical requirements for deploying code LMs in security roles, highlighting the need for more innovative research in this domain.
Thu 1 MayDisplayed time zone: Eastern Time (US & Canada) change
14:00 - 15:30 | |||
14:00 15mTalk | Large Language Models as Configuration ValidatorsSecurity Research Track Xinyu Lian University of Illinois at Urbana-Champaign, Yinfang Chen University of Illinois at Urbana-Champaign, Runxiang Cheng University of Illinois at Urbana-Champaign, Jie Huang University of Illinois at Urbana-Champaign, Parth Thakkar Meta Platforms, Inc., Minjia Zhang UIUC, Tianyin Xu University of Illinois at Urbana-Champaign | ||
14:15 15mTalk | LLM Assistance for Memory SafetySecurity Research Track Nausheen Mohammed Microsoft Research, Akash Lal Microsoft Research, Aseem Rastogi Microsoft Research, Subhajit Roy IIT Kanpur, Rahul Sharma Microsoft Research | ||
14:30 15mTalk | Vulnerability Detection with Code Language Models: How Far Are We?Security Research Track Yangruibo Ding Columbia University, Yanjun Fu University of Maryland, Omniyyah Ibrahim King Abdulaziz City for Science and Technology, Chawin Sitawarin University of California, Berkeley, Xinyun Chen , Basel Alomair King Abdulaziz City for Science and Technology, David Wagner UC Berkeley, Baishakhi Ray Columbia University, Yizheng Chen University of Maryland | ||
14:45 15mTalk | Combining Fine-Tuning and LLM-based Agents for Intuitive Smart Contract Auditing with JustificationsBlockchainSecurity Research Track Wei Ma , Daoyuan Wu Hong Kong University of Science and Technology, Yuqiang Sun Nanyang Technological University, Tianwen Wang National University of Singapore, Shangqing Liu Nanyang Technological University, Jian Zhang Nanyang Technological University, Yue Xue , Yang Liu Nanyang Technological University | ||
15:00 15mTalk | Towards Neural Synthesis for SMT-assisted Proof-Oriented ProgrammingSecurityFormal MethodsAward Winner Research Track Saikat Chakraborty Microsoft Research, Gabriel Ebner Microsoft Research, Siddharth Bhat University of Cambridge, Sarah Fakhoury Microsoft Research, Sakina Fatima University of Ottawa, Shuvendu K. Lahiri Microsoft Research, Nikhil Swamy Microsoft Research | ||
15:15 15mTalk | Prompt-to-SQL Injections in LLM-Integrated Web Applications: Risks and DefensesSecuritySE for AI Research Track Rodrigo Resendes Pedro INESC-ID / IST, Universidade de Lisboa, Miguel E. Coimbra INESC-ID; Instituto Superior Técnico - University of Lisbon, Daniel Castro INESC-ID / IST, Universidade de Lisboa, Paulo Carreira INESC-ID / IST, Universidade de Lisboa, Nuno Santos INESC-ID; Instituto Superior Técnico - University of Lisbon |