LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations
Large Language Models (LLMs) like Codex are powerful tools for performing code completion and code generation tasks, as they are trained on billions of lines of code from publicly available sources. Moreover, these models are capable of generating code snippets from Natural Language (NL) descriptions by learning languages and programming practices from public GitHub repositories. Although LLMs promise an effortless NL-driven deployment of software applications, the security of the code they generate has not been extensively investigated or documented. In this work, we present LLMSecEval, a dataset containing 150 NL prompts that can be leveraged for assessing the security performance of such models. These prompts are NL descriptions of code snippets prone to various security vulnerabilities listed in MITRE's Top 25 Common Weakness Enumeration (CWE) ranking. Each prompt in our dataset comes with a secure implementation example to facilitate comparative evaluations against code produced by LLMs. As a practical application, we show how LLMSecEval can be used for evaluating the security of snippets automatically generated from NL descriptions.
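The abstract describes evaluating LLM-generated code against the dataset's prompts. The sketch below illustrates, under stated assumptions, how such an evaluation loop might be wired up: the CSV layout and field names ("CWE", "Prompt") are illustrative guesses rather than the dataset's actual schema, and generate_code() is a hypothetical placeholder for a call to a code-generation model such as Codex.

```python
# Minimal sketch of collecting model outputs for LLMSecEval-style prompts.
# Assumptions: the prompts live in a CSV with "CWE" and "Prompt" columns
# (illustrative, not the real schema), and generate_code() stands in for
# whatever LLM API is used to turn an NL prompt into a code snippet.
import csv


def generate_code(nl_prompt: str) -> str:
    # Placeholder for a real model call; replace with an actual LLM API.
    return f"# code generated for: {nl_prompt}"


def collect_generations(dataset_csv: str, out_csv: str) -> None:
    with open(dataset_csv, newline="", encoding="utf-8") as fin, \
         open(out_csv, "w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=["CWE", "Prompt", "Generated Code"])
        writer.writeheader()
        for row in reader:
            snippet = generate_code(row["Prompt"])
            # Keep the CWE label next to each generated snippet so it can later
            # be checked (e.g., by a static analyzer or manual review) against
            # the specific vulnerability class the prompt targets.
            writer.writerow({
                "CWE": row["CWE"],
                "Prompt": row["Prompt"],
                "Generated Code": snippet,
            })


if __name__ == "__main__":
    collect_generations("LLMSecEval-prompts.csv", "generated-snippets.csv")
```

The generated snippets can then be compared against the secure implementation examples shipped with each prompt, which is the comparative evaluation the abstract refers to.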
Tue 16 May (displayed time zone: Hobart)
14:35 - 15:15 | Defect Prediction (Data and Tool Showcase Track / Technical Papers) at Meeting Room 109 | Chair(s): Sarra Habchi (Ubisoft)
14:35 | Talk (12 min) | Large Language Models and Simple, Stupid Bugs (Technical Papers). Kevin Jesse, Toufique Ahmed, Prem Devanbu, and Emily Morgan (University of California, Davis). Pre-print.
14:47 | Talk (12 min) | The ABLoTS Approach for Bug Localization: is it replicable and generalizable? (Technical Papers, Distinguished Paper Award). Feifei Niu (University of Ottawa), Christoph Mayr-Dorn (Johannes Kepler University Linz), Wesley Assunção (Johannes Kepler University Linz, Austria & Pontifical Catholic University of Rio de Janeiro, Brazil), Liguo Huang (Southern Methodist University), Jidong Ge (Nanjing University), Bin Luo (Nanjing University), and Alexander Egyed (Johannes Kepler University Linz). Pre-print, file attached.
14:59 | Talk (6 min) | LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations (Data and Tool Showcase Track). Catherine Tony, Markus Mutas, Nicolás E. Díaz Ferreyra, and Riccardo Scandariato (Hamburg University of Technology). Pre-print.
15:05 | Talk (6 min) | Defectors: A Large, Diverse Python Dataset for Defect Prediction (Data and Tool Showcase Track). Parvez Mahbub, Ohiduzzaman Shuvo, and Masud Rahman (Dalhousie University). Pre-print.