Leveraging Large Language Models to Detect npm Malicious Packages
This program is tentative and subject to change.
Existing malicious code detection techniques can aid the manual review process by predicting which packages are likely to be malicious, but they often suffer from high misclassification rates. More automated approaches are therefore needed to achieve high accuracy and a low misclassification rate. The goal of this study is to assist security analysts in detecting malicious packages through an empirical study of using Large Language Models (LLMs) to detect malicious code in the npm ecosystem. We present SecurityAI, a malicious code review workflow that detects malicious code using ChatGPT. We leverage a benchmark dataset of 5,115 npm packages, of which 2,180 contain malicious code. We conduct a baseline comparison of the GPT-3 and GPT-4 models against the state-of-the-art CodeQL static analysis tool, using 39 custom CodeQL rules developed in prior research to detect malicious JavaScript code. We also compare the effectiveness of static analysis as a pre-screener for the SecurityAI workflow, measuring the number of files that must be analyzed and the associated costs. Additionally, we perform a qualitative study to understand the types of malicious packages our workflow detects or misses. Our baseline comparison demonstrates a 16% improvement in precision and a 9% improvement in F1 score over static analysis. We attain precision and F1 scores of 91% and 94% with GPT-3, and 99% and 97% with GPT-4, with GPT-3 offering a cost-effective balance. Pre-screening files with a static analyzer reduces the number of files requiring LLM analysis by 77.9% and decreases costs by 60.9% for GPT-3 and 76.1% for GPT-4. Our qualitative analysis identifies data theft, hidden backdoors, and connections to suspicious domains as the top categories of detected malicious packages. A lack of diversity in model-generated responses led to hallucinations and, in turn, misclassifications, with GPT-3 hallucinating more frequently than GPT-4.
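For illustration, here is a minimal sketch of how such a two-stage "static pre-screen, then LLM review" pipeline could be wired together. It assumes the CodeQL CLI is installed, a CodeQL database has already been built for the package under review, and the OpenAI Python client is available; the query pack name, prompt wording, and truncation limit are hypothetical stand-ins, not the authors' actual SecurityAI implementation.

```python
# Sketch of a two-stage pipeline in the spirit of the SecurityAI workflow:
# stage 1 pre-screens files with CodeQL rules, stage 2 sends only the
# flagged files to an LLM for a malicious/benign verdict.
import json
import subprocess
from pathlib import Path

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def prescreen_with_codeql(db_dir: str, sarif_out: str = "results.sarif") -> set[str]:
    """Run custom CodeQL rules and return the set of flagged source files.

    "my-org/malicious-js-queries" is a hypothetical query pack standing in
    for the 39 custom JavaScript rules mentioned in the abstract.
    """
    subprocess.run(
        ["codeql", "database", "analyze", db_dir,
         "my-org/malicious-js-queries",
         "--format=sarif-latest", f"--output={sarif_out}"],
        check=True,
    )
    sarif = json.loads(Path(sarif_out).read_text())
    flagged: set[str] = set()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            for loc in result.get("locations", []):
                flagged.add(loc["physicalLocation"]["artifactLocation"]["uri"])
    return flagged


def review_with_llm(path: str, model: str = "gpt-4") -> str:
    """Ask the model to classify one flagged file; the prompt is illustrative."""
    source = Path(path).read_text(errors="replace")[:12000]  # stay within context
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "You are a security analyst reviewing npm package code."},
            {"role": "user",
             "content": "Label this file MALICIOUS or BENIGN and explain briefly:"
                        f"\n\n{source}"},
        ],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    for file in prescreen_with_codeql("npm-package-db"):
        print(file, "->", review_with_llm(file))
```

The design point of the pre-screen stage is that only files flagged by the static analyzer ever reach the model, which is what drives the reported 77.9% reduction in files analyzed and the corresponding cost savings.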
Fri 2 May. Displayed time zone: Eastern Time (US & Canada).
14:00 - 15:30

14:00 (15m) Talk: Repository-Level Graph Representation Learning for Enhanced Security Patch Detection
Research Track. Xin-Cheng Wen (Harbin Institute of Technology), Zirui Lin (Harbin Institute of Technology, Shenzhen), Cuiyun Gao (Harbin Institute of Technology), Hongyu Zhang (Chongqing University), Yong Wang (Anhui Polytechnic University), Qing Liao (Harbin Institute of Technology)

14:15 (15m) Talk: FAMOS: Fault diagnosis for Microservice Systems through Effective Multi-modal Data Fusion
Research Track. Chiming Duan (Peking University), Yong Yang (Peking University), Tong Jia (Institute for Artificial Intelligence, Peking University, Beijing, China), Guiyang Liu (Alibaba), Jinbu Liu (Alibaba), Huxing Zhang (Alibaba Group), Qi Zhou (Alibaba), Ying Li (School of Software and Microelectronics, Peking University, Beijing, China), Gang Huang (Peking University)

14:30 (15m) Talk: Leveraging Large Language Models to Detect npm Malicious Packages
Research Track. Nusrat Zahan (North Carolina State University), Philipp Burckhardt (Socket, Inc.), Mikola Lysenko (Socket, Inc.), Feross Aboukhadijeh (Socket, Inc.), Laurie Williams (North Carolina State University)

14:45 (15m) Talk: Magika: AI-Powered Content-Type Detection
Research Track. Yanick Fratantonio (Google), Luca Invernizzi (Google), Loua Farah (Google), Kurt Thomas (Google), Marina Zhang (Google), Ange Albertini (Google), Francois Galilee (Google), Giancarlo Metitieri (Google), Julien Cretin (Google), Alex Petit-Bianco (Google), David Tao (Google), Elie Bursztein (Google)

15:00 (15m) Talk: Closing the Gap: A User Study on the Real-world Usefulness of AI-powered Vulnerability Detection & Repair in the IDE
Research Track. Benjamin Steenhoek (Microsoft), Siva Sivaraman (Microsoft), Renata Saldivar Gonzalez (Microsoft), Yevhen Mohylevskyy (Microsoft), Roshanak Zilouchian Moghaddam (Microsoft), Wei Le (Iowa State University)

15:15 (15m) Talk: Show Me Your Code! Kill Code Poisoning: A Lightweight Method Based on Code Naturalness
Research Track. Weisong Sun (Nanjing University), Yuchen Chen (Nanjing University), Mengzhe Yuan (Nanjing University), Chunrong Fang (Nanjing University), Zhenpeng Chen (Nanyang Technological University), Chong Wang (Nanyang Technological University), Yang Liu (Nanyang Technological University), Baowen Xu (State Key Laboratory for Novel Software Technology, Nanjing University), Zhenyu Chen (Nanjing University). Pre-print available.