Effective Vulnerable Function Identification based on CVE Description Empowered by Large Language Models (ASE 2024 - Research Papers)

Who

Yulun Wu, Ming Wen, Zeliang Yu, Xiaochen Guo, Hai Jin

Track

ASE 2024 Research Papers

Time Zone

The program is currently displayed in (GMT-07:00) Pacific Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Thu 31 Oct 2024 11:00 - 11:15 at Magnolia - Vulnerability and security 2. Chair(s): Yiming Tang

Abstract

Open-source software (OSS) has profoundly transformed the software development paradigm by facilitating effortless code reuse. However, recent years have seen an alarming increase in disclosed vulnerabilities within OSS, posing significant security risks to downstream users. Analyzing existing vulnerabilities and precisely assessing their threats to downstream applications has therefore become pivotal. Considerable effort has recently been devoted to this problem, for example through vulnerability reachability analysis and vulnerability reproduction. The key to these tasks is identifying the vulnerable function (i.e., the function where the root cause of a vulnerability resides). However, public vulnerability datasets (e.g., NVD) rarely include this information, because pinpointing the exact vulnerable functions remains a longstanding challenge.

Existing methods mainly detect vulnerable functions based on vulnerability patches or Proof-of-Concept (PoC). However, such methods face significant limitations due to data availability and the extensive manual effort they require, which hinders scalability. To address this issue, we propose VFFinder, a novel approach that localizes vulnerable functions based on Common Vulnerabilities and Exposures (CVE) descriptions and the corresponding source code, using Large Language Models (LLMs). Specifically, VFFinder adopts a customized in-context learning (ICL) approach based on CVE description patterns to enable the LLM to extract key entities. It then performs priority matching against the source code to localize vulnerable functions. We assess the performance of VFFinder on 75 large open-source projects. The results demonstrate that VFFinder significantly surpasses existing baselines. Notably, the Top-1 and MRR metrics improve by 4.25X and 2.37X on average, respectively. We also integrate VFFinder with Software Composition Analysis (SCA) tools, and the results show that our tool can significantly reduce the false positive rates of existing SCA tools.
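The abstract reports results in terms of Top-1 and MRR but does not spell out how they are computed. The sketch below shows the standard definitions of Top-k accuracy and Mean Reciprocal Rank over ranked lists of candidate functions. It is only an illustration: the function names, the placeholder vulnerability IDs, and the toy rankings are all hypothetical, and the paper's actual evaluation protocol may differ (for instance, in how it treats vulnerabilities with several vulnerable functions).

```python
from typing import Dict, List, Optional, Set


def rank_of_first_hit(ranked: List[str], truth: Set[str]) -> Optional[int]:
    """Return the 1-based rank of the first truly vulnerable function, or None."""
    for position, name in enumerate(ranked, start=1):
        if name in truth:
            return position
    return None


def top_k_accuracy(ranks: List[Optional[int]], k: int) -> float:
    """Fraction of vulnerabilities whose first hit appears within the top k."""
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


def mean_reciprocal_rank(ranks: List[Optional[int]]) -> float:
    """Average of 1/rank; a vulnerability with no hit contributes 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


# Hypothetical toy data: ranked candidate functions per vulnerability.
predictions: Dict[str, List[str]] = {
    "VULN-A": ["parse_header", "read_chunk", "init"],
    "VULN-B": ["init", "decode_frame", "free_buffer"],
    "VULN-C": ["main", "usage", "init"],
}
# Hypothetical ground truth: the function(s) containing the root cause.
ground_truth: Dict[str, Set[str]] = {
    "VULN-A": {"parse_header"},   # found at rank 1
    "VULN-B": {"decode_frame"},   # found at rank 2
    "VULN-C": {"copy_string"},    # not found in the ranking
}

ranks = [rank_of_first_hit(predictions[v], ground_truth[v]) for v in predictions]
top1 = top_k_accuracy(ranks, k=1)
mrr = mean_reciprocal_rank(ranks)

print(f"ranks={ranks} top1={top1:.3f} mrr={mrr:.3f}")
```

Under these definitions, Top-1 rewards only an exact first-place hit, while MRR also gives partial credit when the vulnerable function appears lower in the list, which is why the two metrics can improve by different factors.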

Yulun Wu

Huazhong University of Science and Technology

Ming Wen

Huazhong University of Science and Technology

China

Zeliang Yu

Huazhong University of Science and Technology

China

Xiaochen Guo

Huazhong University of Science and Technology

China

Hai Jin

Huazhong University of Science and Technology

China

Session Program

Thu 31 Oct
Displayed time zone: Pacific Time (US & Canada)

10:30 - 12:00	Vulnerability and security 2 (NIER Track / Research Papers / Tool Demonstrations) at Magnolia. Chair(s): Yiming Tang, Rochester Institute of Technology

10:30 15m Talk		Coding-PTMs: How to Find Optimal Code Pre-trained Models for Code Embedding in Vulnerability Detection? Research Papers Yu Zhao, Lina Gong Nanjing University of Aeronautics and Astronautics, Zhiqiu Huang Nanjing University of Aeronautics and Astronautics, Yongwei Wang Shanghai Institute for Advanced Study and College of Computer Science, Zhejiang University, Mingqiang Wei School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Fei Wu College of Computer Science and Technology in Zhejiang University
10:45 15m Talk		STASE: Static Analysis Guided Symbolic Execution for UEFI Vulnerability Signature Generation Research Papers Md Shafiuzzaman University of California at Santa Barbara, Achintya Desai University of California Santa Barbara, Laboni Sarker University of California at Santa Barbara, Tevfik Bultan University of California at Santa Barbara
11:00 15m Talk		Effective Vulnerable Function Identification based on CVE Description Empowered by Large Language Models Research Papers Yulun Wu Huazhong University of Science and Technology, Ming Wen Huazhong University of Science and Technology, Zeliang Yu Huazhong University of Science and Technology, Xiaochen Guo Huazhong University of Science and Technology, Hai Jin Huazhong University of Science and Technology
11:15 15m Talk		COBRA: Interaction-Aware Bytecode-Level Vulnerability Detector for Smart Contracts Research Papers Wenkai Li Hainan University, Xiaoqi Li Hainan University, Zongwei Li Hainan University, Yuqing Zhang University of Chinese Academy of Sciences; Zhongguancun Laboratory
11:30 10m Talk		MADE-WIC: Multiple Annotated Datasets for Exploring Weaknesses In Code Tool Demonstrations Moritz Mock Free University of Bozen-Bolzano, Jorge Melegati Free University of Bozen-Bolzano, Max Kretschmann Hamburg University of Technology, Nicolás E. Díaz Ferreyra Hamburg University of Technology, Barbara Russo Free University of Bozen/Bolzano, Italy
11:40 10m Talk		The Software Genome Project: Unraveling Software Through Genetic Principles NIER Track Yueming Wu Nanyang Technological University, Chengwei Liu Nanyang Technological University, Zhengzi Xu Nanyang Technological University; Imperial Global Singapore, Lyuye Zhang Nanyang Technological University, Yiran Zhang, Zhiling Zhu Zhejiang University of Technology, Yang Liu Nanyang Technological University
11:50 10m Talk		Mining for Mutation Operators for Reduction of Information Flow Control Violations NIER Track Ilya Kosorukov University College London, Daniel Blackwell University College London, David Clark University College London, Myra Cohen Iowa State University, Justyna Petke University College London