Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection (ICSE 2024 - Research Track) - ICSE 2024

Fri 12 - Sun 21 April 2024 Lisbon, Portugal

Who

Benjamin Steenhoek, Hongyang Gao, Wei Le

Track

ICSE 2024 Research Track

Time Zone

The program is currently displayed in (GMT+01:00) Lisbon.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 17 Apr 2024 16:15 - 16:30 at Luis de Freitas Branco - LLM, NN and other AI technologies 2 Chair(s): Jane Cleland-Huang

Abstract

Deep learning-based vulnerability detection has shown great performance and, in some studies, outperformed static analysis tools. However, the highest-performing approaches use token-based transformer models, which are not the most efficient at capturing the code semantics required for vulnerability detection. Classical program analysis techniques such as dataflow analysis can detect many types of bugs based on their root causes. In this paper, we propose to combine such causal-based vulnerability detection algorithms with deep learning, aiming to achieve more efficient and effective vulnerability detection. Specifically, we designed DeepDFA, a dataflow analysis-inspired graph learning framework and an embedding technique that enables graph learning to simulate dataflow computation. We show that DeepDFA is both performant and efficient. DeepDFA outperformed all non-transformer baselines. It was trained in 9 minutes, 75x faster than the highest-performing baseline model. When using only 50+ vulnerable and several hundred total examples as training data, the model retained the same performance as when trained on 100% of the dataset. DeepDFA also generalized to real-world vulnerabilities in DBGBench; it detected 8.7 out of 17 vulnerabilities on average across folds and was able to distinguish between patched and buggy versions, while the highest-performing baseline models did not detect any vulnerabilities. By combining DeepDFA with a large language model, we surpassed the state-of-the-art vulnerability detection performance on the Big-Vul dataset with a 96.46 F1 score, 97.82 precision, and 95.14 recall. Our replication package is located at https://figshare.com/s/e7953b4d345b00990d17.
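
For readers unfamiliar with the classical technique the abstract refers to, the following is a minimal sketch of an iterative reaching-definitions dataflow analysis over a toy control-flow graph. It is an illustration only and is not taken from DeepDFA or its replication package; the function name `reaching_definitions`, the four-node graph, and the `defs` mapping are all hypothetical.

```python
from collections import defaultdict


def reaching_definitions(cfg, defs):
    """Iterative reaching-definitions analysis (illustrative sketch).

    cfg:  dict mapping each node to a list of its successor nodes.
          Every node must appear as a key.
    defs: dict mapping a node to the variable it defines, if any.
    Returns a dict mapping each node to the set of definition nodes
    that reach the exit of that node.
    """
    # Build predecessor lists from the successor lists.
    preds = defaultdict(list)
    for node, succs in cfg.items():
        for succ in succs:
            preds[succ].append(node)

    out = {node: set() for node in cfg}
    worklist = list(cfg)
    while worklist:
        node = worklist.pop()
        # IN[node] is the union of OUT[p] over all predecessors p.
        in_set = set()
        for p in preds[node]:
            in_set |= out[p]
        if node in defs:
            # Kill other definitions of the same variable, then add this one.
            new_out = {d for d in in_set if defs[d] != defs[node]} | {node}
        else:
            new_out = in_set
        if new_out != out[node]:
            out[node] = new_out
            # Successors must be revisited because their input changed.
            worklist.extend(cfg[node])
    return out


# Hypothetical toy program:
#   1: x = 0
#   2: if (cond)
#   3:     x = 1
#   4: y = x
cfg = {1: [2], 2: [3, 4], 3: [4], 4: []}
defs = {1: "x", 3: "x", 4: "y"}
result = reaching_definitions(cfg, defs)
print(result[4])  # definitions of x from nodes 1 and 3 both reach node 4
```

In this toy graph, both definitions of `x` reach the use at node 4 because the branch at node 2 may or may not execute node 3. Propagating sets of facts along graph edges until a fixed point is the kind of computation that, according to the abstract, the DeepDFA embedding is designed to let graph learning simulate.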

Link to Preprint

https://doi.org/10.48550/arXiv.2212.08108

Benjamin Steenhoek

Iowa State University

United States

Hongyang Gao

Dept. of Computer Science, Iowa State University

Wei Le

Iowa State University

United States

Session Program

Wed 17 Apr
Displayed time zone: Lisbon

	16:00 - 17:30	LLM, NN and other AI technologies 2	Journal-first Papers / Software Engineering in Practice / New Ideas and Emerging Results / Research Track / Software Engineering in Society at Luis de Freitas Branco Chair(s): Jane Cleland-Huang University of Notre Dame

	16:00 15m Talk		Large Language Models for Test-Free Fault Localization Research Track Aidan Z.H. Yang Carnegie Mellon University, Claire Le Goues Carnegie Mellon University, Ruben Martins Carnegie Mellon University, Vincent J. Hellendoorn Carnegie Mellon University
	16:15 15m Talk		Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection Research Track Benjamin Steenhoek Iowa State University, Hongyang Gao Dept. of Computer Science, Iowa State University, Wei Le Iowa State University Pre-print
	16:30 15m Talk		An Empirical Study on Compliance with Ranking Transparency in the Software Documentation of EU Online Platforms Software Engineering in Society Francesco Sovrano University of Zurich, Michaël Lognoul University of Namur (CRIDS, NADI), Alberto Bacchelli University of Zurich
	16:45 15m Talk		An Industry Case Study on Adoption of AI-based Programming Assistants Software Engineering in Practice Nicole Davila Universidade Federal do Rio Grande do Sul, Igor Wiese Federal University of Technology, Igor Steinmacher Northern Arizona University, Lucas Lucio Federal University of Technology - Paraná (UTFPR), André Kawamoto Federal University of Technology - Paraná (UTFPR), Gilson José Peres Favaro, Ingrid Nunes Universidade Federal do Rio Grande do Sul (UFRGS), Brazil
	17:00 7m Talk		Assessing LLMs for High Stakes Applications Software Engineering in Practice Shannon K. Gallagher Software Engineering Institute, Carnegie Mellon University, Jasmine Ratchford Software Engineering Institute, Carnegie Mellon University, Tyler Brooks Software Engineering Institute, Carnegie Mellon University, Bryan P. Brown Software Engineering Institute, Carnegie Mellon University, Eric Heim Software Engineering Institute, Carnegie Mellon University, William R. Nichols Software Engineering Institute, Carnegie Mellon University, Scott McMillan Software Engineering Institute, Carnegie Mellon University, Swati Rallapalli Software Engineering Institute, Carnegie Mellon University, Carol J. Smith Software Engineering Institute, Carnegie Mellon University, Nathan VanHoudnos Software Engineering Institute, Carnegie Mellon University, Nick Winski Software Engineering Institute, Carnegie Mellon University, Andrew O. Mellinger Software Engineering Institute, Carnegie Mellon University
	17:07 7m Talk		ITG: Trace Generation via Iterative Interaction between LLM Query and Trace Checking New Ideas and Emerging Results Weilin Luo SUN YAT-SEN UNIVERSITY, Weiyuan Fang SUN YAT-SEN UNIVERSITY, Junming Qiu SUN YAT-SEN UNIVERSITY, Hai Wan School of Data and Computer Science, Sun Yat-sen University, Yanan Liu SUN YAT-SEN UNIVERSITY, Rongzhen Ye Sun Yat-Sen University
	17:14 7m Talk		Improving Cross-Language Code Clone Detection via Code Representation Learning and Graph Neural Networks Journal-first Papers NIKITA MEHROTRA Indraprastha Institute of Information Technology, Akash Sharma IIIT-Delhi, Anmol Jindal IIIT-Delhi, Rahul Purandare UNL, USA