Path-Sensitive Code Embedding via Contrastive Learning for Software Vulnerability Detection (ISSTA 2022 - Technical Papers)

Who

Xiao Cheng, Guanqin Zhang, Haoyu Wang, Yulei Sui

Track

ISSTA 2022 Technical Papers

Time Zone

The program is currently displayed in (GMT+09:00) Seoul.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 20 Jul 2022 17:20 - 17:40 at ISSTA 1 - Session 3-1: Static Analysis and Specifications Testing C. Chair(s): Ding Li
Thu 21 Jul 2022 08:40 - 09:00 at ISSTA 2 - Session 2-10: Static Analysis and Specifications Testing B. Chair(s): Behnaz Hassanshahi

Abstract

Machine learning and its promising branch, deep learning, have shown success in a wide range of application domains. Recently, much effort has been devoted to applying deep learning techniques (e.g., graph neural networks) to static vulnerability detection as an alternative to conventional bug detection methods. To obtain structural information about code, current learning approaches typically abstract a program in the form of graphs (e.g., data-flow graphs, abstract syntax trees), and then train an underlying classification model on the (sub)graphs of safe and vulnerable code fragments for vulnerability prediction. However, these models are still insufficient for precise bug detection, because their objective is to produce classification results rather than to comprehend the semantics of vulnerabilities, e.g., by pinpointing bug-triggering paths, which are essential for static bug detection.

This paper presents ContraFlow, a selective yet precise contrastive value-flow embedding approach for statically detecting vulnerabilities. The novelty of ContraFlow lies in selecting and preserving feasible value-flow (aka program dependence) paths through a pretrained path embedding model using self-supervised contrastive learning, thus significantly reducing the amount of labeled data required for training expensive downstream models for path-based vulnerability detection. We have evaluated ContraFlow on 288 real-world projects, comparing it with eight recent learning-based approaches. ContraFlow outperforms these eight baselines by up to 334.1%, 317.9% and 58.3% for informedness, markedness and F1 Score respectively, and achieves up to 450.0%, 192.3% and 450.0% improvement for mean statement recall, mean statement precision and mean IoU respectively in terms of locating buggy statements.
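For readers unfamiliar with the evaluation metrics named above, the following is a minimal Python sketch of how they are commonly computed. It assumes the standard textbook definitions (informedness = recall + specificity - 1, markedness = precision + negative predictive value - 1, IoU = intersection over union of predicted and actual buggy statements); the paper's exact formulation may differ. The function names and the sample counts are hypothetical and are not taken from the paper or its artifact.

```python
def classification_metrics(tp, fp, tn, fn):
    """Informedness, markedness and F1 from confusion-matrix counts.

    Assumes each denominator is non-zero, i.e. both classes occur in the
    ground truth and in the predictions.
    """
    recall = tp / (tp + fn)        # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    return {
        "informedness": recall + specificity - 1,
        "markedness": precision + npv - 1,
        "f1": 2 * precision * recall / (precision + recall),
    }


def statement_metrics(predicted, actual):
    """Statement-level recall, precision and IoU for one sample.

    `predicted` and `actual` are collections of line numbers reported as
    buggy by a detector and marked as buggy in the ground truth.
    """
    predicted, actual = set(predicted), set(actual)
    overlap = len(predicted & actual)
    union = len(predicted | actual)
    return {
        "recall": overlap / len(actual) if actual else 0.0,
        "precision": overlap / len(predicted) if predicted else 0.0,
        "iou": overlap / union if union else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical counts and line numbers, for illustration only.
    print(classification_metrics(tp=40, fp=10, tn=40, fn=10))
    print(statement_metrics(predicted={3, 4, 7}, actual={4, 7, 9, 12}))
```

The "mean" variants reported in the abstract would then be averages of the per-sample statement metrics over the evaluated samples, under the same assumptions.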

DOI

https://doi.org/10.1145/3533767.3534371

Xiao Cheng

University of Technology Sydney

Guanqin Zhang

University of Technology Sydney

Haoyu Wang

Huazhong University of Science and Technology

China

Yulei Sui

University of New South Wales

Australia