Path-Sensitive Code Embedding via Contrastive Learning for Software Vulnerability Detection
Thu 21 Jul 2022 08:40 - 09:00 at ISSTA 2 - Session 2-10: Static Analysis and Specifications Testing B Chair(s): Behnaz Hassanshahi
Machine learning and its promising branch, deep learning, have shown success in a wide range of application domains. Recently, much effort has been devoted to applying deep learning techniques (e.g., graph neural networks) to static vulnerability detection as an alternative to conventional bug detection methods. To obtain structural information about code, current learning approaches typically abstract a program in the form of graphs (e.g., data-flow graphs, abstract syntax trees) and then train an underlying classification model on the (sub)graphs of safe and vulnerable code fragments for vulnerability prediction. However, these models are still insufficient for precise bug detection, because their objective is to produce classification results rather than to comprehend the semantics of vulnerabilities, e.g., to pinpoint bug-triggering paths, which are essential for static bug detection.
This paper presents ContraFlow, a selective yet precise contrastive value-flow embedding approach for statically detecting vulnerabilities. The novelty of ContraFlow lies in selecting and preserving feasible value-flow (aka program dependence) paths through a pretrained path embedding model using self-supervised contrastive learning, thus significantly reducing the amount of labeled data required for training expensive downstream models for path-based vulnerability detection. We have evaluated ContraFlow on 288 real-world projects, comparing it with eight recent learning-based approaches. ContraFlow outperforms these eight baselines by up to 334.1%, 317.9%, and 58.3% in informedness, markedness, and F1 score, respectively, and achieves up to 450.0%, 192.3%, and 450.0% improvement in mean statement recall, mean statement precision, and mean IoU, respectively, in locating buggy statements.
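For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of their standard textbook definitions. It is not taken from the paper: the function names, the input shapes, and the choice to return 0.0 for an empty denominator are assumptions made for illustration, and the paper may aggregate its "mean" statement-level scores differently.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def detection_metrics(tp, fp, tn, fn):
    """Detection-level metrics from binary confusion-matrix counts."""
    recall = _ratio(tp, tp + fn)        # true positive rate
    specificity = _ratio(tn, tn + fp)   # true negative rate
    precision = _ratio(tp, tp + fp)     # positive predictive value
    npv = _ratio(tn, tn + fn)           # negative predictive value
    return {
        "informedness": recall + specificity - 1,
        "markedness": precision + npv - 1,
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


def statement_metrics(predicted, actual):
    """Statement-level localization metrics for one sample.

    Both arguments are collections of statement (line) identifiers:
    the statements reported as buggy and the ground-truth buggy statements.
    """
    predicted, actual = set(predicted), set(actual)
    overlap = len(predicted & actual)
    return {
        "recall": _ratio(overlap, len(actual)),
        "precision": _ratio(overlap, len(predicted)),
        "iou": _ratio(overlap, len(predicted | actual)),  # intersection over union
    }


def mean_statement_metrics(samples):
    """Average the statement-level metrics over (predicted, actual) pairs."""
    scores = [statement_metrics(p, a) for p, a in samples]
    return {key: _ratio(sum(s[key] for s in scores), len(scores))
            for key in ("recall", "precision", "iou")}
```

Unlike F1, informedness and markedness also reward correct negatives, so they are less affected by class imbalance between safe and vulnerable samples. Both range from -1 to 1, with 0 corresponding to chance-level prediction.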
Wed 20 Jul (displayed time zone: Seoul)
16:20 - 17:40 | Session 3-1: Static Analysis and Specifications Testing C | Technical Papers at ISSTA 1 | Chair(s): Ding Li (Peking University)
16:20 | 20m Talk | A Large-scale Study of Usability Criteria addressed by Static Analysis Tools | Technical Papers | Marcus Nachtigall (Heinz Nixdorf Institute, Paderborn University), Michael Schlichtig (Heinz Nixdorf Institute, Paderborn University), Eric Bodden (University of Paderborn; Fraunhofer IEM)
16:40 | 20m Talk | An Empirical Study on the Effectiveness of Static C/C++ Analyzers for Vulnerability Detection | Technical Papers | Stephan Lipp (Technical University of Munich), Sebastian Banescu (Technical University of Munich), Alexander Pretschner (TU Munich)
17:00 | 20m Talk | Combining Static Analysis Error Traces with Dynamic Symbolic Execution (Experience Paper) | Technical Papers | Frank Busse (Imperial College London), Pritam Gharat (Imperial College London), Cristian Cadar (Imperial College London, UK), Alastair F. Donaldson (Imperial College London)
17:20 | 20m Talk | Path-Sensitive Code Embedding via Contrastive Learning for Software Vulnerability Detection | Technical Papers | Xiao Cheng (University of Technology Sydney), Guanqin Zhang (University of Technology Sydney), Haoyu Wang (Huazhong University of Science and Technology, China), Yulei Sui (University of New South Wales)
Thu 21 Jul (displayed time zone: Seoul)
08:40 - 09:40 | Session 2-10: Static Analysis and Specifications Testing B | Technical Papers at ISSTA 2 | Chair(s): Behnaz Hassanshahi (Oracle Labs, Australia)
08:40 | 20m Talk | Path-Sensitive Code Embedding via Contrastive Learning for Software Vulnerability Detection | Technical Papers | Xiao Cheng (University of Technology Sydney), Guanqin Zhang (University of Technology Sydney), Haoyu Wang (Huazhong University of Science and Technology, China), Yulei Sui (University of New South Wales)
09:00 | 20m Talk | Testing Dafny (Experience Paper) | Technical Papers | Ahmed Irfan (Amazon Web Services), Sorawee Porncharoenwase (University of Washington), Zvonimir Rakamaric (Amazon Web Services), Neha Rungta (Amazon Web Services), Emina Torlak (Amazon Web Services)
09:20 | 20m Talk | The Raise of Machine Learning Hyperparameter Constraints in Python Code | ACM SIGSOFT Distinguished Paper | Technical Papers | Ingkarat Rak-amnouykit (Rensselaer Polytechnic Institute), Ana Milanova (Rensselaer Polytechnic Institute), Guillaume Baudart (Inria; ENS; PSL University), Martin Hirzel (IBM Research), Julian Dolby (IBM Research, USA)