ICSE 2021
Mon 17 May - Sat 5 June 2021

Static analysis tools are widely used for vulnerability detection as they understand programs with complex behavior and millions of lines of code. Despite their popularity, static analysis tools are known to generate an excess of false positives. The recent ability of Machine Learning models to understand programming languages opens new possibilities when applied to static analysis. However, existing datasets to train models for vulnerability identification suffer from multiple limitations such as limited bug context, limited size, and synthetic and unrealistic source code. We propose D2A, a differential-analysis-based approach to label issues reported by static analysis tools. The D2A dataset is built by analyzing version pairs from multiple open source projects. From each project, we select bug-fixing commits and run static analysis on the versions before and after such commits. If some issues detected in a before-commit version disappear in the corresponding after-commit version, they are very likely to be real bugs that were fixed by the commit. We use D2A to generate a large labeled dataset to train models for vulnerability identification. We show that the dataset can be used to build a classifier to identify possible false alarms among the issues reported by static analysis, hence helping developers prioritize and investigate potential true positives first.
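
The labeling rule described in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the `Issue` type, the choice of (bug type, file, procedure) as the matching key, the `label_issues` function, and the label strings are all hypothetical. The abstract says only that issues which disappear after a bug-fixing commit are very likely real bugs; treating the remaining issues as "unconfirmed" is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """A static-analysis finding; fields are a hypothetical matching key."""
    bug_type: str
    file: str
    procedure: str


def label_issues(before, after):
    """Label each before-commit issue by whether it survives the commit.

    Issues missing from the after-commit report are marked "likely_fixed";
    all others are marked "unconfirmed" (an assumption of this sketch).
    """
    after_set = set(after)
    return {
        issue: "likely_fixed" if issue not in after_set else "unconfirmed"
        for issue in before
    }


# Made-up analyzer reports for one bug-fixing commit.
before = [
    Issue("BUFFER_OVERRUN", "a.c", "parse"),
    Issue("NULL_DEREFERENCE", "b.c", "init"),
]
after = [Issue("NULL_DEREFERENCE", "b.c", "init")]

labels = label_issues(before, after)
```

Matching on exact equality is the simplest possible choice. A real pipeline would need a key that tolerates line shifts and renames between the two versions, which this sketch does not attempt.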

Wed 26 May

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

14:30 - 15:30
2.3.1. Defect Prediction: Automation #1
Technical Track / SEIP - Software Engineering in Practice at Blended Sessions Room 1 +12h
Chair(s): Carolyn Seaman University of Maryland Baltimore County
14:30
20m
Paper
Automatic Web Testing using Curiosity-Driven Reinforcement Learning
Technical Track
YAN ZHENG Nanyang Technological University, Yi Liu Southern University of Science and Technology, Xiaofei Xie Nanyang Technological University, Yepang Liu Southern University of Science and Technology, China, Lei Ma University of Alberta, Jianye Hao Tianjin University, Yang Liu Nanyang Technological University
14:50
20m
Paper
Evaluating SZZ Implementations Through a Developer-informed Oracle
Technical Track
Giovanni Rosa University of Molise, Luca Pascarella Delft University of Technology, Simone Scalabrino University of Molise, Rosalia Tufano Università della Svizzera Italiana, Gabriele Bavota Software Institute, USI Università della Svizzera italiana, Michele Lanza Software Institute, USI Università della Svizzera italiana, Rocco Oliveto University of Molise
15:10
20m
Paper
D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using Differential Analysis
SEIP - Software Engineering in Practice
Yunhui Zheng IBM Research, Saurabh Pujar IBM Research, Burn Lewis IBM Research, Luca Buratti IBM Research, Edward Epstein IBM Research, Bo Yang IBM Research, Jim A. Laredo IBM Research, USA, Alessandro Morari IBM Research, Zhong Su IBM Research

Thu 27 May

02:30 - 03:30
02:30
20m
Paper
Automatic Web Testing using Curiosity-Driven Reinforcement Learning
Technical Track
YAN ZHENG Nanyang Technological University, Yi Liu Southern University of Science and Technology, Xiaofei Xie Nanyang Technological University, Yepang Liu Southern University of Science and Technology, China, Lei Ma University of Alberta, Jianye Hao Tianjin University, Yang Liu Nanyang Technological University
02:50
20m
Paper
Evaluating SZZ Implementations Through a Developer-informed Oracle
Technical Track
Giovanni Rosa University of Molise, Luca Pascarella Delft University of Technology, Simone Scalabrino University of Molise, Rosalia Tufano Università della Svizzera Italiana, Gabriele Bavota Software Institute, USI Università della Svizzera italiana, Michele Lanza Software Institute, USI Università della Svizzera italiana, Rocco Oliveto University of Molise
03:10
20m
Paper
D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using Differential Analysis
SEIP - Software Engineering in Practice
Yunhui Zheng IBM Research, Saurabh Pujar IBM Research, Burn Lewis IBM Research, Luca Buratti IBM Research, Edward Epstein IBM Research, Bo Yang IBM Research, Jim A. Laredo IBM Research, USA, Alessandro Morari IBM Research, Zhong Su IBM Research