ICSE 2023
Sun 14 - Sat 20 May 2023 Melbourne, Australia

This artifact contains functional and reusable versions of VULGEN, together with the datasets and downstream tasks used in its evaluation. Because the tool setup is complex, we have prepared a Docker image on Zenodo [1] that already contains the components needed to run the tools, along with prompts that help users compare their results against those reported in the paper [2]. We also provide a standalone version of the artifact on Zenodo for users who prefer not to use Docker. Because our experiments involve a large amount of data, we recommend that the host machine have at least 200GB of hard disk space, 32GB of memory, and NVIDIA GPUs supported by CUDA 11.1.
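Before downloading the image, it may help to confirm that the host meets the recommendations above. The following is a minimal sketch, not part of the artifact: the helper names `find_problems` and `probe_host` are hypothetical, the memory probe assumes a Linux host, and the GPU check only looks for the `nvidia-smi` binary on the PATH rather than verifying CUDA 11.1 compatibility.

```python
import os
import shutil

# Thresholds taken from the artifact description (decimal gigabytes).
MIN_DISK_GB = 200
MIN_MEMORY_GB = 32


def find_problems(free_disk_gb, total_memory_gb, has_nvidia_smi):
    """Return a list of warnings; an empty list means the host looks adequate."""
    problems = []
    if free_disk_gb < MIN_DISK_GB:
        problems.append(
            f"only {free_disk_gb:.0f} GB of free disk; {MIN_DISK_GB} GB recommended"
        )
    if total_memory_gb < MIN_MEMORY_GB:
        problems.append(
            f"only {total_memory_gb:.0f} GB of memory; {MIN_MEMORY_GB} GB recommended"
        )
    if not has_nvidia_smi:
        problems.append("nvidia-smi not found on PATH; an NVIDIA driver may be missing")
    return problems


def probe_host(path="."):
    """Measure the current host (Linux only, because of os.sysconf)."""
    gb = 10 ** 9
    free_disk_gb = shutil.disk_usage(path).free / gb
    total_memory_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / gb
    has_nvidia_smi = shutil.which("nvidia-smi") is not None
    return free_disk_gb, total_memory_gb, has_nvidia_smi


if __name__ == "__main__":
    for problem in find_problems(*probe_host()):
        print("WARNING:", problem)
```

Running the script prints one warning per unmet recommendation and nothing if all three checks pass. The measurements are kept separate from the threshold logic so that the thresholds can be adjusted or tested without access to the target machine.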

The experiments presented in this artifact are divided into three parts. The first part evaluates VULGEN and the baseline techniques on a real-world dataset for vulnerability generation; the results correspond to Table 1 in our research paper. The second part evaluates whether the realistic vulnerabilities generated by VULGEN improve deep learning-based vulnerability detectors; the results correspond to Table 2 in our research paper. Here, two state-of-the-art vulnerability detectors, Devign and ReVeal, are used to test whether augmenting the vulnerability datasets with the generated vulnerabilities yields better-trained detectors. The third part consists of a demo dataset and the accompanying processing code, which demonstrate how VULGEN can be reused. Users can follow the scripts we provide to train and test VULGEN on a different dataset.

Because VULGEN combines probabilistic (deep learning-based) and deterministic (pattern-based) models, running the experiments takes a considerable amount of time (e.g., several days) and hardware resources (e.g., >=32GB of CPU memory and >=24GB of GPU memory). To make the experiments easier to reproduce, we have saved the trained models, both deep learning-based and pattern-based, for the vulnerability generation experiments of VULGEN. Our study also involves manual reviews and user studies; the raw data and results are included in the artifact. To allow researchers to run VULGEN as a reusable tool, we also provide a demo dataset that shows how to reuse it.

In this artifact evaluation, we apply for the “Available”, “Functional”, and “Reusable” badges. Our code and data are publicly available on Zenodo, and the artifact is packaged as a Docker image so that users can reproduce our experiments easily. We also provide a demo dataset and the corresponding scripts for reusing VULGEN on other datasets.

Reviewers only need to be familiar with common Linux commands and Docker containers to use the artifact.

[1] https://zenodo.org/record/7569854
[2] https://drive.google.com/file/d/1nDqZ4d_LWzDACSZ_NL6uR2bJ1F9w3VrU/view?usp=sharing