Automated Identification of Libraries from Vulnerability Data: Can We Do Better? (ICPC 2022 - Research)

Who

Stefanus Agus Haryono, Hong Jin Kang, Abhishek Sharma, Asankhaya Sharma, Andrew Santosa, Ang Ming Yi, David Lo

Track

ICPC 2022 Research

Time Zone

Times are shown in the conference time zone, (GMT-04:00) Eastern Time (US & Canada). The GMT offset shown reflects the offset at the time of the conference.

When

Mon 16 May 2022 08:00 - 08:07 at ICPC room - Session 5: Security Chair(s): Na Meng

Abstract

Software engineers depend heavily on software libraries and have to update their dependencies once vulnerabilities are found in them. Software Composition Analysis (SCA) helps developers identify vulnerable libraries used by an application. A key challenge is the identification of libraries related to a given reported vulnerability in the National Vulnerability Database (NVD), which may not explicitly indicate the affected libraries. Recently, researchers have tried to address the problem of identifying the libraries from an NVD report by treating it as an extreme multi-label learning (XML) problem, characterized by its large number of possible labels and severe data sparsity. As input, the NVD report is provided, and as output, a set of relevant libraries is returned.

In this work, we evaluated multiple XML techniques and performed an analysis of different models proposed for XML classification. While previous work only evaluated a traditional XML technique, FastXML, we trained four other traditional XML models (DiSMEC, Parabel, Bonsai, ExtremeText) as well as two deep learning-based models (XML-CNN and LightXML). We compared the models in terms of both their effectiveness and the time cost of training them and using them for predictions. We find that, other than DiSMEC and XML-CNN, recent XML models outperform the FastXML model by 3%–10% in terms of F1-scores on Top-k (k=1,2,3) predictions. Furthermore, we observe significant improvements in both the training and prediction time of these XML models, with the Bonsai and Parabel models achieving 627x and 589x faster training time and 12x faster prediction time than the FastXML baseline. From a deeper analysis, we discuss the implications of our experimental results and highlight limitations that future work needs to address.
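
As a rough illustration of what an F1-score on Top-k predictions can mean in this setting, the sketch below scores a ranked list of predicted libraries against the libraries actually affected by one report. The abstract does not spell out the exact metric definition, so this uses one common convention (precision divided by k, recall divided by the number of true labels, F1 averaged per report), and the function names and library names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def prf_at_k(ranked: List[str], relevant: Set[str], k: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over the top-k predicted labels of one report.

    Assumption: precision is hits / k and recall is hits / |relevant|;
    the paper may define these differently.
    """
    top_k = ranked[:k]
    hits = sum(1 for label in top_k if label in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # With no hits, both precision and recall are zero, so F1 is zero.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def mean_f1_at_k(
    predictions: Dict[str, List[str]],
    ground_truth: Dict[str, Set[str]],
    k: int,
) -> float:
    """Average the per-report F1@k over all reports in ground_truth."""
    if not ground_truth:
        return 0.0
    total = 0.0
    for report_id, relevant in ground_truth.items():
        ranked = predictions.get(report_id, [])
        total += prf_at_k(ranked, relevant, k)[2]
    return total / len(ground_truth)


# Hypothetical example: one report, three ranked library predictions.
ranked_libs = ["lib-a", "lib-b", "lib-c"]
true_libs = {"lib-a", "lib-c"}
for k in (1, 2, 3):
    p, r, f1 = prf_at_k(ranked_libs, true_libs, k)
    print(f"k={k}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Reporting the score at several values of k, as the abstract does for k=1,2,3, shows how the trade-off between precision and recall shifts as more predicted libraries are accepted per report.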

Link to Preprint

https://kanghj.github.io/publications/xml_how_far_icpc_2022.pdf

Stefanus Agus Haryono

Singapore Management University

Hong Jin Kang

Singapore Management University

Abhishek Sharma

Veracode, Inc.

Singapore

Asankhaya Sharma

Veracode, Inc.

Singapore

Andrew Santosa

Veracode, Inc.

Ang Ming Yi

Veracode, Inc.

David Lo

Singapore Management University

Singapore

Session Program

Mon 16 May
Displayed time zone: Eastern Time (US & Canada)

08:00 - 08:30	Session 5: Security (Research / Journal First) at ICPC room. Chair(s): Na Meng (Virginia Tech)

08:00 7m Talk		Automated Identification of Libraries from Vulnerability Data: Can We Do Better? (Research). Stefanus Agus Haryono (Singapore Management University), Hong Jin Kang (Singapore Management University), Abhishek Sharma (Veracode, Inc.), Asankhaya Sharma (Veracode, Inc.), Andrew Santosa (Veracode, Inc.), Ang Ming Yi (Veracode, Inc.), David Lo (Singapore Management University)
08:07 7m Talk		Example-Based Vulnerability Detection and Repair in Java Code (Research). Ying Zhang (Virginia Tech, USA), Ya Xiao (Virginia Tech), Md Mahir Asef Kabir (Department of Computer Science, Virginia Tech), Daphne Yao (Virginia Tech), Na Meng (Virginia Tech)
08:14 7m Talk		Deep security analysis of program code - A systematic literature review (Journal First). Tim Sonnekalb, Thomas S. Heinze (Aarhus University, Denmark), Patrick Mäder (Technische Universität Ilmenau)
08:21 9m Live Q&A		Q&A: Paper Session 5 (Research)

Information for Participants

Mon 16 May 2022 08:00 - 08:30 at ICPC room - Session 5: Security Chair(s): Na Meng
