Noisy Label Learning for Security Defects (MSR 2022 - Technical Papers)

Who

Roland Croft, Muhammad Ali Babar, Huaming Chen

Track

MSR 2022 Technical Papers

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).


When

Thu 19 May 2022 05:11 - 05:18 at MSR Main room - odd hours - Session 10: Security Chair(s): Triet Le

Abstract

Data-driven software engineering processes, such as vulnerability prediction, rely heavily on the quality of the data used. In this paper, we observe that noise-free security defect datasets are infeasible to obtain in practice. Unlike modules in the vulnerable class, non-vulnerable modules are difficult to verify as truly exploit-free given the limited manual effort available. This results in uncertainty, which introduces labeling noise into the datasets and affects conclusion validity. To address this issue, we propose novel learning methods that are robust to label impurities and can make the most of limited labeled data: noisy label learning. We investigate various noisy label learning methods applied to software vulnerability prediction. Specifically, we propose a two-stage learning method based on noise cleaning to identify and remediate the noisy samples, which improves the AUC and recall of baselines by up to 8.9% and 23.4%, respectively. Moreover, we discuss several hurdles to achieving a performance upper bound with semi-omniscient knowledge of the label noise. Overall, the experimental results show that learning from noisy labels can be effective for data-driven software and security analytics.

Roland Croft

The University of Adelaide

Australia

Muhammad Ali Babar

University of Adelaide

Australia

Huaming Chen

The University of Adelaide


Session Program

Thu 19 May
Displayed time zone: Eastern Time (US & Canada)

05:00 - 05:50	Session 10: Security (Technical Papers / Data and Tool Showcase Track / Registered Reports) at MSR Main room - odd hours. Chair(s): Triet Le, The University of Adelaide

05:00 4m Short-paper	WeakSATD: detecting weak self-admitted technical debt (Technical Papers). Barbara Russo (Free University of Bolzano), Matteo Camilli (Free University of Bozen-Bolzano), Moritz Mock (Free University of Bolzano)
05:04 7m Talk	LibDB: An Effective and Efficient Framework for Detecting Third-Party Libraries in Binaries (Technical Papers). Wei Tang (Tsinghua University), Yanlin Wang (Microsoft Research), Hongyu Zhang (University of Newcastle), Shi Han (Microsoft Research), Ping Luo (Tsinghua University), Dongmei Zhang (Microsoft Research)
05:11 7m Talk	Noisy Label Learning for Security Defects (Technical Papers). Roland Croft (The University of Adelaide), Muhammad Ali Babar (University of Adelaide), Huaming Chen (The University of Adelaide)
05:18 4m Talk	Vul4J: A Dataset of Reproducible Java Vulnerabilities Geared Towards the Study of Program Repair Techniques (Data and Tool Showcase Track; Data and Tool Showcase Award). Quang-Cuong Bui (Hamburg University of Technology), Riccardo Scandariato (Hamburg University of Technology), Nicolás E. Díaz Ferreyra (Hamburg University of Technology)
05:22 4m Talk	AndroOBFS: Time-tagged Obfuscated Android Malware Dataset with Family Information (Data and Tool Showcase Track). Saurabh Kumar (Indian Institute of Technology Kanpur), Debadatta Mishra, Biswabandan Panda (Indian Institute of Technology Bombay), Sandeep K. Shukla (Indian Institute of Technology Kanpur)
05:26 4m Talk	TriggerZoo: A Dataset of Android Applications Automatically Infected with Logic Bombs (Data and Tool Showcase Track). Jordan Samhi (University of Luxembourg), Tegawendé F. Bissyandé (SnT, University of Luxembourg), Jacques Klein (University of Luxembourg)
05:30 4m Talk	CamBench - Cryptographic API Misuse Detection Tool Benchmark Suite (Registered Reports). Michael Schlichtig (Heinz Nixdorf Institute at Paderborn University), Anna-Katharina Wickert (TU Darmstadt, Germany), Stefan Krüger (Independent Researcher), Eric Bodden (University of Paderborn; Fraunhofer IEM), Mira Mezini (TU Darmstadt)
05:34 16m Live Q&A	Discussions and Q&A (Technical Papers)
