MSR 2022
Mon 23 - Tue 24 May 2022
co-located with ICSE 2022
Thu 19 May 2022 05:11 - 05:18 at MSR Main room - odd hours - Session 10: Security Chair(s): Triet Le Huynh Minh

Data-driven software engineering processes, such as vulnerability prediction, rely heavily on the quality of the data used. In this paper, we observe that noise-free security defect datasets are infeasible to obtain in practice. Unlike modules in the vulnerable class, non-vulnerable modules are difficult to verify as truly exploit-free given the limited manual effort available. This uncertainty introduces labeling noise into the datasets and affects conclusion validity. To address this issue, we propose noisy label learning: novel learning methods that are robust to label impurities and make the most of limited labeled data. We investigate various noisy label learning methods applied to software vulnerability prediction. Specifically, we propose a two-stage learning method based on noise cleaning to identify and remediate the noisy samples, which improves the AUC and recall of baselines by up to 8.9% and 23.4%, respectively. Moreover, we discuss several hurdles to achieving a performance upper bound with semi-omniscient knowledge of the label noise. Overall, the experimental results show that learning from noisy labels can be effective for data-driven software and security analytics.
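
As a rough illustration only, and not the authors' implementation, the general shape of a two-stage noise-cleaning approach (first identify likely mislabeled samples, then remediate them and train on the cleaned labels) might be sketched as below. Everything specific here is an assumption made for the example: the k-nearest-neighbour disagreement rule, the relabeling strategy, the nearest-centroid classifier, the toy feature vectors, and all function names. Following the abstract's framing, the sketch treats vulnerable labels as trusted and only questions non-vulnerable ones.

```python
import math

def flag_suspect_negatives(X, y, k=3):
    """Stage 1 (identify): flag samples labeled non-vulnerable (0) whose
    k nearest neighbours are mostly labeled vulnerable (1).
    Assumption: positive labels are trusted, so only 0-labels are checked."""
    flagged = []
    for i, xi in enumerate(X):
        if y[i] != 0:
            continue
        others = [j for j in range(len(X)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(xi, X[j]))[:k]
        if sum(y[j] for j in nearest) > k / 2:
            flagged.append(i)
    return flagged

def remediate(y, flagged):
    """Stage 1 (remediate): relabel flagged samples as vulnerable.
    Dropping them instead would be an equally plausible choice."""
    y_clean = list(y)
    for i in flagged:
        y_clean[i] = 1
    return y_clean

def fit_centroids(X, y):
    """Stage 2 (learn): fit a nearest-centroid classifier on cleaned labels."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))

# Hypothetical 2-D feature vectors; the last module sits in the vulnerable
# cluster but carries a (noisy) non-vulnerable label.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
y_noisy = [0, 0, 0, 0, 1, 1, 1, 0]

flagged = flag_suspect_negatives(X, y_noisy, k=3)
y_clean = remediate(y_noisy, flagged)
model = fit_centroids(X, y_clean)
print(flagged, y_clean, predict(model, (6, 6)))
```

In this toy setting the only flagged sample is the mislabeled one, so the second stage trains on labels that match the two clusters. Real vulnerability datasets are far less cleanly separable, which is one reason a cleaning rule like this can also remove or flip correct labels.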

Thu 19 May

Displayed time zone: Eastern Time (US & Canada)

05:00 - 05:50
05:00
4m
Short-paper
WeakSATD: detecting weak self-admitted technical debt
Technical Papers
Barbara Russo Free University of Bolzano, Matteo Camilli Free University of Bozen-Bolzano, Mock Moritz Free University of Bolzano
DOI Pre-print Media Attached
05:04
7m
Talk
LibDB: An Effective and Efficient Framework for Detecting Third-Party Libraries in Binaries
Technical Papers
Wei Tang Tsinghua University, Yanlin Wang Microsoft Research, Hongyu Zhang University of Newcastle, Shi Han Microsoft Research, Ping Luo Tsinghua University, Dongmei Zhang Microsoft Research
Pre-print
05:11
7m
Talk
Noisy Label Learning for Security Defects
Technical Papers
Roland Croft The University of Adelaide, Muhammad Ali Babar University of Adelaide, Huaming Chen The University of Adelaide
05:18
4m
Talk
Vul4J: A Dataset of Reproducible Java Vulnerabilities Geared Towards the Study of Program Repair Techniques (Data and Tool Showcase Award)
Data and Tool Showcase Track
Quang-Cuong Bui Hamburg University of Technology, Riccardo Scandariato Hamburg University of Technology, Nicolás E. Díaz Ferreyra Hamburg University of Technology
Pre-print Media Attached
05:22
4m
Talk
AndroOBFS: Time-tagged Obfuscated Android Malware Dataset with Family Information
Data and Tool Showcase Track
Saurabh Kumar Indian Institute of Technology Kanpur, Debadatta Mishra, Biswabandan Panda Indian Institute of Technology Bombay, Sandeep K. Shukla Indian Institute of Technology Kanpur
DOI Pre-print Media Attached
05:26
4m
Talk
TriggerZoo: A Dataset of Android Applications Automatically Infected with Logic Bombs
Data and Tool Showcase Track
Jordan Samhi University of Luxembourg, Tegawendé F. Bissyandé SnT, University of Luxembourg, Jacques Klein University of Luxembourg
DOI Pre-print Media Attached
05:30
4m
Talk
CamBench - Cryptographic API Misuse Detection Tool Benchmark Suite
Registered Reports
Michael Schlichtig Heinz Nixdorf Institute at Paderborn University, Anna-Katharina Wickert TU Darmstadt, Germany, Stefan Krüger Independent Researcher, Eric Bodden University of Paderborn; Fraunhofer IEM, Mira Mezini TU Darmstadt
Pre-print
05:34
16m
Live Q&A
Discussions and Q&A
Technical Papers


Information for Participants
Thu 19 May 2022 05:00 - 05:50 at MSR Main room - odd hours - Session 10: Security Chair(s): Triet Le Huynh Minh