ESEIW 2024
Sun 20 - Fri 25 October 2024 Barcelona, Spain
Thu 24 Oct 2024 11:00 - 11:20 at Sala de graus (C4 Building), Software testing session. Chair(s): Marco Torchiano

Background: Software Vulnerability (SV) prediction needs large, high-quality data to perform well. Current SV datasets mostly rely on expensive labeling by experts (human-labeled) and are thus limited in size. Meanwhile, there are growing efforts in automatic SV labeling at scale. However, the fitness of auto-labeled data for SV prediction is still largely unknown.
Aims: We quantitatively and qualitatively study the quality and use of the state-of-the-art auto-labeled SV data, D2A, for SV prediction.
Method: Using multiple sources and manual validation, we curate clean SV data from human-labeled SV-fixing commits in two well-known projects and use it to investigate their auto-labeled counterparts.
Results: We find that more than 50% of the auto-labeled SVs are noisy (incorrectly labeled) and that they hardly overlap with the publicly reported ones. Yet, SV prediction models that use the noisy auto-labeled SVs can perform up to 22% better in Matthews Correlation Coefficient and up to 90% better in Recall than the original models. We also reveal the promise and difficulties of applying noise-reduction methods to automatically address the noise in auto-labeled SV data and so maximize data utilization for SV prediction.
Conclusions: Our study sheds light on the benefits and challenges of using auto-labeled SVs, paving the way for large-scale SV prediction.
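For readers less familiar with the two metrics cited in the results, the sketch below computes them from the counts of a binary confusion matrix, treating "vulnerable" as the positive class. It is a minimal Python illustration, not code from the study; the function names and the counts in the usage line are hypothetical. Recall is TP / (TP + FN), and MCC is (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

```python
import math


def recall(tp: int, fn: int) -> float:
    """Share of truly vulnerable samples the model flags: TP / (TP + FN)."""
    actual_positives = tp + fn
    # Convention (assumption): report 0.0 when there are no positive samples.
    return tp / actual_positives if actual_positives else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient, from -1 (total disagreement) to 1 (perfect)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumption): MCC is undefined when any marginal sum is zero; return 0.0.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical confusion-matrix counts, not results from the study.
    print(recall(tp=10, fn=20), mcc(tp=10, tn=80, fp=5, fn=20))
```

Returning 0.0 for an undefined MCC is a common convention rather than part of the definition; it matters on heavily imbalanced SV data, where a model may predict no positives at all.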

Thu 24 Oct

Displayed time zone: Brussels, Copenhagen, Madrid, Paris

11:00 - 12:30
11:00 (20m) Full-paper
Automatic Data Labeling for Software Vulnerability Prediction Models: How Far Are We?
ESEM Technical Papers
Triet Le (The University of Adelaide), Muhammad Ali Babar (School of Computer Science, The University of Adelaide)
11:20 (20m) Full-paper
Contexts Matter: An Empirical Study on Contextual Influence in Fairness Testing for Deep Learning Systems
ESEM Technical Papers
Chengwen Du (University of Birmingham), Tao Chen (University of Birmingham)
11:40 (20m) Full-paper
Mitigating Data Imbalance for Software Vulnerability Assessment: Does Data Augmentation Help?
ESEM Technical Papers
Triet Le (The University of Adelaide), Muhammad Ali Babar (School of Computer Science, The University of Adelaide)
12:00 (15m) Industry talk
From Literature to Practice: Exploring Fairness Testing Tools for the Software Industry Adoption
ESEM IGC
Thanh Nguyen (University of Calgary), Maria Teresa Baldassarre (Department of Computer Science, University of Bari), Luiz Fernando de Lima, Ronnie de Souza Santos (University of Calgary)
12:15 (15m) Vision and Emerging Results
Do Developers Use Static Application Security Testing (SAST) Tools Straight Out of the Box? A large-scale Empirical Study
ESEM Emerging Results, Vision and Reflection Papers Track
Gareth Bennett (Lancaster University), Tracy Hall (Lancaster University), Steve Counsell (Brunel University London), Emily Winter (Lancaster University), Thomas Shippey (LogicMonitor)