TCSE logo 
 Sigsoft logo
Sustainability badge

This program is tentative and subject to change.

Fri 2 May 2025 14:00 - 14:15 at 210 - Security and Analysis 3 Chair(s): Adriana Sejfia

With the advent of data-centric and machine learning (ML) systems, data quality is playing an increasingly critical role for ensuring the overall quality of software systems. Alas, data preparation, an essential step towards high data quality, is known to be a highly effort-intensive process. Although prior studies have dealt with one of the most impacting issues, data pattern violations, we observe that these studies usually require data-specific configurations (i.e., parameterized) or a certain set of fully curated data as learning examples (i.e., supervised). Both approaches require domain knowledge and depend on users’ deep understanding of their data, and are often effort-intensive. In this paper, we introduce RIOLU: Regex Inferencer autO-parameterized Learning with Uncleaned data. RIOLU is fully automated, is automatically parameterized, and does not need labeled samples. We observe that RIOLU can generate precise patterns from datasets in various domains, with a high F1 score of 97.2%, exceeding the state-of-the-art baseline. In addition, according to our experiment on five datasets with anomalies, RIOLU can automatically estimate a data column’s error rate, draw normal patterns, and predict anomalies from unlabeled data with higher performance (up to 800.4% improvement in terms of F1) than the state-of-the-art baseline. Furthermore, RIOLU can even outperform ChatGPT in terms of both accuracy (12.3% higher F1) and efficiency (10% less inference time). With user involvement, a variation (a guided version) of RIOLU can further boost its precision (up to 37.4% improvement in terms of F1). Our evaluation in an industrial setting further demonstrates the practical benefits of RIOLU.

This program is tentative and subject to change.

Fri 2 May

Displayed time zone: Eastern Time (US & Canada) change

14:00 - 15:30
Security and Analysis 3Research Track / SE In Practice (SEIP) at 210
Chair(s): Adriana Sejfia University of Edinburgh
14:00
15m
Talk
Automated, Unsupervised, and Auto-parameterized Inference of Data Patterns and Anomaly DetectionSecurityArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Research Track
Qiaolin Qin Polytechnique Montréal, Heng Li Polytechnique Montréal, Ettore Merlo Polytechnique Montreal, Maxime Lamothe Polytechnique Montreal
Pre-print
14:15
15m
Talk
On Prescription or Off Prescription? An Empirical Study of Community-prescribed Security Configurations for KubernetesSecurityArtifact-Available
Research Track
Shazibul Islam Shamim Auburn University, Hanyang Hu Company A, Akond Rahman Auburn University
14:30
15m
Talk
Similar but Patched Code Considered Harmful -- The Impact of Similar but Patched Code on Recurring Vulnerability Detection and How to Remove ThemSecurity
Research Track
Zixuan Tan Zhejiang University, Jiayuan Zhou Huawei, Xing Hu Zhejiang University, Shengyi Pan Zhejiang University, Kui Liu Huawei, Xin Xia Huawei
14:45
15m
Talk
TIVER: Identifying Adaptive Versions of C/C++ Third-Party Open-Source Components Using a Code Clustering TechniqueSecurityArtifact-FunctionalArtifact-AvailableArtifact-Reusable
Research Track
Youngjae Choi Korea University, Seunghoon Woo Korea University
15:00
15m
Talk
A scalable, effective and simple Vulnerability Tracking approach for heterogeneous SAST setups based on Scope+OffsetSecurity
SE In Practice (SEIP)
James Johnson --, Julian Thome GitLab Inc., Lucas Charles GitLab Inc., Hua Yan GitLab Inc., Jason Leasure GitLab Inc.
Pre-print
15:15
15m
Talk
''ImmediateShortTerm3MthsAfterThatLOL'': Developer Secure-Coding Sentiment, Practice and Culture in OrganisationsArtifact-AvailableArtifact-FunctionalArtifact-ReusableSecurity
SE In Practice (SEIP)
Ita Ryan University College Cork, Utz Roedig University College Cork, Klaas-Jan Stol Lero; University College Cork; SINTEF Digital
:
:
:
: