Data science pipelines to train and evaluate models with machine learning may contain bugs just like any other code. Leakage between training and test data can lead to overestimating the model’s accuracy during offline evaluations, possibly leading to deployment of low-quality models in production. Such leakage can happen easily by mistake or by following poor practices but may be tedious and challenging to detect manually. We develop a static analysis approach to detect common forms of data leakage in data science code. Our evaluation shows that our analysis accurately detects data leakage and that such leakage is pervasive among over 100,000 analyzed public notebooks. We discuss how our static analysis approach can help both practitioners and educators, and how leakage prevention can be designed into the development process.
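One common form of such leakage is fitting a preprocessing step on the full dataset before the train/test split. The sketch below is only an illustration of that pattern, not the static analysis described above; the toy data and the helper names `fit_scaler` and `scale` are hypothetical and use only the Python standard library.

```python
from statistics import mean, pstdev

def fit_scaler(values):
    """Estimate (mean, std) from the given values only (hypothetical helper)."""
    return mean(values), pstdev(values)

def scale(values, params):
    """Standardize values with previously estimated (mean, std)."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]

# Hypothetical toy feature column; the last two rows act as the held-out test set.
data = [1.0, 2.0, 3.0, 4.0, 100.0, 120.0]
train, test = data[:4], data[4:]

# Leaky: the scaler sees the test rows, so test information shapes the training features.
leaky_params = fit_scaler(data)
leaky_train = scale(train, leaky_params)

# Leak-free: the scaler is fit on the training rows only and then reused on the test rows.
clean_params = fit_scaler(train)
clean_train = scale(train, clean_params)
clean_test = scale(test, clean_params)

print("leaky mean:", leaky_params[0])
print("clean mean:", clean_params[0])
```

In the leaky variant the training features depend on values from the test rows, so an offline evaluation on those rows no longer measures performance on truly unseen data.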

Wed 12 Oct

Displayed time zone: Eastern Time (US & Canada)

13:30 - 15:30
Technical Session 16 - Software Vulnerabilities
Research Papers / Journal-first Papers at Gold A
Chair(s): Mohamed Wiem Mkaouer Rochester Institute of Technology
13:30
20m
Research paper
Data Leakage in Notebooks: Static Detection and Better Processes
Research Papers
Chenyang Yang, Rachel A Brower-Sinning Carnegie Mellon Software Engineering Institute, Grace Lewis Carnegie Mellon Software Engineering Institute, Christian Kästner Carnegie Mellon University
13:50
20m
Research paper
GLITCH: Automated Polyglot Security Smell Detection in Infrastructure as Code (Virtual)
Research Papers
Nuno Saavedra INESC-ID and IST, University of Lisbon, João F. Ferreira INESC-ID and IST, University of Lisbon
14:10
20m
Paper
SafeDrop: Detecting Memory Deallocation Bugs of Rust Programs via Static Data-Flow Analysis (Virtual)
Journal-first Papers
Mohan Cui Fudan University, Chengjun Chen Fudan University, Hui Xu Fudan University, Yangfan Zhou Fudan University
14:30
20m
Research paper
Precise (Un)Affected Version Analysis for Web Vulnerabilities (Virtual)
Research Papers
Youkun Shi Fudan University, Yuan Zhang Fudan University, Tianhan Luo Fudan University, Xiangyu Mao Fudan University, Min Yang Fudan University
14:50
20m
Research paper
Leveraging Practitioners' Feedback to Improve a Security Linter (Virtual)
Research Papers
Sofia Reis Instituto Superior Técnico, U. Lisboa & INESC-ID, Rui Abreu Faculty of Engineering, University of Porto, Portugal, Marcelo d'Amorim Federal University of Pernambuco, Daniel Fortunato INESC-ID, University of Porto
15:10
20m
Research paper
Insight: Exploring Cross-Ecosystem Vulnerability Impacts (Virtual)
Research Papers
Meiqiu Xu Northeastern University, China, Ying Wang Northeastern University, China, Shing-Chi Cheung Hong Kong University of Science and Technology, Hai Yu Northeastern University, China, Zhiliang Zhu Northeastern University, China