LeakageDetector: An Open Source Data Leakage Analysis Tool in Machine Learning Pipelines
Code quality is of paramount importance in all
types of software development settings. Our work seeks to enable
Machine Learning (ML) engineers to write better code by helping
them find and fix instances of Data Leakage in their models. Data
Leakage often results from bad practices in writing ML code, where
information from the evaluation or test data influences training.
As a result, the model effectively “memorizes” the data on which
it trains, leading to an overly optimistic estimate of the model’s
performance and an inability to generalize to unseen data.
ML developers must carefully separate their data into training,
evaluation, and test sets to avoid introducing Data Leakage into
their code. Training data should be used to train the model,
evaluation data should be used to repeatedly confirm a model’s
accuracy, and test data should be used only once to determine
the accuracy of a production-ready model. In this paper, we
develop LEAKAGEDETECTOR, a Python plugin for the PyCharm
IDE that identifies instances of Data Leakage in ML code and
provides suggestions on how to remove the leakage.
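
The separation of training, evaluation, and test data described above can be illustrated with a small sketch. It is not taken from LEAKAGEDETECTOR and does not claim to show what the plugin checks; the helper names (`three_way_split`, `fit_scaler`, `apply_scaler`), the 60/20/20 split, and the toy data are all assumptions made for illustration. It uses one common example of leakage, computing normalization statistics on the full dataset instead of on the training split alone.

```python
import random
import statistics


def three_way_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then cut into disjoint train / evaluation / test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


def fit_scaler(values):
    """Compute mean and standard deviation from the given values only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # fall back to 1.0 for constant data
    return mean, std


def apply_scaler(values, mean, std):
    """Standardize values with statistics that were fitted elsewhere."""
    return [(v - mean) / std for v in values]


# Hypothetical toy feature column.
data = [float(i) for i in range(100)]
train, val, test = three_way_split(data)

# Leakage-free: the statistics come from the training split alone and are
# then reused, unchanged, on the evaluation and test splits.
mean, std = fit_scaler(train)
train_scaled = apply_scaler(train, mean, std)
val_scaled = apply_scaler(val, mean, std)
test_scaled = apply_scaler(test, mean, std)

# Leaky variant, shown only for contrast: fitting on all rows lets the
# evaluation and test data influence what the model sees during training.
leaky_mean, leaky_std = fit_scaler(data)
```

In this sketch the test split is touched only by the final `apply_scaler` call, which mirrors the rule that test data should be used once, on a production-ready model.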
Thu 6 Mar (displayed time zone: Eastern Time, US & Canada)
14:00 - 15:30 | Tool Demo and Showcase | Tool Demo Track at M-2401 | Chair(s): Brittany Reid (Nara Institute of Science and Technology)
14:00 (7m) Talk | AIOpsArena: Scenario-Oriented Evaluation and Leaderboard for AIOps Algorithms in Microservices | Tool Demo Track | Yongqian Sun (Nankai University); Jiaju Wang (Nankai University); Zhengdan Li (Nankai University); Xiaohui Nie (Computer Network Information Center at Chinese Academy of Sciences); Minghua Ma (Microsoft Research); Shenglin Zhang (Nankai University); Yuhe Ji (Nankai University); Lu Zhang (Peking University); Wen Long (Nankai University); Yongnan Luo (Nankai University); Hengmao Chen (BizSeer); Dan Pei (Tsinghua University)
14:07 (7m) Talk | AutoGuard: Reporting breaking changes of REST APIs from Java Spring Boot source code | Tool Demo Track | Alexander Lercher (University of Klagenfurt); Clemens Bauer (University of Klagenfurt); Christian Macho (University of Klagenfurt); Martin Pinzger (Universität Klagenfurt)
14:14 (7m) Talk | ContractViz: Extending Eclipse Trace Compass for Smart Contract Transaction Analysis | Tool Demo Track | Xiaolin Liu (KTH Royal Institute of Technology); Adel Belkhiri (École Polytechnique de Montréal); Mónica Jin (KTH Royal Institute of Technology); Yi Li (Nanyang Technological University); Cyrille Artho (KTH Royal Institute of Technology, Sweden)
14:21 (7m) Talk | DATSO: A Difficulty Assessment Tool for Stack Overflow Questions | Tool Demo Track | Aman Swaraj (Dept. of Computer Science & Engineering, Indian Institute of Technology, Roorkee, India); Neha Gujar (Dept. of Computer Science & Engineering, Indian Institute of Technology, Roorkee, India); Manashree Kalode (Dept. of Computer Science & Engineering, Indian Institute of Technology, Roorkee, India); Bhoomi Bonal (Dept. of Computer Science & Engineering, Indian Institute of Technology, Roorkee, India); Krishna Agarwal (Dept. of Computer Science & Engineering, Indian Institute of Technology, Roorkee, India); Sandeep Kumar (Dept. of Computer Science & Engineering, Indian Institute of Technology, Roorkee, India)
14:28 (7m) Talk | DragonRadar: Fuzzing Linux Kernel Deployed in Cloud-Native Environment | Tool Demo Track | Heyuan Shi (Central South University); Weibo Zhang (Central South University); Runzhe Wang (Alibaba Group); Xiaohai Shi (Alibaba Group); Guoyu Yin (Central South University); Shijun Chen (Central South University); Yuhan Chen (Central South University); Qiang Zhang (Hunan University); Jianzhong Liu (Tsinghua University); Yuheng Shen (Tsinghua University)
14:35 (7m) Talk | GHAminer: An Open Source Tool to Extract GitHub Actions Build Metrics | Tool Demo Track | Jasem Khelifi (ETS Montreal, University of Quebec); Yacine Benzina (ETS Montreal, University of Quebec); Moataz Chouchen (Department of Electrical and Computer Engineering, Concordia University, Montreal, Canada); Ali Ouni (ETS Montreal, University of Quebec); Mohammed SAYAGH (Queen's University); Salah Bouktif (United Arab Emirates University)
14:42 (7m) Talk | IFKG: An Intelligent Fault Diagnosis Tool with Knowledge Graph and Generative LLM | Tool Demo Track | Xixuan Yang (School of Software and Microelectronics, Peking University); Tong Jia (Institute for Artificial Intelligence, Peking University, Beijing, China); Ying Li (School of Software and Microelectronics, Peking University, Beijing, China); Gang Huang (School of Software and Microelectronics, Peking University)
14:49 (7m) Talk | LeakageDetector: An Open Source Data Leakage Analysis Tool in Machine Learning Pipelines | Tool Demo Track | Eman Abdullah AlOmar (Stevens Institute of Technology, USA); Catherine DeMario (Stevens Institute of Technology); Roger Shagawat (Stevens Institute of Technology); Brandon Kreiser (Stevens Institute of Technology)
14:56 (7m) Talk | MDRE-LLM: A Tool for Analysing and Applying LLMs in Software Reverse Engineering | Tool Demo Track