Automatically Resolving Data Source Dependency Hell in Large Scale Data Science Projects
Dependency hell is a well-known pain point in the development of large software projects, and machine learning (ML) code bases are not immune to it. In fact, ML applications suffer from an additional form of dependency hell, namely data source dependency hell. The term reflects the central role played by data and its unique quirks, which often lead to unexpected failures of ML models that cannot be explained by code changes. In this paper, we present an automated data source dependency mapping framework that allows MLOps engineers to monitor the whole dependency map of their models in a fast-paced engineering environment and thus mitigate the consequences of data source changes ahead of time. Our system is based on a unified and generic approach that employs techniques from static analysis to identify data sources across a wide range of source artefacts. Our framework is currently deployed within Microsoft and used by Microsoft MLOps engineers in production.
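To make the idea of statically identifying data sources concrete, here is a minimal sketch in Python. It is not the framework described in the abstract, whose design is not given here. It assumes a single Python source artefact and a hypothetical allow-list of reader function names (`READER_NAMES`), and it reports only calls whose first argument is a string literal. The function name `find_data_sources` is likewise an illustrative choice.

```python
import ast

# Hypothetical allow-list of call names treated as data source accesses.
READER_NAMES = {"read_csv", "read_parquet", "read_json", "open"}


def find_data_sources(source: str) -> list[tuple[str, str, int]]:
    """Return (reader, literal_path, line) for reader calls with a literal first argument."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call) or not node.args:
            continue
        func = node.func
        if isinstance(func, ast.Attribute):   # e.g. pd.read_csv(...)
            name = func.attr
        elif isinstance(func, ast.Name):      # e.g. open(...)
            name = func.id
        else:
            continue
        first = node.args[0]
        # Only string literals can be resolved without running the code.
        if (name in READER_NAMES
                and isinstance(first, ast.Constant)
                and isinstance(first.value, str)):
            found.append((name, first.value, node.lineno))
    return sorted(found, key=lambda item: item[2])


example = '''
import pandas as pd
train = pd.read_csv("data/train.csv")
labels = pd.read_parquet("warehouse/labels.parquet")
with open("config/features.json") as fh:
    pass
model_input = pd.read_csv(path_variable)
'''

sources = find_data_sources(example)
for reader, path, line in sources:
    print(f"line {line}: {reader} -> {path}")
```

The last call in the example is skipped because its path is held in a variable. Resolving such cases would need data-flow analysis, which this sketch deliberately leaves out.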
Mon 15 May (displayed time zone: Hobart)
17:15 - 18:45 | Data & Model Optimization | Papers / Posters / Industrial Talks at Virtual - Zoom for CAIN | Chair(s): Justus Bogner (University of Stuttgart)
17:15 | 15m | Short-paper | Automatically Resolving Data Source Dependency Hell in Large Scale Data Science Projects | Papers | Pre-print
17:30 | 15m | Short-paper | Dataflow graphs as complete causal graphs | Papers | Andrei Paleyes (Department of Computer Science and Technology, University of Cambridge), Siyuan Guo (Max Planck Institute for Intelligent Systems), Bernhard Schölkopf (MPI Tuebingen), Neil D. Lawrence (Department of Computer Science and Technology, University of Cambridge) | Pre-print
17:45 | 20m | Long-paper | Uncovering Energy-Efficient Practices in Deep Learning Training: Preliminary Steps Towards Green AI | Distinguished Paper Award Candidate | Papers | Tim Yarally (Delft University of Technology), Luís Cruz (Delft University of Technology), Daniel Feitosa (University of Groningen), June Sallou (Delft University of Technology), Arie van Deursen (Delft University of Technology) | Pre-print
18:05 | 15m | Short-paper | Prevalence of Code Smells in Reinforcement Learning Projects | Papers | Nicolás Cardozo (Universidad de los Andes), Ivana Dusparic (Trinity College Dublin, Ireland), Christian Cabrera (Department of Computer Science and Technology, University of Cambridge) | Pre-print, Media Attached
18:20 | 20m | Long-paper | Automotive Perception Software Development: An Empirical Investigation into Data, Annotation, and Ecosystem Challenges | Papers | Hans-Martin Heyn (University of Gothenburg & Chalmers University of Technology), Khan Mohammad Habibullah (University of Gothenburg), Eric Knauss (Chalmers / University of Gothenburg), Jennifer Horkoff (Chalmers and the University of Gothenburg), Markus Borg (CodeScene), Alessia Knauss (Zenseact AB), Polly Jing Li (Kognic AB) | Pre-print