Productively Deploying Emerging Models on Emerging Platforms: A Top-Down Approach for Testing and Debugging
This program is tentative and subject to change.
While existing machine learning (ML) frameworks still focus on established platforms, such as running CUDA on server-grade GPUs, there is growing demand to enable emerging AI applications in a broader set of scenarios, such as running Large Language Models (LLMs) within browsers and on mobile phones. However, deploying emerging models on new platforms (such as Metal, Vulkan, and WebGPU) presents significant software engineering challenges due to rapid model evolution and the limited tooling and best practices available for these platforms.
Previous practice for ML model deployment typically follows a bottom-up approach: engineers first implement each required operator individually and then assemble them into a full model. However, this traditional workflow fails to meet the productivity requirements of deploying emerging ML systems, with testing and debugging as the bottleneck. To this end, we introduce TapML, a top-down approach and tooling designed to streamline the deployment of ML systems on diverse platforms, optimized for developer productivity. Whereas the traditional bottom-up approach requires developers to craft manual tests, TapML automates operator-wise testing through test carving, which produces realistic test data of better quantity and quality. Furthermore, TapML adopts a migration-based strategy that gradually implements and offloads model computations from the mature source platform to the target platform, narrowing the debugging scope for compounded bugs down to single operators.
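To make the test-carving idea concrete, here is a minimal, hypothetical sketch (not TapML's actual implementation): a wrapper records each operator's real inputs and outputs during a reference run on the mature platform, and the recorded cases are then replayed as standalone unit tests against the target backend. All names here (`carve`, `carved_cases`, `target_ops`) are illustrative, and NumPy stands in for both backends.

```python
# Hypothetical sketch of test carving: record every operator's real
# inputs/outputs during a reference-backend model run, then replay each
# record as a standalone unit test on the new backend.
import numpy as np

carved_cases = []  # list of (op_name, inputs, reference_output)

def carve(op_name, op_fn):
    """Wrap an operator so each real invocation is recorded as a test case."""
    def wrapper(*inputs):
        out = op_fn(*inputs)
        carved_cases.append((op_name, [x.copy() for x in inputs], out.copy()))
        return out
    return wrapper

# Reference (mature-platform) implementations, instrumented for carving.
matmul_ref = carve("matmul", np.matmul)
relu_ref = carve("relu", lambda x: np.maximum(x, 0))

# Drive the reference model once with realistic data; tests fall out for free.
x = np.random.randn(4, 8).astype(np.float32)
w = np.random.randn(8, 16).astype(np.float32)
_ = relu_ref(matmul_ref(x, w))

# Replay the carved cases against the target backend (stubbed with NumPy
# here; in practice this would dispatch to Metal/Vulkan/WebGPU kernels).
target_ops = {"matmul": np.matmul, "relu": lambda x: np.maximum(x, 0)}

for op_name, inputs, expected in carved_cases:
    actual = target_ops[op_name](*inputs)
    np.testing.assert_allclose(actual, expected, rtol=1e-5, atol=1e-5)
    print(f"carved test for {op_name}: OK")
```

Because the carved inputs come from a real model execution rather than hand-written fixtures, they exercise realistic shapes, dtypes, and value distributions without any manual test authoring.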
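The migration-based strategy can be illustrated in the same hypothetical style: all operators start on the mature reference backend, and one operator at a time is switched to the target backend while the end-to-end output is cross-checked against a golden result, so any regression implicates exactly the operator just migrated. Again, `REFERENCE`, `TARGET`, and `run_model` are illustrative stand-ins, not TapML APIs.

```python
# Hypothetical sketch of migration-based offloading: migrate one operator
# at a time to the target backend, cross-checking the end-to-end output
# after each step so a regression implicates a single operator.
import numpy as np

# Two backends keyed by operator name. The target entries are stand-ins;
# in practice they would invoke freshly written Metal/Vulkan/WebGPU kernels.
REFERENCE = {"matmul": np.matmul, "relu": lambda x: np.maximum(x, 0)}
TARGET = {"matmul": np.matmul, "relu": lambda x: np.maximum(x, 0)}

def run_model(x, w, migrated):
    """Run a toy two-operator model, dispatching each op to the target
    backend only if it has been migrated, else to the reference backend."""
    def op(name, *args):
        backend = TARGET if name in migrated else REFERENCE
        return backend[name](*args)
    return op("relu", op("matmul", x, w))

x = np.random.randn(4, 8).astype(np.float32)
w = np.random.randn(8, 16).astype(np.float32)
golden = run_model(x, w, migrated=set())  # everything on the mature backend

migrated = set()
for op_name in ["matmul", "relu"]:  # offload one operator at a time
    migrated.add(op_name)
    out = run_model(x, w, migrated)
    np.testing.assert_allclose(out, golden, rtol=1e-5, atol=1e-5)
    print(f"migrated {op_name}: end-to-end output still matches")
```

The design point is that the model stays runnable end to end throughout the migration, so a compounded failure never has to be bisected across many unfinished operators.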
We have been practicing TapML for more than a year to build our real-world framework for deploying emerging models on emerging platforms. Through real-world deployments of 105 emerging models spanning 27 distinct model architectures across 5 emerging platforms, we demonstrate the effectiveness of TapML in enhancing developer productivity while ensuring the quality of deployed models. Furthermore, we distill comprehensive case studies from our real-world development, offering best practices for developing emerging ML systems.
Wed 25 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
14:00 - 15:15
14:00 | 25m | Talk | Productively Deploying Emerging Models on Emerging Platforms: A Top-Down Approach for Testing and Debugging (Research Papers)
Siyuan Feng (Shanghai Jiao Tong University), Jiawei Liu (University of Illinois at Urbana-Champaign), Ruihang Lai (Carnegie Mellon University), Charlie F. Ruan (Carnegie Mellon University), Yong Yu (Shanghai Jiao Tong University), Lingming Zhang (University of Illinois at Urbana-Champaign), Tianqi Chen

14:25 | 25m | Talk | SWE-GPT: A Process-Centric Language Model for Automated Software Improvement (Research Papers)
Yingwei Ma (Alibaba Group); Rongyu Cao, Yongchang Cao, Yue Zhang, Jue Chen, Yibo Liu, Yuchen Liu, Binhua Li, Fei Huang, and Yongbin Li (Tongyi Lab, Alibaba, China)

14:50 | 25m | Talk | What Happened in This Pipeline? Diffing Build Logs With CiDiff (Research Papers)
Nicolas Hubner (University of Bordeaux, LaBRI, UMR 5800, F-33400, Talence, France), Jean-Rémy Falleri (Univ. Bordeaux, CNRS, Bordeaux INP, LaBRI, UMR 5800, Institut Universitaire de France), Raluca Uricaru (Univ. Bordeaux, Bordeaux INP, CNRS, LaBRI, UMR5800, F-33400 Talence, France), Thomas Degueule (CNRS), Thomas Durieux (TU Delft)