Productively Deploying Emerging Models on Emerging Platforms: A Top-Down Approach for Testing and Debugging
This program is tentative and subject to change.
While existing machine learning (ML) frameworks still focus on established platforms, such as running CUDA on server-grade GPUs, there is growing demand to enable emerging AI applications in a broader set of scenarios, such as running Large Language Models (LLMs) within browsers and on mobile phones. However, deploying emerging models on new platforms (such as Metal, Vulkan, and WebGPU) presents significant software engineering challenges due to rapid model evolution and the limited tooling and best practices available for these platforms.
Previous practice for ML model deployment typically follows a bottom-up approach: engineers first implement each required operator individually and then assemble them into a full model. However, this traditional workflow fails to meet the productivity requirements of deploying emerging ML systems, with testing and debugging as the bottleneck. To this end, we introduce TapML, a top-down approach and tooling designed to streamline the deployment of ML systems on diverse platforms, optimized for developer productivity. Whereas the traditional bottom-up approach requires developers to craft manual tests, TapML automates operator-wise testing through test carving, which produces realistic test data of better quantity and quality. Furthermore, TapML adopts a migration-based strategy that gradually implements and offloads model computations from the mature source platform to the target platform, narrowing the debugging scope for compounded bugs down to single operators.
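To make the test-carving idea concrete, here is a minimal, hypothetical sketch (not TapML's actual implementation): a wrapper records each operator's real inputs and outputs during a reference run on the mature platform, and the recorded cases are then replayed as standalone unit tests against the target backend. All names here (`carve`, `carved_cases`, `target_ops`) are illustrative, and NumPy stands in for both backends.

```python
# Hypothetical sketch of test carving: record every operator's real
# inputs/outputs during a reference-backend model run, then replay each
# record as a standalone unit test on the new backend.
import numpy as np

carved_cases = []  # list of (op_name, inputs, reference_output)

def carve(op_name, op_fn):
    """Wrap an operator so each real invocation is recorded as a test case."""
    def wrapper(*inputs):
        out = op_fn(*inputs)
        carved_cases.append((op_name, [x.copy() for x in inputs], out.copy()))
        return out
    return wrapper

# Reference (mature-platform) implementations, instrumented for carving.
matmul_ref = carve("matmul", np.matmul)
relu_ref = carve("relu", lambda x: np.maximum(x, 0))

# Drive the reference model once with realistic data; tests fall out for free.
x = np.random.randn(4, 8).astype(np.float32)
w = np.random.randn(8, 16).astype(np.float32)
_ = relu_ref(matmul_ref(x, w))

# Replay the carved cases against the target backend (stubbed with NumPy
# here; in practice this would dispatch to Metal/Vulkan/WebGPU kernels).
target_ops = {"matmul": np.matmul, "relu": lambda x: np.maximum(x, 0)}

for op_name, inputs, expected in carved_cases:
    actual = target_ops[op_name](*inputs)
    np.testing.assert_allclose(actual, expected, rtol=1e-5, atol=1e-5)
    print(f"carved test for {op_name}: OK")
```

Because the carved inputs come from a real model execution rather than hand-written fixtures, they exercise realistic shapes, dtypes, and value distributions without any manual test authoring.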
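The migration-based strategy can be illustrated in the same hypothetical style: all operators start on the mature reference backend, and one operator at a time is switched to the target backend while the end-to-end output is cross-checked against a golden result, so any regression implicates exactly the operator just migrated. Again, `REFERENCE`, `TARGET`, and `run_model` are illustrative stand-ins, not TapML APIs.

```python
# Hypothetical sketch of migration-based offloading: migrate one operator
# at a time to the target backend, cross-checking the end-to-end output
# after each step so a regression implicates a single operator.
import numpy as np

# Two backends keyed by operator name. The target entries are stand-ins;
# in practice they would invoke freshly written Metal/Vulkan/WebGPU kernels.
REFERENCE = {"matmul": np.matmul, "relu": lambda x: np.maximum(x, 0)}
TARGET = {"matmul": np.matmul, "relu": lambda x: np.maximum(x, 0)}

def run_model(x, w, migrated):
    """Run a toy two-operator model, dispatching each op to the target
    backend only if it has been migrated, else to the reference backend."""
    def op(name, *args):
        backend = TARGET if name in migrated else REFERENCE
        return backend[name](*args)
    return op("relu", op("matmul", x, w))

x = np.random.randn(4, 8).astype(np.float32)
w = np.random.randn(8, 16).astype(np.float32)
golden = run_model(x, w, migrated=set())  # everything on the mature backend

migrated = set()
for op_name in ["matmul", "relu"]:  # offload one operator at a time
    migrated.add(op_name)
    out = run_model(x, w, migrated)
    np.testing.assert_allclose(out, golden, rtol=1e-5, atol=1e-5)
    print(f"migrated {op_name}: end-to-end output still matches")
```

The design point is that the model stays runnable end to end throughout the migration, so a compounded failure never has to be bisected across many unfinished operators.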
We have been practicing TapML for more than a year to build our real-world framework for deploying emerging models on emerging platforms. Through real-world deployments of 105 emerging models spanning 27 distinct model architectures across 5 emerging platforms, we demonstrate the effectiveness of TapML in enhancing developer productivity while ensuring the quality of deployed models. Furthermore, we distill comprehensive case studies from our real-world development, offering best practices for developing emerging ML systems.
Wed 25 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
14:00 - 15:15
14:00 | 25m | Talk | Productively Deploying Emerging Models on Emerging Platforms: A Top-Down Approach for Testing and Debugging (Research Papers)
Siyuan Feng (Shanghai Jiao Tong University), Jiawei Liu (University of Illinois at Urbana-Champaign), Ruihang Lai (Carnegie Mellon University), Charlie F. Ruan (Carnegie Mellon University), Yong Yu (Shanghai Jiao Tong University), Lingming Zhang (University of Illinois at Urbana-Champaign), Tianqi Chen

14:25 | 25m | Talk | SWE-GPT: A Process-Centric Language Model for Automated Software Improvement (Research Papers)
Yingwei Ma (Alibaba Group); Rongyu Cao, Yongchang Cao, Yue Zhang, Jue Chen, Yibo Liu, Yuchen Liu, Binhua Li, Fei Huang, and Yongbin Li (Tongyi Lab, Alibaba, China)

14:50 | 25m | Talk | What Happened in This Pipeline? Diffing Build Logs With CiDiff (Research Papers)
Nicolas Hubner (University of Bordeaux, LaBRI, UMR 5800, F-33400, Talence, France), Jean-Rémy Falleri (Univ. Bordeaux, CNRS, Bordeaux INP, LaBRI, UMR 5800, Institut Universitaire de France), Raluca Uricaru (Univ. Bordeaux, Bordeaux INP, CNRS, LaBRI, UMR5800, F-33400 Talence, France), Thomas Degueule (CNRS), Thomas Durieux (TU Delft)