Many Models at the Edge: Characterizing and Improving Deep Inference via Model-Level Caching
Deep learning models are rapidly growing in popularity, in large part due to rapid innovations in model usage and accuracy, as well as companies’ enthusiasm for integrating deep learning into existing application logic. This trend will inevitably lead to a deployment scenario, akin to content delivery networks for web objects, where many deep learning models—each with different popularity—run on a shared edge with limited resources. In this paper, we set out to answer the key question of how to effectively manage many deep learning models at the edge. Via an empirical study based on profiling more than twenty deep learning models and extrapolating from an open-source Azure workload trace, we pinpoint a promising avenue of leveraging cheaper CPUs, rather than commonly promoted accelerators, and focusing on managing the bottleneck resource, which we show to be memory.
Based on our empirical insights, we formulate the DL model management problem as a classical caching problem, which we refer to as model-level caching. As an initial step towards realizing model-level caching, we propose a simple cache eviction policy, called BeladyAM, by adapting BeladyMIN to explicitly consider DL model-specific factors when calculating each in-cache object’s utility. We demonstrate that, in a testbed, we can achieve a 50% reduction in memory while keeping load latency below 92% of execution latency, at less than 36% of the cost of a random model-eviction approach. Further, when scaling to more models and requests in simulation, we keep the model load delay up to 16.6% lower than eviction policies that consider only workload characteristics.
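The idea of adapting BeladyMIN with model-specific factors can be illustrated with a minimal sketch. Note that the abstract does not give BeladyAM's actual utility function; the weighting below (dividing each model's reload latency by its next-use distance) is an illustrative assumption, and the function and parameter names are hypothetical, not the authors' implementation.

```python
def evict_candidate(cache, future_requests, load_latency):
    """Pick the cached model with the lowest utility to evict.

    cache           : set of model ids currently resident in memory
    future_requests : list of upcoming model ids (oracle knowledge,
                      as in Belady's MIN)
    load_latency    : dict mapping model id -> seconds to reload it
    """
    def next_use(model):
        # Distance until the model is requested again; models never
        # reused have infinite distance (classic MIN evicts these first).
        try:
            return future_requests.index(model)
        except ValueError:
            return float("inf")

    def utility(model):
        # Classic BeladyMIN evicts the model with the farthest next use.
        # Here the utility of keeping a model additionally grows with
        # its reload cost, penalizing eviction of expensive-to-load models.
        return load_latency[model] / (1 + next_use(model))

    return min(cache, key=utility)
```

With equal load latencies this degenerates to plain BeladyMIN (evict the model used farthest in the future); a model that is slow to reload survives longer even when its next use is distant.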
Session: Wed 29 Sep, 11:45 - 12:50 (Eastern Time, US & Canada)