ACSOS 2021
Mon 27 September - Fri 1 October 2021 Washington, DC, United States

Deep learning models are rapidly expanding in popularity, in large part due to rapid innovations in model usage and accuracy, as well as companies' enthusiasm for integrating deep learning into existing application logic. This trend will inevitably lead to a deployment scenario, akin to the content delivery network for web objects, where many deep learning models—each with different popularity—run on a shared edge with limited resources. In this paper, we set out to answer the key question of how to effectively manage many deep learning models at the edge. Via an empirical study based on profiling more than twenty deep learning models and extrapolating from an open-source Azure workload trace, we pinpoint a promising avenue: leveraging cheaper CPUs, rather than the commonly promoted accelerators, and focusing on managing the bottleneck resource, which we show to be memory.

Based on our empirical insights, we formulate the DL model management problem as a classical caching problem, which we refer to as model-level caching. As an initial step toward realizing model-level caching, we propose a simple cache eviction policy, called BeladyAM, which adapts BeladyMIN to explicitly consider DL model-specific factors when calculating each in-cache object's utility. We demonstrate in a testbed that we can achieve a 50% reduction in memory while keeping load latency below 92% of execution latency, at less than 36% of the cost of a random model-eviction approach. Further, when scaling to more models and requests in simulation, we keep the model load delay up to 16.6% lower than eviction policies that consider only workload characteristics.
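The abstract describes BeladyAM only at a high level: a Belady/MIN-style eviction policy whose utility calculation also accounts for DL model-specific factors such as load cost. The sketch below is an illustrative interpretation under that assumption — the exact utility function, the `Model` profile fields, and the example numbers are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    mem_mb: int     # resident memory footprint when cached
    load_ms: float  # latency to load the model into memory

def evict_belady_am(cache, future_requests, models):
    """Choose which cached model to evict.

    Classic Belady/MIN evicts the object whose next use lies
    farthest in the future. This sketch additionally weighs each
    model's reload latency, so an expensive-to-load model is
    retained longer than next-use distance alone would dictate.
    """
    def utility(name):
        try:
            next_use = future_requests.index(name)
        except ValueError:
            next_use = len(future_requests)  # never requested again
        # Lower utility => cheaper to evict now.
        return models[name].load_ms / (next_use + 1)

    return min(cache, key=utility)

# Hypothetical model profiles for illustration only.
MODELS = {
    "resnet50":  Model("resnet50", 100, 500.0),
    "mobilenet": Model("mobilenet", 20, 50.0),
}
```

For example, with cache `{"resnet50", "mobilenet"}` and a future request stream of only `["resnet50"]`, the policy evicts `mobilenet`: it is never requested again and cheap to reload. Conversely, a long run of `mobilenet` requests eventually makes `resnet50` the eviction victim despite its higher load cost.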

Wed 29 Sep

Displayed time zone: Eastern Time (US & Canada)

11:45 - 12:50
Resource Management in Data Centers and Cloud Computing I
Main Track at AUDITORIUM 1
Chair(s): Vana Kalogeraki Athens University of Economics and Business, Samuel Kounev University of Würzburg, Germany
11:45
25m
Paper
FaaSRank: Learning to Schedule Functions in Serverless Platforms
Main Track
Hanfei Yu University of Washington, Tacoma, Athirai Irissappane University of Washington, Tacoma, Hao Wang Louisiana State University, USA, Wes Lloyd University of Washington, Tacoma
12:10
25m
Paper
Many Models at the Edge: Characterizing and Improving Deep Inference via Model-Level Caching
Main Track
Samuel Ogden Worcester Polytechnic Institute, Guin Gilman Worcester Polytechnic Institute, Robert Walls Worcester Polytechnic Institute, Tian Guo Worcester Polytechnic Institute
12:35
15m
Short-paper
Empirical Characterization of User Reports About Cloud Failures
IEEE ROR-R
Main Track
Sacheendra Talluri Vrije Universiteit Amsterdam, Netherlands, Leon Overweel Dexter Energy, Laurens Versluis Vrije Universiteit Amsterdam, Animesh Trivedi Vrije Universiteit Amsterdam, Alexandru Iosup Vrije Universiteit Amsterdam