Many Models at the Edge: Characterizing and Improving Deep Inference via Model-Level Caching
Deep learning models are rapidly growing in popularity, in large part due to rapid innovations in model usage and accuracy, as well as companies’ enthusiasm for integrating deep learning into existing application logic. This trend will inevitably lead to a deployment scenario, akin to content delivery networks for web objects, where many deep learning models—each with different popularity—run on a shared edge with limited resources. In this paper, we set out to answer the key question of how to effectively manage many deep learning models at the edge. Via an empirical study based on profiling more than twenty deep learning models and extrapolating from an open-source Azure workload trace, we pinpoint a promising avenue of leveraging cheaper CPUs, rather than commonly promoted accelerators, and focusing on managing the bottleneck resource, which we show to be memory.
Based on our empirical insights, we formulate the DL model management problem as a classical caching problem, which we refer to as model-level caching. As an initial step towards realizing model-level caching, we propose a simple cache eviction policy, called BeladyAM, by adapting BeladyMIN to explicitly consider DL model-specific factors when calculating each in-cache object’s utility. We demonstrate that, in a testbed, we can achieve a 50% reduction in memory while keeping load latency below 92% of execution latency, at less than 36% of the cost of a random model-eviction approach. Further, when scaling to more models and requests in simulation, we keep the model load delay up to 16.6% lower than eviction policies that consider only workload characteristics.
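The idea of adapting BeladyMIN with model-specific factors can be illustrated with a minimal sketch. Note that the abstract does not give BeladyAM's actual utility function; the weighting below (dividing each model's reload latency by its next-use distance) is an illustrative assumption, and the function and parameter names are hypothetical, not the authors' implementation.

```python
def evict_candidate(cache, future_requests, load_latency):
    """Pick the cached model with the lowest utility to evict.

    cache           : set of model ids currently resident in memory
    future_requests : list of upcoming model ids (oracle knowledge,
                      as in Belady's MIN)
    load_latency    : dict mapping model id -> seconds to reload it
    """
    def next_use(model):
        # Distance until the model is requested again; models never
        # reused have infinite distance (classic MIN evicts these first).
        try:
            return future_requests.index(model)
        except ValueError:
            return float("inf")

    def utility(model):
        # Classic BeladyMIN evicts the model with the farthest next use.
        # Here the utility of keeping a model additionally grows with
        # its reload cost, penalizing eviction of expensive-to-load models.
        return load_latency[model] / (1 + next_use(model))

    return min(cache, key=utility)
```

With equal load latencies this degenerates to plain BeladyMIN (evict the model used farthest in the future); a model that is slow to reload survives longer even when its next use is distant.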
Session: Wed 29 Sep, 11:45 - 12:50 (Eastern Time, US & Canada)