EOP: Efficient Operator Partition for Deep Learning Inference Over Edge Servers
Recently, Deep Learning (DL) models have achieved great success in artificial intelligence Internet of Things applications owing to their high accuracy. A common deployment solution is to run such DL inference tasks on edge servers. In a DL inference, each operator takes tensors as input and runs in a tensor virtual machine, which isolates resource usage among operators. Nevertheless, existing edge-based DL inference approaches cannot efficiently use the heterogeneous resources (e.g., CPU and low-end GPU) on edge servers, which results in sub-optimal DL inference performance, since they can only partition the operators in a DL inference with equal or fixed ratios. Supporting partition optimizations over edge servers for a wide range of DL models, such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformers, remains a major challenge. In this paper, we present EOP, an Efficient Operator Partition approach that optimizes DL inference over edge servers to address this challenge. First, we carry out a large-scale performance evaluation of operators running on heterogeneous resources and find that many operators do not exhibit similar performance variation as their input tensors change. Next, we use three categorized patterns to estimate the performance of operators, and then efficiently partition key operators and tune their partition ratios. Finally, we implement EOP on TVM, and experiments on a typical edge server show that EOP improves inference performance by up to $1.25-1.97\times$ for various DL models compared to state-of-the-art approaches.
Tue 1 Mar (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
10:15 - 11:35 | Session-1: System Virtualization | Research Papers at Online | Chair(s): Antonio Barbalace (The University of Edinburgh)
10:15 | 20m | Talk | Portkey: Hypervisor-assisted container migration in nested cloud environments | Research Papers | Chandra Prakash (Indian Institute of Technology Bombay), Debadatta Mishra, Purushottam Kulkarni (Indian Institute of Technology, Bombay), Umesh Bellur (IIT Bombay)
10:35 | 20m | Talk | Container-aware I/O Stack: Bridging the Gap between Container Storage Drivers and Solid State Devices | Research Papers | Song Wu (Huazhong University of Science and Technology, China), Zhuo Huang (Huazhong University of Science and Technology), Pengfei Chen (Huazhong University of Science and Technology), Hao Fan (Huazhong University of Science and Technology), Shadi Ibrahim (Inria), Hai Jin (Huazhong University of Science and Technology)
10:55 | 20m | Talk | ClusterRR: A Record and Replay Framework for Virtual Machine Cluster | Research Papers
11:15 | 20m | Talk | EOP: Efficient Operator Partition for Deep Learning Inference Over Edge Servers | Research Papers | Yuanjia XU (University of Chinese Academy of Sciences; Institute of Software, Chinese Academy of Sciences), Heng WU (Institute of Software, Chinese Academy of Sciences), Wenbo ZHANG (Institute of Software, Chinese Academy of Sciences; State Key Laboratory of Computer Sciences, Institute of Software, Chinese Academy of Sciences), Yi HU (University of Chinese Academy of Sciences; Institute of Software, Chinese Academy of Sciences)
The Zoom room for Session 1 is at https://rochester.zoom.us/j/98375917164?pwd=ZHRvcy85elRVUWtDaGRZQkl6dENTQT09.