EOP: Efficient Operator Partition for Deep Learning Inference Over Edge Servers (VEE 2022 - Research Papers)

Who

Yuanjia XU, Heng WU, Wenbo ZHANG, Yi HU

Track

VEE 2022 Research Papers

Time Zone

The program is currently displayed in (GMT+01:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

When

Tue 1 Mar 2022 11:15 - 11:35 at Online - Session-1: System Virtualization Chair(s): Antonio Barbalace

Abstract

Deep Learning (DL) models have recently achieved great success in artificial intelligence Internet of Things applications thanks to their high accuracy. A common deployment solution is to run DL inference tasks on edge servers. In a DL inference, each operator takes tensors as input and runs in a tensor virtual machine, which isolates resource usage among operators. However, existing edge-based DL inference approaches cannot use the heterogeneous resources on edge servers (e.g., CPU and low-end GPU) efficiently, because they can only partition the operators of a DL inference with equal or fixed ratios. The result is sub-optimal inference performance. Supporting partition optimizations over edge servers for a wide range of DL models, such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs) and Transformers, remains a major challenge. To address this challenge, we present EOP, an Efficient Operator Partition approach that optimizes DL inference over edge servers. First, we carry out a large-scale performance evaluation of operators running on heterogeneous resources, and find that many operators do not follow similar performance variation when their input tensors change. Second, we use three categorized patterns to estimate operator performance, and then efficiently partition key operators and tune their partition ratios. Finally, we implement EOP on TVM. Experiments on a typical edge server show that EOP improves inference performance by $1.25\times$ to $1.97\times$ for various DL models compared to state-of-the-art approaches.
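
As a rough illustration of why a tuned partition ratio can beat an equal split, the sketch below searches for the share of an operator's input that should go to the GPU so that the slower of the two devices finishes as early as possible. It is not EOP's algorithm: the function `tune_ratio` and the cost models `cpu_cost` and `gpu_cost` are hypothetical stand-ins for the performance estimates the abstract mentions, and the numbers are made up.

```python
from typing import Callable, Tuple


def tune_ratio(
    size: int,
    cpu_cost: Callable[[int], float],
    gpu_cost: Callable[[int], float],
    steps: int = 20,
) -> Tuple[float, float]:
    """Grid-search the GPU share of `size` work units.

    The two devices are assumed to run their parts concurrently, so the
    latency of a split is the maximum of the two per-device costs.
    Returns (best_ratio, best_latency).
    """
    best_ratio, best_latency = 0.0, float("inf")
    for i in range(steps + 1):
        ratio = i / steps
        gpu_part = round(size * ratio)   # work units sent to the GPU
        cpu_part = size - gpu_part       # the rest stays on the CPU
        latency = max(gpu_cost(gpu_part), cpu_cost(cpu_part))
        if latency < best_latency:
            best_ratio, best_latency = ratio, latency
    return best_ratio, best_latency


# Hypothetical cost models (assumptions, not measurements):
# the CPU scales linearly; the GPU is 3x faster per unit but pays a
# fixed launch overhead whenever it receives any work.
def cpu_cost(n: int) -> float:
    return 1.0 * n


def gpu_cost(n: int) -> float:
    return 0.0 if n == 0 else 5.0 + n / 3


ratio, latency = tune_ratio(120, cpu_cost, gpu_cost)
equal_split = max(gpu_cost(60), cpu_cost(60))
print(f"tuned ratio={ratio}, latency={latency}, equal split latency={equal_split}")
```

Under these assumed cost models the search sends most of the work to the GPU, and the resulting latency is well below that of a 50/50 split. A real system would replace the two cost functions with estimates derived from measurements on the target edge server.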

Yuanjia XU

University of Chinese Academy of Sciences; Institute of Software, Chinese Academy of Sciences

China

Heng WU

Institute of Software, Chinese Academy of Sciences

China

Wenbo ZHANG

Institute of Software, Chinese Academy of Sciences; State Key Laboratory of Computer Sciences, Institute of Software, Chinese Academy of Sciences

Yi HU

University of Chinese Academy of Sciences; Institute of Software, Chinese Academy of Sciences

Session Program

Tue 1 Mar
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

10:15 - 11:35	Session-1: System Virtualization (Research Papers) at Online. Chair(s): Antonio Barbalace (The University of Edinburgh)

10:15 20m Talk		Portkey: Hypervisor-assisted container migration in nested cloud environments. Research Papers. Chandra Prakash (Indian Institute of Technology Bombay), Debadatta Mishra, Purushottam Kulkarni (Indian Institute of Technology, Bombay), Umesh Bellur (IIT Bombay)
10:35 20m Talk		Container-aware I/O Stack: Bridging the Gap between Container Storage Drivers and Solid State Devices. Research Papers. Song Wu (Huazhong University of Science and Technology, China), Zhuo Huang (Huazhong University of Science and Technology), Pengfei Chen (Huazhong University of Science and Technology), Hao Fan (Huazhong University of Science and Technology), Shadi Ibrahim (Inria), Hai Jin (Huazhong University of Science and Technology)
10:55 20m Talk		ClusterRR: A Record and Replay Framework for Virtual Machine Cluster. Research Papers. Wei Wang (Institute of Information Engineering, School of Cyber Security, University of Chinese Academy of Sciences), Zhiyu Hao (Institute of Information Engineering, Chinese Academy of Sciences), Lei Cui (Institute of Information Engineering, Chinese Academy of Sciences)
11:15 20m Talk		EOP: Efficient Operator Partition for Deep Learning Inference Over Edge Servers. Research Papers. Yuanjia XU (University of Chinese Academy of Sciences; Institute of Software, Chinese Academy of Sciences), Heng WU (Institute of Software, Chinese Academy of Sciences), Wenbo ZHANG (Institute of Software, Chinese Academy of Sciences; State Key Laboratory of Computer Sciences, Institute of Software, Chinese Academy of Sciences), Yi HU (University of Chinese Academy of Sciences; Institute of Software, Chinese Academy of Sciences)

Information for Participants

Tue 1 Mar 2022 10:15 - 11:35 at Online - Session-1: System Virtualization Chair(s): Antonio Barbalace

Info for session

The Zoom room for Session 1 is at https://rochester.zoom.us/j/98375917164?pwd=ZHRvcy85elRVUWtDaGRZQkl6dENTQT09.