CGO 2023
Sat 25 February - Wed 1 March 2023 Montreal, Canada
Wed 1 Mar 2023 10:52 - 11:18 at Montreal 1-2-3 - Session 7 -- Neural Network Accelerators Chair(s): Lukas Sommer

Neural processing units (NPUs) have become indispensable parts of mobile SoCs, and integrating multiple NPU cores into a single chip has become a promising solution to the ever-increasing demand for computing power in mobile devices. This paper presents techniques to maximize the utilization of NPU cores and reduce the latency of on-device inference. Mobile NPUs typically have a small amount of local memory (or scratch-pad memory, SPM) that provides only enough space for the input/output tensors and weights of one layer operation in a deep neural network (DNN). Even in multicore NPUs, such local memories are distributed across the cores. In such systems, executing network layer operations in parallel is the primary vehicle for achieving performance. By partitioning a DNN layer into multiple sub-layers, we can execute them in parallel on multicore NPUs. Within a core, we can also employ pipelined execution to reduce the execution time of a sub-layer. In this execution model, synchronizing parallel execution and loading/storing intermediate tensors in global memory are the main bottlenecks. To alleviate these problems, we propose novel optimization techniques that carefully consider the partitioning direction, execution order, synchronization, and global memory accesses. Using six popular convolutional neural networks (CNNs), we evaluate our optimization techniques on a flagship mobile SoC with three NPU cores. Compared to the highest-performing partitioning approach, our techniques improve performance by 23%, achieving a speedup of 2.1x over single-core systems.
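
To make the layer-partitioning idea concrete, the following is a minimal Python/NumPy sketch, not the authors' implementation: it splits a convolution's output rows across hypothetical NPU cores, where each sub-layer reads only its own input slice plus the halo rows the kernel needs, so the slices could be computed in parallel and concatenated. All names here (NUM_CORES, conv2d, partitioned_conv2d) are illustrative assumptions.

    # Sketch of height-wise layer partitioning for a multicore NPU.
    # Each "core" computes a horizontal slice of the output; the
    # assert checks that the partitioned result matches the full layer.
    import numpy as np

    NUM_CORES = 3   # e.g. a three-core mobile NPU (illustrative)
    K = 3           # kernel height/width (stride 1, no padding)

    def conv2d(x, w):
        """Naive valid convolution: x is (H, W), w is (K, K)."""
        H, W = x.shape
        out = np.empty((H - K + 1, W - K + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(x[i:i+K, j:j+K] * w)
        return out

    def partitioned_conv2d(x, w, num_cores=NUM_CORES):
        """Split output rows across cores; each sub-layer reads its
        input slice plus a (K-1)-row halo, so sub-layers are independent."""
        out_h = x.shape[0] - K + 1
        bounds = np.linspace(0, out_h, num_cores + 1, dtype=int)
        slices = []
        for lo, hi in zip(bounds[:-1], bounds[1:]):
            # Input rows [lo, hi + K - 1) cover output rows [lo, hi).
            slices.append(conv2d(x[lo:hi + K - 1], w))  # one sub-layer per core
        return np.concatenate(slices, axis=0)

    x = np.random.rand(32, 32)
    w = np.random.rand(K, K)
    assert np.allclose(conv2d(x, w), partitioned_conv2d(x, w))

Partitioning along the height dimension keeps each sub-layer's working set small enough to fit in a core's scratch-pad memory, at the cost of a halo region at each slice boundary; the paper's techniques additionally choose the partitioning direction and execution order to reduce the synchronization and global-memory traffic this model incurs.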

Wed 1 Mar

Displayed time zone: Eastern Time (US & Canada)

10:00 - 12:00
Session 7 -- Neural Network Accelerators (Main Conference) at Montreal 1-2-3
Chair(s): Lukas Sommer Codeplay Software
10:00
26m
Talk
Flexer: Out-of-Order Scheduling for Multi-NPUs
Main Conference
Hyemi Min Seoul National University, Jungyoon Kwon Seoul National University, Bernhard Egger Seoul National University
10:26
26m
Talk
Pin or Fuse? Exploiting Scratchpad Memory to Reduce Off-Chip Data Transfer in DNN Accelerators
Main Conference
Hyuk-Jin Jeong Samsung Research, JiHwan Yeo Samsung Research, Cheongyo Bahk Samsung Research, JongHyun Park Samsung Research
10:52
26m
Talk
Accelerating Deep Neural Networks on Mobile Multicore NPUs
Main Conference
Hanwoong Jung Samsung Advanced Institute of Technology, Hexiang Ji Samsung Research, Alexey Pushchin Samsung Research, Maxim Ostapenko Samsung Advanced Institute of Technology, Wenlong Niu Samsung Research, Ilya Palachev Samsung Research, Yutian Qu Samsung Research, Pavel Fedin Samsung Research, Yuri Gribov Samsung Research, Heewoo Nam Samsung Advanced Institute of Technology, Dongguen Lim Samsung Advanced Institute of Technology, Hyunjun Kim Samsung Advanced Institute of Technology, Joonho Song Samsung Advanced Institute of Technology, Seungwon Lee Samsung Advanced Institute of Technology, Hwansoo Han Sungkyunkwan University
11:18
26m
Talk
PIMFlow: Compiler and Runtime Support for CNN Models on Processing-in-Memory DRAM
Main Conference
Yongwon Shin POSTECH, Juseong Park POSTECH, Sungjun Cho POSTECH, Hyojin Sung POSTECH