CGO 2023
Sat 25 February - Wed 1 March 2023 Montreal, Canada
Wed 1 Mar 2023 10:26 - 10:52 at Montreal 1-2-3 - Session 7 -- Neural Network Accelerators Chair(s): Lukas Sommer

Growing interest in on-device AI has led to the proliferation of accelerators dedicated to neural network inference. Most ASIC accelerators are equipped with compiler-controlled scratchpad memory (SPM) used as a last-level cache to reduce the number of accesses to off-chip memory. A widely used strategy for utilizing SPM is fused-layer execution, which divides a DNN model into groups of layers and forwards the intermediate results within each group without eviction to the off-chip memory. However, layer fusion has an inherent limitation: fusing consecutive layers increases the amount of computation, leading to suboptimal performance.
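As a rough illustration of why fusion can increase computation, the sketch below counts the outputs each layer must produce when a fused group is executed tile by tile and overlapping regions are recomputed. Everything here is an assumption made for illustration and is not taken from the paper: a 1-D chain of stride-1 convolutions with "same" padding and an odd kernel size, recomputation (rather than caching) of tile overlaps, and the hypothetical function name `fused_recompute_ratio`.

```python
def fused_recompute_ratio(width, tile, num_layers, kernel):
    """Outputs computed under tiled fusion divided by outputs computed layer by layer.

    Toy 1-D model (assumption): every layer is a stride-1 convolution with
    "same" padding and an odd kernel size, so each feature map has `width`
    elements. Overlapping regions between tiles are recomputed.
    """
    radius = (kernel - 1) // 2          # extra inputs needed on each side per layer
    baseline = num_layers * width       # layer-by-layer: each output computed once
    fused = 0
    for lo in range(0, width, tile):
        hi = min(width, lo + tile)
        # depth 0 is the last layer of the group; deeper layers need a wider slice
        for depth in range(num_layers):
            left = max(0, lo - depth * radius)
            right = min(width, hi + depth * radius)
            fused += right - left
    return fused / baseline


# One tile covering the whole map needs no recomputation.
print(fused_recompute_ratio(64, 64, 3, 3))  # 1.0
# Four tiles of 16 over three fused 3-wide layers recompute the overlaps.
print(fused_recompute_ratio(64, 16, 3, 3))  # 1.09375
```

In this toy model, smaller tiles (which need less SPM) push the ratio higher, which is one way to read the trade-off the abstract describes.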

This paper introduces a new dimension to SPM usage, which temporarily pins a feature map on SPM. Pinning reduces off-chip transfer without increasing computation, but it is not applicable to all feature maps due to limited SPM size. We find that superior performance can be achieved by combining pinning and fusion in MobileNet. Based on this observation, we propose a model-level optimization method that jointly applies pinning and fusion to minimize inference latency under memory constraints. Scheduling and allocation schemes are presented for automatic generation of optimized code. Evaluation on a commercial AI accelerator shows that the proposed method reduces off-chip transfer of feature maps by 50% and improves inference latency by 15% on average without additional hardware, compared to the state-of-the-art fusion approach.
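The capacity limit on pinning can be pictured with a small toy model. The sketch below is not the paper's scheduling or allocation scheme; it is a brute-force search under assumptions chosen for illustration: a linear chain of layers, a pinned feature map saving one off-chip write plus one off-chip read, and a layer's pinned input and pinned output both having to fit in SPM at the same time. The name `best_pinning` and the byte sizes are hypothetical.

```python
from itertools import product


def best_pinning(fmap_bytes, spm_bytes):
    """Pick which intermediate feature maps to pin in SPM (toy model).

    fmap_bytes[i] is the size of the feature map passed from layer i to
    layer i + 1 in a linear chain. Assumption: pinning a map avoids one
    off-chip write and one off-chip read, and a layer's pinned input and
    pinned output must fit in SPM together.
    """
    best_saved = 0
    best_choice = tuple(False for _ in fmap_bytes)
    for choice in product((False, True), repeat=len(fmap_bytes)):
        pinned = [size if keep else 0 for size, keep in zip(fmap_bytes, choice)]
        fits_alone = all(p <= spm_bytes for p in pinned)
        # adjacent maps are the input and output of the same layer
        fits_pairs = all(a + b <= spm_bytes for a, b in zip(pinned, pinned[1:]))
        saved = 2 * sum(pinned)  # one write + one read avoided per pinned map
        if fits_alone and fits_pairs and saved > best_saved:
            best_saved, best_choice = saved, choice
    return best_saved, best_choice


saved, choice = best_pinning([400, 300, 200, 600], spm_bytes=512)
print(saved, choice)
```

The exhaustive search is exponential in the number of feature maps and only serves to make the constraint concrete; feature maps that cannot be pinned are where a fusion-style strategy would have to take over.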

Wed 1 Mar

Displayed time zone: Eastern Time (US & Canada)

10:00 - 12:00
Session 7 -- Neural Network Accelerators - Main Conference at Montreal 1-2-3
Chair(s): Lukas Sommer Codeplay Software
10:00
26m
Talk
Flexer: Out-of-Order Scheduling for Multi-NPUs
Main Conference
Hyemi Min Seoul National University, Jungyoon Kwon Seoul National University, Bernhard Egger Seoul National University
10:26
26m
Talk
Pin or Fuse? Exploiting Scratchpad Memory to Reduce Off-Chip Data Transfer in DNN Accelerators
Main Conference
Hyuk-Jin Jeong Samsung Research, JiHwan Yeo Samsung Research, Cheongyo Bahk Samsung Research, JongHyun Park Samsung Research
10:52
26m
Talk
Accelerating Deep Neural Networks on Mobile Multicore NPUs
Main Conference
Hanwoong Jung Samsung Advanced Institute of Technology, Hexiang Ji Samsung Research, Alexey Pushchin Samsung Research, Maxim Ostapenko Samsung Advanced Institute of Technology, Wenlong Niu Samsung Research, Ilya Palachev Samsung Research, Yutian Qu Samsung Research, Pavel Fedin Samsung Research, Yuri Gribov Samsung Research, Heewoo Nam Samsung Advanced Institute of Technology, Dongguen Lim Samsung Advanced Institute of Technology, Hyunjun Kim Samsung Advanced Institute of Technology, Joonho Song Samsung Advanced Institute of Technology, Seungwon Lee Samsung Advanced Institute of Technology, Hwansoo Han Sungkyunkwan University
11:18
26m
Talk
PIMFlow: Compiler and Runtime Support for CNN Models on Processing-in-Memory DRAM
Main Conference
Yongwon Shin POSTECH, Juseong Park POSTECH, Sungjun Cho POSTECH, Hyojin Sung POSTECH