Pin or Fuse? Exploiting Scratchpad Memory to Reduce Off-Chip Data Transfer in DNN Accelerators
Growing interest in on-device AI has led to the proliferation of accelerators dedicated to neural network inference. Most ASIC accelerators are equipped with compiler-controlled scratchpad memory (SPM), used as a last-level cache to reduce the number of accesses to off-chip memory. A widely used strategy for utilizing SPM is fused-layer execution, which divides a DNN model into groups of layers and forwards the intermediate results within each group without evicting them to off-chip memory. However, layer fusion has an inherent limitation: fusing consecutive layers increases the amount of computation, since the overlapping halo regions of neighboring tiles must be recomputed at every fused layer, leading to sub-optimal performance.
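To make that overhead concrete, here is a minimal sketch (not from the paper; the layer shapes, tile size, and stride-1/interior-tile assumptions are all hypothetical) of how tiled, fused execution of a convolution chain inflates the MAC count relative to layer-by-layer execution:

```python
# Minimal sketch (hypothetical shapes): MAC counts for tiled fused execution
# of a chain of stride-1 convolutions vs. plain layer-by-layer execution.
# Interior tiles are assumed, so border tiles are slightly over-approximated.

def fused_tile_macs(layers, out_tile):
    """MACs to produce one out_tile x out_tile tile of the group's last layer.

    layers: list of (kernel, c_in, c_out). Walking backward through the
    group, each layer must produce a tile enlarged by (kernel - 1) so its
    consumer has the halo it needs; those halo pixels are recomputed again
    by neighboring tiles.
    """
    tile, macs = out_tile, 0
    for kernel, c_in, c_out in reversed(layers):
        macs += tile * tile * kernel * kernel * c_in * c_out
        tile += kernel - 1  # input tile this layer needs from its producer
    return macs

def unfused_macs(layers, h, w):
    # Layer-by-layer: every output pixel of every layer is computed once.
    return sum(h * w * k * k * c_in * c_out for k, c_in, c_out in layers)

layers = [(3, 32, 32), (3, 32, 32)]   # two 3x3 convs, 32 channels (hypothetical)
h = w = 56                            # feature-map size (hypothetical)
tile = 8                              # output tile side length (hypothetical)

fused = (h // tile) * (w // tile) * fused_tile_macs(layers, tile)
print(f"fused/unfused MACs: {fused / unfused_macs(layers, h, w):.2f}x")
# ~1.28x here: fusion trades extra computation for avoided DRAM traffic.
```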
This paper introduces a new dimension to SPM usage: pinning, which temporarily keeps a feature map in SPM. Pinning reduces off-chip transfer without increasing computation, but it is not applicable to all feature maps due to the limited SPM size. We find that superior performance can be achieved by a combination of pinning and fusion in MobileNet. Based on this observation, we propose a model-level optimization method that jointly applies pinning and fusion to minimize inference latency under memory constraints. Scheduling and allocation schemes are presented for automatic generation of optimized code. Evaluation on a commercial AI accelerator shows that the proposed method reduces off-chip transfer of feature maps by 50% and improves inference latency by 15% on average, without additional hardware, compared to the state-of-the-art fusion approach.
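The paper's actual scheduling and allocation schemes are not reproduced here; as a minimal sketch of the pin-or-fuse trade-off the abstract describes, the greedy cost model below (all names, sizes, and cycle costs are made up for illustration, and this is not the paper's algorithm) chooses per feature map whether to pin it in SPM, fuse its producer and consumer, or spill it to DRAM under an SPM budget:

```python
# Minimal sketch of a pin-or-fuse decision under an SPM budget. This is a
# hypothetical greedy cost model, not the paper's joint optimization; all
# sizes and cycle costs below are invented for illustration.

from dataclasses import dataclass

@dataclass
class FeatureMap:
    name: str
    size_bytes: int       # SPM space consumed if the whole map is pinned
    dram_cost: int        # cycles for the DRAM round trip if spilled
    fuse_overhead: int    # extra compute cycles if producer/consumer fuse

def pin_or_fuse(fmaps, spm_budget):
    """Greedy: handle the most DRAM-expensive feature maps first."""
    plan = {}
    for fm in sorted(fmaps, key=lambda f: f.dram_cost, reverse=True):
        if fm.size_bytes <= spm_budget:
            plan[fm.name] = "PIN"     # saves dram_cost with no extra compute
            spm_budget -= fm.size_bytes
        elif fm.fuse_overhead < fm.dram_cost:
            plan[fm.name] = "FUSE"    # avoid the spill at some recompute cost
        else:
            plan[fm.name] = "SPILL"   # the DRAM round trip is cheapest here
    return plan

fmaps = [
    FeatureMap("conv1_out", size_bytes=200_000, dram_cost=90_000, fuse_overhead=30_000),
    FeatureMap("conv2_out", size_bytes=800_000, dram_cost=70_000, fuse_overhead=50_000),
    FeatureMap("conv3_out", size_bytes=120_000, dram_cost=40_000, fuse_overhead=60_000),
]
print(pin_or_fuse(fmaps, spm_budget=512 * 1024))
# {'conv1_out': 'PIN', 'conv2_out': 'FUSE', 'conv3_out': 'PIN'}
```

A real scheduler would additionally account for feature-map lifetimes and SPM allocation over the whole model, which is where the paper's joint pinning-and-fusion optimization goes beyond a per-map greedy choice.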
Session 7: Neural Network Accelerators (Main Conference, Wed 1 Mar, 10:00-12:00, Montreal 1-2-3). Chair: Lukas Sommer (Codeplay Software)

10:00 (26m) Talk: Flexer: Out-of-Order Scheduling for Multi-NPUs. Hyemi Min, Jungyoon Kwon, Bernhard Egger (Seoul National University)

10:26 (26m) Talk: Pin or Fuse? Exploiting Scratchpad Memory to Reduce Off-Chip Data Transfer in DNN Accelerators. Hyuk-Jin Jeong, JiHwan Yeo, Cheongyo Bahk, JongHyun Park (Samsung Research)

10:52 (26m) Talk: Accelerating Deep Neural Networks on Mobile Multicore NPUs. Hanwoong Jung (Samsung Advanced Institute of Technology), Hexiang Ji (Samsung Research), Alexey Pushchin (Samsung Research), Maxim Ostapenko (Samsung Advanced Institute of Technology), Wenlong Niu (Samsung Research), Ilya Palachev (Samsung Research), Yutian Qu (Samsung Research), Pavel Fedin (Samsung Research), Yuri Gribov (Samsung Research), Heewoo Nam (Samsung Advanced Institute of Technology), Dongguen Lim (Samsung Advanced Institute of Technology), Hyunjun Kim (Samsung Advanced Institute of Technology), Joonho Song (Samsung Advanced Institute of Technology), Seungwon Lee (Samsung Advanced Institute of Technology), Hwansoo Han (Sungkyunkwan University)

11:18 (26m) Talk: PIMFlow: Compiler and Runtime Support for CNN Models on Processing-in-Memory DRAM. Main Conference