Automatic GPU Memory Management for Large Neural Models in TensorFlow (ISMM 2019)

Who

Tung D. Le, Haruki Imai, Yasushi Negishi, Kiyokuni Kawachiya

Track

ISMM 2019

Time Zone

The program is currently displayed in (GMT-07:00) Tijuana, Baja California.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Sun 23 Jun 2019 09:45 - 10:10 at 106A - Scaling Up

Abstract

Deep learning models are becoming larger and will not fit in the limited memory of accelerators such as GPUs for training. Though many methods have been proposed to solve this problem, they are rather ad hoc in nature and difficult to extend and integrate with other techniques. In this paper, we tackle the problem in a formal way to provide a strong foundation for supporting large models. We propose a method of formally rewriting the computational graph of a model where swap-out and swap-in operations are inserted to temporarily store intermediate results on CPU memory. By introducing a categorized topological ordering for simulating graph execution, the memory consumption of a model can be easily analyzed by using operation distances in the ordering. As a result, the problem of fitting a large model into a memory-limited accelerator is reduced to the problem of reducing operation distances in a categorized topological ordering. We then show how to formally derive swap-out and swap-in operations from an existing graph and present rules to optimize the graph. Finally, we propose simulation-based auto-tuning to automatically find suitable graph-rewriting parameters for the best performance. We developed a module in TensorFlow, called LMS, by which we successfully trained ResNet-50 with a 4.9x larger mini-batch size and 3D U-Net with a 5.6x larger image resolution.
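
The idea of measuring operation distances in a topological ordering and inserting swap operations on long-lived edges can be illustrated with a small sketch. This is not the LMS module and uses no TensorFlow API. It is plain Python over a toy graph, and it makes several assumptions that do not come from the abstract: the names `topological_levels`, `swap_candidates`, and `insert_swaps` are hypothetical, the "level" of an operation is taken to be the length of the longest path reaching it from a source, and the distance `threshold` is an arbitrary illustrative parameter. The timing of swap-in operations, which a real implementation would need to control, is not modeled.

```python
from collections import defaultdict

# Toy training graph (assumed for illustration): three forward ops, a loss,
# and three backward ops that each reuse the matching forward output.
EDGES = [
    ("f1", "f2"), ("f2", "f3"), ("f3", "loss"),
    ("loss", "b3"), ("b3", "b2"), ("b2", "b1"),
    ("f1", "b1"), ("f2", "b2"), ("f3", "b3"),
]

def topological_levels(edges):
    """Map each op to the longest-path length from any source op (Kahn's algorithm)."""
    succs = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        succs[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))
    level = {n: 0 for n in nodes if indegree[n] == 0}
    ready = sorted(level)
    while ready:
        node = ready.pop()
        for nxt in succs[node]:
            # A consumer sits one level after its latest producer.
            level[nxt] = max(level.get(nxt, 0), level[node] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return level

def swap_candidates(edges, threshold):
    """Edges whose producer-to-consumer distance exceeds the threshold."""
    level = topological_levels(edges)
    return [(s, d) for s, d in edges if level[d] - level[s] > threshold]

def insert_swaps(edges, threshold):
    """Rewrite long edges as producer -> swap_out -> swap_in -> consumer."""
    level = topological_levels(edges)
    rewritten = []
    for s, d in edges:
        if level[d] - level[s] > threshold:
            out_op, in_op = f"swap_out({s})", f"swap_in({s}->{d})"
            rewritten += [(s, out_op), (out_op, in_op), (in_op, d)]
        else:
            rewritten.append((s, d))
    return rewritten

if __name__ == "__main__":
    print(swap_candidates(EDGES, threshold=3))
    for edge in insert_swaps(EDGES, threshold=3):
        print(edge)
```

In this toy graph the outputs of `f1` and `f2` are consumed far from where they are produced, so with a threshold of 3 those two edges are the ones rewritten. A lower threshold swaps more tensors and would typically trade more data transfer for lower accelerator memory use.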

Tung D. Le

IBM Research - Tokyo

Haruki Imai

IBM Research - Tokyo

Japan

Yasushi Negishi

IBM Research - Tokyo

Kiyokuni Kawachiya

IBM Research - Tokyo

Session Program

Sun 23 Jun
Displayed time zone: Tijuana, Baja California

09:00 - 11:00	Scaling Up, ISMM 2019, at 106A

09:00 5m Day opening		Welcome from the chairs. ISMM 2019. Harry Xu, University of California, Los Angeles (UCLA); Jeremy Singer, University of Glasgow
09:05 40m Talk		Keynote 1: Relaxed memory ordering needs a better specification. ISMM 2019. Hans-J. Boehm, Google
09:45 25m Talk		Automatic GPU Memory Management for Large Neural Models in TensorFlow. ISMM 2019. Tung D. Le, IBM Research - Tokyo; Haruki Imai, IBM Research - Tokyo; Yasushi Negishi, IBM Research - Tokyo; Kiyokuni Kawachiya, IBM Research - Tokyo
10:10 25m Talk		Massively Parallel GPU Memory Compaction. ISMM 2019. Matthias Springer, Tokyo Institute of Technology; Hidehiko Masuhara, Tokyo Institute of Technology
10:35 25m Talk		Scaling Up Parallel GC Work-Stealing in Many-Core Environments. ISMM 2019. Michihiro Horie, IBM Research - Tokyo; Kazunori Ogata, IBM Research, Japan; Mikio Takeuchi, IBM Research - Tokyo; Hiroshi Horii, IBM Research, Japan