Sun 23 Jun 2019 09:45 - 10:10 at 106A - Scaling Up

Deep learning models are becoming larger and no longer fit in the limited memory of accelerators such as GPUs during training. Though many methods have been proposed to solve this problem, they are rather ad hoc in nature and difficult to extend and integrate with other techniques. In this paper, we tackle the problem formally to provide a strong foundation for supporting large models. We propose a method for formally rewriting the computational graph of a model, inserting swap-out and swap-in operations that temporarily store intermediate results in CPU memory. By introducing a categorized topological ordering that simulates graph execution, the memory consumption of a model can be analyzed using operation distances in the ordering. As a result, the problem of fitting a large model into a memory-limited accelerator reduces to the problem of reducing operation distances in a categorized topological ordering. We then show how to formally derive swap-out and swap-in operations from an existing graph and present rules to optimize the graph. Finally, we propose simulation-based auto-tuning to automatically find graph-rewriting parameters that give the best performance. We developed a TensorFlow module, called LMS, with which we successfully trained ResNet-50 with a 4.9x larger mini-batch size and 3D U-Net with a 5.6x larger image resolution.
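
The LMS module described above rewrites the TensorFlow graph automatically and auto-tunes which tensors to swap. The hand-written sketch below is only a minimal illustration of the underlying idea, a single swap-out/swap-in pair expressed with standard TensorFlow 1.x device placement and identity copies. It does not use the LMS API, and the tensor and op names (activation, swap_out, swap_in) are assumptions made for this example.

# Minimal sketch of one swap-out/swap-in pair (not the LMS API; names are illustrative).
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

with tf.device("/gpu:0"):
    x = tf.random_normal([64, 1024])
    w1 = tf.Variable(tf.random_normal([1024, 4096]))
    activation = tf.nn.relu(tf.matmul(x, w1))        # large intermediate result

# Swap-out: an identity op pinned to the CPU copies the intermediate tensor
# to host memory, so the GPU copy can be freed after its immediate consumers run.
with tf.device("/cpu:0"):
    swap_out = tf.identity(activation, name="swap_out")

# Swap-in: an identity op pinned to the GPU copies the tensor back shortly
# before a later consumer (e.g. the corresponding backward op) needs it.
with tf.device("/gpu:0"):
    swap_in = tf.identity(swap_out, name="swap_in")
    w2 = tf.Variable(tf.random_normal([4096, 10]))
    logits = tf.matmul(swap_in, w2)

config = tf.ConfigProto(allow_soft_placement=True)   # still runs on a CPU-only machine
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(logits)

In the paper, this rewriting is applied over the whole graph rather than by hand: the categorized topological ordering and the simulation-based auto-tuning decide which tensors to swap and where the swap-in operations should be attached.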

Sun 23 Jun

Displayed time zone: Tijuana, Baja California

09:00 - 11:00
Scaling Up (ISMM 2019) at 106A
09:00
5m
Day opening
Welcome from the chairs
ISMM 2019
Harry Xu University of California, Los Angeles (UCLA), Jeremy Singer University of Glasgow
09:05
40m
Talk
Keynote 1: Relaxed memory ordering needs a better specification
ISMM 2019
09:45
25m
Talk
Automatic GPU Memory Management for Large Neural Models in TensorFlow
ISMM 2019
Tung D. Le IBM Research - Tokyo, Haruki Imai IBM Research - Tokyo, Yasushi Negishi IBM Research - Tokyo, Kiyokuni Kawachiya IBM Research - Tokyo
10:10
25m
Talk
Massively Parallel GPU Memory Compaction
ISMM 2019
Matthias Springer Tokyo Institute of Technology, Hidehiko Masuhara Tokyo Institute of Technology
10:35
25m
Talk
Scaling Up Parallel GC Work-Stealing in Many-Core Environments
ISMM 2019
Michihiro Horie IBM Research - Tokyo, Kazunori Ogata IBM Research, Japan, Mikio Takeuchi IBM Research - Tokyo, Hiroshi Horii IBM Research, Japan