Automatic GPU Memory Management for Large Neural Models in TensorFlow
Deep learning models are growing larger and increasingly do not fit in the limited memory of accelerators such as GPUs during training. Although many methods have been proposed to address this problem, they are rather ad hoc in nature and difficult to extend or integrate with other techniques. In this paper, we tackle the problem in a formal way to provide a strong foundation for supporting large models. We propose a method of formally rewriting the computational graph of a model, in which swap-out and swap-in operations are inserted to temporarily store intermediate results in CPU memory. By introducing a categorized topological ordering to simulate graph execution, the memory consumption of a model can be analyzed using operation distances in the ordering. As a result, the problem of fitting a large model into a memory-limited accelerator is reduced to the problem of reducing operation distances in a categorized topological ordering. We then show how to formally derive swap-out and swap-in operations from an existing graph, and we present rules to optimize the graph. Finally, we propose a simulation-based auto-tuning method to automatically find suitable graph-rewriting parameters for the best performance. We developed a TensorFlow module, called LMS, with which we successfully trained ResNet-50 with a 4.9x larger mini-batch size and 3D U-Net with a 5.6x larger image resolution.
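To make the operation-distance idea concrete, the following is a minimal, illustrative sketch rather than the LMS implementation: it builds a toy dataflow graph, computes a plain topological ordering, and flags tensors whose producer-to-consumer distance exceeds a threshold as candidates for swap-out/swap-in. The graph representation, the helper names `topological_order` and `swap_candidates`, and the `threshold` parameter are assumptions made for illustration; the paper's categorized ordering and rewriting rules are more involved.

```python
# Illustrative sketch only (not the paper's LMS implementation): compute a
# topological ordering of a toy dataflow graph, measure the "operation
# distance" between each tensor's producer and consumer in that ordering,
# and mark long-lived tensors as candidates for swap-out/swap-in.
from collections import deque

def topological_order(ops, edges):
    """Kahn's algorithm; edges[a] lists the ops that consume a's output."""
    indeg = {op: 0 for op in ops}
    for src in edges:
        for dst in edges[src]:
            indeg[dst] += 1
    queue = deque(op for op in ops if indeg[op] == 0)
    order = []
    while queue:
        op = queue.popleft()
        order.append(op)
        for dst in edges.get(op, []):
            indeg[dst] -= 1
            if indeg[dst] == 0:
                queue.append(dst)
    return order

def swap_candidates(ops, edges, threshold):
    """Return (producer, consumer, distance) triples whose distance in the
    ordering exceeds `threshold`; such tensors would be copied to host
    memory after the producer (swap-out) and copied back shortly before
    the consumer (swap-in)."""
    pos = {op: i for i, op in enumerate(topological_order(ops, edges))}
    candidates = []
    for src, dsts in edges.items():
        for dst in dsts:
            distance = pos[dst] - pos[src]
            if distance > threshold:
                candidates.append((src, dst, distance))
    return candidates

# Toy forward/backward chain: forward ops f1..f3, backward ops b3..b1.
# Each forward activation is reused by its matching backward op, so the
# earliest activations have the largest operation distances.
ops = ["f1", "f2", "f3", "b3", "b2", "b1"]
edges = {
    "f1": ["f2", "b1"],
    "f2": ["f3", "b2"],
    "f3": ["b3"],
    "b3": ["b2"],
    "b2": ["b1"],
}
print(swap_candidates(ops, edges, threshold=2))
# [('f1', 'b1', 5), ('f2', 'b2', 3)] -> swap out the outputs of f1 and f2
```

In the paper's setting, a swap-out node stores the producer's output in CPU memory and a matching swap-in node restores it before the distant consumer; a threshold of this kind roughly corresponds to the graph-rewriting parameters that the simulation-based auto-tuner searches over.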
ISMM 2019 session, Sun 23 Jun, 09:00 - 11:00 (times in the Tijuana, Baja California time zone)

- 09:00 (5m) Day opening: Welcome from the chairs
- 09:05 (40m) Talk: Keynote 1: Relaxed memory ordering needs a better specification. Hans-J. Boehm (Google)
- 09:45 (25m) Talk: Automatic GPU Memory Management for Large Neural Models in TensorFlow. Tung D. Le, Haruki Imai, Yasushi Negishi, Kiyokuni Kawachiya (IBM Research - Tokyo)
- 10:10 (25m) Talk: Massively Parallel GPU Memory Compaction
- 10:35 (25m) Talk: Scaling Up Parallel GC Work-Stealing in Many-Core Environments. Michihiro Horie (IBM Research - Tokyo), Kazunori Ogata (IBM Research, Japan), Mikio Takeuchi (IBM Research - Tokyo), Hiroshi Horii (IBM Research, Japan)