Training of Deep Learning Pipelines on Memory-Constrained GPUs via Segmented Fused-Tiled Execution
Training models with massive inputs is a significant challenge in developing Deep Learning pipelines that process very large digital images, as required by Whole Slide Imaging (WSI) in computational pathology and by the analysis of brain fMRI images in computational neuroscience. Graphics Processing Units (GPUs) are the primary workhorse for training and inference of Deep Learning models. To run inference or training on a neural network pipeline, state-of-the-art machine learning frameworks like PyTorch and TensorFlow currently require that the collective GPU memory be larger than the size of the activations at every stage of the pipeline. Existing Deep Learning pipelines for these use cases have therefore been forced to adopt sub-optimal "patch-based" modeling approaches, in which each image is processed as a collection of small patches. In this paper, we present a solution to this problem that employs tiling in conjunction with checkpointing, enabling arbitrarily large images to be processed directly, irrespective of the size of a GPU's global memory and the number of available GPUs. Experimental results using PyTorch demonstrate enhanced functionality and performance over existing frameworks.
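To make the idea concrete, the sketch below shows, under simplifying assumptions, how spatial tiling can be combined with activation checkpointing in PyTorch: each tile of a large image is pushed through a convolutional stage inside torch.utils.checkpoint.checkpoint, so the stage's intermediate activations are recomputed during the backward pass rather than held in GPU memory. The ConvStage module, tile size, and image dimensions are illustrative placeholders, and the independent per-tile processing ignores halo regions at tile borders; this is not the segmented fused-tiled execution scheme proposed in the paper, only a minimal illustration of the memory trade-off it builds on.

# Minimal sketch (not the paper's implementation): spatial tiling combined with
# activation checkpointing in PyTorch, so a large image can be pushed through a
# convolutional stage without materializing all of its activations at once.
# The model, tile size, and image dimensions below are illustrative assumptions.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class ConvStage(nn.Module):
    """A small convolutional stage standing in for one pipeline segment."""
    def __init__(self, in_ch=3, out_ch=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.body(x)

def tiled_checkpointed_forward(stage, image, tile=512):
    """Run `stage` over `image` one spatial tile at a time.

    Each tile's forward pass is wrapped in torch.utils.checkpoint.checkpoint,
    so intermediate activations inside the stage are recomputed during the
    backward pass instead of staying resident in GPU memory. Tiles are
    processed independently here, so this simple version is only exact for
    stages whose receptive field does not cross tile borders.
    """
    _, _, H, W = image.shape
    rows = []
    for top in range(0, H, tile):
        cols = []
        for left in range(0, W, tile):
            patch = image[:, :, top:top + tile, left:left + tile]
            cols.append(checkpoint(stage, patch, use_reentrant=False))
        rows.append(torch.cat(cols, dim=3))   # stitch tiles back along width
    return torch.cat(rows, dim=2)             # stitch rows back along height

if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    stage = ConvStage().to(device)
    big_image = torch.randn(1, 3, 2048, 2048, device=device, requires_grad=True)
    out = tiled_checkpointed_forward(stage, big_image, tile=512)
    out.mean().backward()   # gradients flow back through every checkpointed tile
    print(out.shape, big_image.grad is not None)

In this simplified form, peak activation memory scales with the tile size rather than with the full image size, which is the property that the paper's fused-tiled execution generalizes across whole pipeline segments.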
Wed 6 Apr (displayed time zone: Eastern Time, US & Canada)
10:20 - 11:20 | Session 3: Compilers and Machine Learning | CC Research Papers at CC Virtual Room
Chair(s): Ayal Zaks (Intel Corporation and Technion, Israel)

10:20 (15m) Paper | One-Shot Tuner for Deep Learning Compilers
10:35 (15m) Paper | Training of Deep Learning Pipelines on Memory-Constrained GPUs via Segmented Fused-Tiled Execution
    Yufan Xu (University of Utah), Saurabh Raje, Atanas Rountev (Ohio State University), Gerald Sabin (RNET Technologies), Aravind Sukumaran-Rajam (Washington State University), Ponnuswamy Sadayappan (University of Utah)
10:50 (15m) Paper | MLIR-Based Code Generation for GPU Tensor Cores
    Navdeep Katel (Indian Institute of Science, PolyMage Labs), Vivek Khandelwal (Indian Institute of Science), Uday Bondhugula (Indian Institute of Science, PolyMage Labs)
11:05 (15m) Paper | Automating Reinforcement Learning Architecture Design for Code Optimization
    Huanting Wang, Zhanyong Tang (Northwest University), Cheng Zhang (Northwest University), Jiaqi Zhao (Northwest University), Chris Cummins (Facebook), Hugh Leather (Facebook), Zheng Wang (University of Leeds, UK)