CC 2022
Tue 5 - Wed 6 April 2022 Online conference
Wed 6 Apr 2022 10:50 - 11:05 at CC Virtual Room - Session 3: Compilers and Machine Learning Chair(s): Ayal Zaks

The state of the art in high-performance deep learning today is primarily driven by manually developed libraries optimized and highly tuned by expert programmers using low-level abstractions with significant effort. This effort is often repeated for similar and future hardware. In this work, we pursue and evaluate the more modular and reusable approach of using compiler IR infrastructure to generate libraries by encoding all the required optimizations as a sequence of transformations and customized passes on an IR. We believe that until the recent introduction of MLIR (Multi-Level Intermediate Representation), it had been hard to represent and transform computation at various levels of abstraction within a single IR. Using the MLIR infrastructure, we build a transformation and lowering pipeline to automatically generate near-peak performance code for matrix-matrix multiplication (matmul) as well as matmul fused with simple pointwise operators targeting tensor cores on NVIDIA GPUs. On a set of problem sizes ranging from 256 to 16384, our performance evaluation shows that we can obtain performance that is 0.95× to 1.19× and 0.80× to 1.60× of cuBLAS for FP32 and FP16 accumulate, respectively, on NVIDIA's Ampere-based GeForce RTX 3090. Furthermore, by allowing the fusion of common pointwise operations with matrix-matrix multiplication, we obtain performance ranging from 0.95× to 1.67× of a cuBLAS-based implementation. Additionally, we present matmul-like examples such as 3-d contraction and batched matmul, which the pipeline can efficiently handle while providing competitive performance. We believe that these results motivate further research and engineering on automatic domain-specific library generation using compiler IR infrastructure for similar specialized accelerators.

Wed 6 Apr

Displayed time zone: Eastern Time (US & Canada)

10:20 - 11:20
Session 3: Compilers and Machine Learning
CC Research Papers at CC Virtual Room
Chair(s): Ayal Zaks Intel Corporation and Technion, Israel
10:20
15m
Paper
One-Shot Tuner for Deep Learning Compilers
Artifacts Available v1.1, Artifacts Evaluated – Functional v1.1
CC Research Papers
Jaehun Ryu POSTECH, Eunhyeok Park POSTECH, Hyojin Sung POSTECH
10:35
15m
Paper
Training of Deep Learning Pipelines on Memory-Constrained GPUs via Segmented Fused-Tiled Execution
Artifacts Evaluated – Reusable v1.1, Artifacts Available v1.1, Results Reproduced v1.1
CC Research Papers
Yufan Xu University of Utah, Saurabh Raje, Atanas Rountev Ohio State University, Gerald Sabin RNET Technologies, Aravind Sukumaran-Rajam Washington State University, Ponnuswamy Sadayappan University of Utah
10:50
15m
Paper
MLIR-Based Code Generation for GPU Tensor Cores
CC Research Papers
Navdeep Katel Indian Institute of Science, PolyMage Labs, Vivek Khandelwal Indian Institute of Science, Uday Bondhugula Indian Institute of Science, PolyMage Labs
11:05
15m
Paper
Automating Reinforcement Learning Architecture Design for Code Optimization
Artifacts Evaluated – Reusable v1.1, Artifacts Available v1.1, Results Reproduced v1.1
CC Research Papers
Huanting Wang, Zhanyong Tang Northwest University, Cheng Zhang Northwest University, Jiaqi Zhao Northwest University, Chris Cummins Facebook, Hugh Leather Facebook, Zheng Wang University of Leeds, UK