A Framework for Fine-Grained Synchronization of Dependent GPU Kernels
Machine Learning (ML) models execute several parallel computations, including Generalized Matrix Multiplication, Convolution, and Dropout. These computations are commonly executed on Graphics Processing Units (GPUs) by dividing the computation into independent processing blocks, known as tiles. Since the number of tiles is usually larger than the number of execution units of a GPU, tiles are executed on all execution units in one or more waves. However, the number of tiles is not always a multiple of the number of execution units. Thus, tiles executed in the final wave can under-utilize the GPU.
To address this issue, we present cuSync, a framework for synchronizing dependent kernels using a user-defined fine-grained synchronization policy to improve GPU utilization. cuSync synchronizes tiles instead of kernels, which allows independent tiles of dependent kernels to execute concurrently. We also present a compiler to generate diverse fine-grained synchronization policies based on dependencies between kernels. Our experiments show that synchronizing CUDA kernels with cuSync reduces the inference times of four popular ML models across several batch sizes: MegatronLM GPT-3 by up to 15%, LLaMA by up to 14%, ResNet-38 by up to 22%, and VGG-19 by up to 16%.
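The tile-level synchronization idea from the abstract can be pictured with a minimal CUDA sketch. This is not the cuSync API; the kernel names, the per-tile flag array, and the flag protocol are assumptions used only for illustration. A producer kernel publishes a flag when each of its output tiles is complete, and a consumer kernel launched on a separate stream waits only on the tiles it reads, so independent tiles of the two dependent kernels can overlap on the GPU.

```cuda
// Hypothetical sketch of tile-level synchronization between two dependent
// kernels (not the cuSync API). Per-tile flags in global memory let the
// consumer wait only on the tiles it actually reads.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void producer(float* out, unsigned int* tileReady, int tileSize) {
  int tile = blockIdx.x;                      // one thread block per tile
  int idx  = tile * tileSize + threadIdx.x;
  out[idx] = idx * 2.0f;                      // stand-in for the real tile computation
  __syncthreads();                            // all threads of this tile are done
  __threadfence();                            // make the tile's results visible device-wide
  if (threadIdx.x == 0)
    atomicExch(&tileReady[tile], 1u);         // publish "tile is ready"
}

__global__ void consumer(const float* in, float* out,
                         unsigned int* tileReady, int tileSize) {
  int tile = blockIdx.x;
  if (threadIdx.x == 0) {
    // Wait only for the one producer tile this block depends on.
    while (atomicAdd(&tileReady[tile], 0u) == 0u) { }
    __threadfence();                          // acquire-style fence before reading data
  }
  __syncthreads();
  int idx = tile * tileSize + threadIdx.x;
  out[idx] = in[idx] + 1.0f;                  // stand-in for the dependent computation
}

int main() {
  const int numTiles = 64, tileSize = 256, n = numTiles * tileSize;
  float *a, *b; unsigned int *flags;
  cudaMalloc(&a, n * sizeof(float));
  cudaMalloc(&b, n * sizeof(float));
  cudaMalloc(&flags, numTiles * sizeof(unsigned int));
  cudaMemset(flags, 0, numTiles * sizeof(unsigned int));

  // Separate streams so independent tiles of both kernels may run concurrently.
  // Note: this sketch assumes the GPU can co-schedule blocks of both kernels;
  // a real framework must ensure spinning consumer blocks cannot starve the producer.
  cudaStream_t s1, s2;
  cudaStreamCreate(&s1); cudaStreamCreate(&s2);
  producer<<<numTiles, tileSize, 0, s1>>>(a, flags, tileSize);
  consumer<<<numTiles, tileSize, 0, s2>>>(a, b, flags, tileSize);
  cudaDeviceSynchronize();
  printf("ok: %d\n", cudaGetLastError() == cudaSuccess);

  cudaFree(a); cudaFree(b); cudaFree(flags);
  cudaStreamDestroy(s1); cudaStreamDestroy(s2);
  return 0;
}
```

In this sketch the synchronization granularity is fixed at one flag per tile; a user-defined policy in the sense of the abstract would instead choose which tiles wait on which flags based on the actual dependency pattern between the two kernels.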
Mon 4 Mar (displayed time zone: London)
14:20 - 15:40 | Compilers for GPUs (Main Conference, at Tinto) | Chair(s): Roland Leißa (University of Mannheim, School of Business Informatics and Mathematics)
14:20 (20m) | Talk: A Framework for Fine-Grained Synchronization of Dependent GPU Kernels | Abhinav Jangda (Microsoft Research), Saeed Maleki (Microsoft Research), Maryam Mehri Dehnavi (University of Toronto), Madan Musuvathi (Microsoft Research), Olli Saarikivi (Microsoft Research) | Pre-print
14:40 (20m) | Talk: Enhancing Performance through Control-Flow Unmerging and Loop Unrolling on GPUs | Alnis Murtovi (TU Dortmund), Giorgis Georgakoudis (Lawrence Livermore National Laboratory), Konstantinos Parasyris (Lawrence Livermore National Laboratory), Chunhua Liao (Lawrence Livermore National Laboratory), Ignacio Laguna (Lawrence Livermore National Laboratory), Bernhard Steffen (TU Dortmund)
15:00 (20m) | Talk: Retargeting and Respecializing GPU Workloads for Performance Portability | Ivan Radanov Ivanov (Tokyo Institute of Technology; RIKEN R-CCS), Oleksandr Zinenko (Google DeepMind), Jens Domke (RIKEN R-CCS), Toshio Endo (Tokyo Institute of Technology), William S. Moses (University of Illinois at Urbana-Champaign; Google DeepMind)
15:20 (20m) | Talk: Seer: Predictive Runtime Kernel Selection for Irregular Problems | Pre-print