CGO 2024
Sat 2 - Wed 6 March 2024 Edinburgh, United Kingdom
Mon 4 Mar 2024 15:00 - 15:20 at Tinto - Compilers for GPUs. Chair(s): Roland Leißa

In order to come close to peak performance, accelerators like GPUs require significant architecture-specific tuning that accounts for the availability of shared memory, parallelism, tensor cores, etc. Unfortunately, the pursuit of higher performance and lower costs has led to a significant diversification of architecture designs, even from the same vendor. This creates the need for performance portability across different GPUs, which is especially important for programs written in a particular programming model with a certain architecture in mind. Even when a program can be seamlessly executed on a different architecture, it may suffer a performance penalty because it is not sized appropriately for the available hardware resources, such as fast memory and registers, not to mention that it cannot take advantage of newer, more advanced features of the architecture.

We propose a new approach to improving the performance of (legacy) CUDA programs on modern machines by automatically adjusting the amount of work each parallel thread does and the amount of memory and register resources it requires. By operating within the MLIR compiler infrastructure, we are also able to target AMD GPUs, performing automatic translation from CUDA while simultaneously adjusting the program granularity to fit the size of the target GPU.
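
Adjusting how much work each thread does is commonly known as thread coarsening. The following is a minimal sketch of that general idea, not the authors' MLIR implementation: it emulates a CUDA grid on the CPU with nested loops so the index remapping can be checked directly. The SAXPY kernel, the function names (`saxpy_thread`, `launch_original`, `launch_coarsened`) and the strided element mapping are hypothetical choices made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel body (SAXPY), emulated on the CPU.
// In CUDA, `tid` would be blockIdx.x * blockDim.x + threadIdx.x.
static void saxpy_thread(std::size_t tid, float a, const std::vector<float>& x,
                         std::vector<float>& y) {
  if (tid < y.size()) y[tid] = a * x[tid] + y[tid];  // bounds guard for the tail
}

// Original launch: one logical thread per element.
void launch_original(std::size_t block_dim, float a,
                     const std::vector<float>& x, std::vector<float>& y) {
  const std::size_t n = y.size();
  const std::size_t grid_dim = (n + block_dim - 1) / block_dim;
  for (std::size_t b = 0; b < grid_dim; ++b)      // emulated blockIdx.x
    for (std::size_t t = 0; t < block_dim; ++t)   // emulated threadIdx.x
      saxpy_thread(b * block_dim + t, a, x, y);
}

// Coarsened launch: each thread handles `factor` elements, so the grid
// needs `factor` times fewer blocks for the same problem size.
void launch_coarsened(std::size_t block_dim, std::size_t factor, float a,
                      const std::vector<float>& x, std::vector<float>& y) {
  assert(block_dim > 0 && factor > 0);
  const std::size_t n = y.size();
  const std::size_t per_block = block_dim * factor;  // elements per block
  const std::size_t grid_dim = (n + per_block - 1) / per_block;
  for (std::size_t b = 0; b < grid_dim; ++b)
    for (std::size_t t = 0; t < block_dim; ++t)
      for (std::size_t k = 0; k < factor; ++k)
        // Strided mapping: in each step k, adjacent threads touch adjacent
        // elements, which on a GPU keeps memory accesses coalesced.
        saxpy_thread(b * per_block + k * block_dim + t, a, x, y);
}
```

With `factor = 4`, the coarsened launch uses a quarter as many blocks yet writes exactly the same outputs. On a real GPU, the trade-off is fewer threads, each using more registers, which is the kind of granularity the approach tunes to fit each target GPU.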

Combined with autotuning assisted by the platform-specific compiler, our approach demonstrates a 16% geomean speedup on the Rodinia benchmark suite over the baseline CUDA implementation, as well as performance parity between similar NVIDIA and AMD GPUs executing the same CUDA program.

Mon 4 Mar

Displayed time zone: London

14:20 - 15:40
Compilers for GPUs (Main Conference) at Tinto
Chair(s): Roland Leißa University of Mannheim, School of Business Informatics and Mathematics
14:20
20m
Talk
A Framework for Fine-Grained Synchronization of Dependent GPU Kernels
Main Conference
Abhinav Jangda Microsoft Research, Saeed Maleki Microsoft Research, Maryam Mehri Dehnavi University of Toronto, Madan Musuvathi Microsoft Research, Olli Saarikivi Microsoft Research
Pre-print
14:40
20m
Talk
Enhancing Performance through Control-Flow Unmerging and Loop Unrolling on GPUs
Main Conference
Alnis Murtovi TU Dortmund, Giorgis Georgakoudis Lawrence Livermore National Laboratory, Konstantinos Parasyris Lawrence Livermore National Laboratory, Chunhua Liao Lawrence Livermore National Laboratory, Ignacio Laguna Lawrence Livermore National Laboratory, Bernhard Steffen TU Dortmund
15:00
20m
Talk
Retargeting and Respecializing GPU Workloads for Performance Portability
Main Conference
Ivan Radanov Ivanov Tokyo Institute of Technology; RIKEN R-CCS, Oleksandr Zinenko Google DeepMind, Jens Domke RIKEN R-CCS, Toshio Endo Tokyo Institute of Technology, William S. Moses University of Illinois at Urbana-Champaign; Google DeepMind
15:20
20m
Talk
Seer: Predictive Runtime Kernel Selection for Irregular Problems
Main Conference
Pre-print