CC 2022
Tue 5 - Wed 6 April 2022, online conference
Wed 6 Apr 2022 11:35 - 11:50 at CC Virtual Room - Session 4: Parallelism Chair(s): Bernhard Egger

Accelerated computing has increased the need to specialize how a program is parallelized depending on the target. Fully exploiting a highly parallel accelerator, such as a GPU, demands more parallelism and sometimes more levels of parallelism than a multicore CPU. OpenMP has a directive for each level of parallelism, but choosing directives for each target can incur a significant productivity cost. We argue that using the new OpenMP loop directive with an appropriate compiler decision process can achieve the same performance benefits as target-specific parallelization, with the productivity advantage of a single directive for all targets. In this paper, we introduce a fully descriptive model and demonstrate its benefits with an implementation of the loop directive, comparing performance, productivity, and portability against other production compilers using the SPEC ACCEL benchmark suite. We provide an implementation of our proposal in NVIDIA's HPC compiler. On GPUs it yields up to a 56x speedup and an average speedup of 1.79x-1.91x over the baseline performance, depending on the host system, while preserving CPU performance. In addition, our proposal requires 60% fewer parallelism directives.
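For illustration only (this sketch is not taken from the paper), the snippet below contrasts a prescriptive, target-specific OpenMP parallelization with the descriptive loop directive the abstract advocates. The saxpy-style kernel, function names, and map clauses are assumptions chosen to keep the example self-contained.

    /* Prescriptive, target-specific version: the programmer spells out the
       levels of parallelism (teams, distribute, parallel for) for a GPU. */
    void saxpy_prescriptive(int n, float a, const float *x, float *y)
    {
        #pragma omp target teams distribute parallel for \
            map(to: x[0:n]) map(tofrom: y[0:n])
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

    /* Descriptive version: the loop directive only asserts that the
       iterations may run concurrently; the compiler decides how to map
       them onto the target's levels of parallelism. */
    void saxpy_descriptive(int n, float a, const float *x, float *y)
    {
        #pragma omp target teams loop \
            map(to: x[0:n]) map(tofrom: y[0:n])
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

The descriptive variant illustrates the portability argument: a single directive expresses the available concurrency, and the compiler's decision process chooses the parallel mapping for a GPU or a multicore CPU.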

Wed 6 Apr

Displayed time zone: Eastern Time (US & Canada)

11:20 - 11:50
Session 4: Parallelism (CC Research Papers) at CC Virtual Room
Chair(s): Bernhard Egger Seoul National University
11:20
15m
Paper
Memory Access Scheduling to Reduce Thread Migrations
CC Research Papers
Sana Damani Georgia Institute of Technology, Prithayan Barua Georgia Institute of Technology, USA, Vivek Sarkar Georgia Institute of Technology
11:35
15m
Paper
Performant Portable OpenMP
CC Research Papers