A Compiler-based Approach for GPGPU Performance Calibration using TLP Modulation (Work in progress) (LCTES 2019 - Languages, Compilers, Tools and Theory of Embedded Systems)

Who

Yongseung Yu, Seokwon Kang, Yongjun Park

Track

LCTES 2019

Time Zone

All times are displayed in (GMT-07:00) Tijuana, Baja California.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Sun 23 Jun 2019, 16:30 - 16:35, at 105A, Session 4: Benchmarking and In-progress Works. Chair(s): Hyunok Oh

Abstract

Modern GPUs are the most successful accelerators, providing outstanding performance gains through the CUDA and OpenCL programming models. For maximum performance, programmers typically try to maximize the number of thread blocks in their programs, and GPUs generally attempt to allocate the maximum number of thread blocks to each GPU core. However, many recent studies have pointed out that simply allocating the maximum number of thread blocks to GPU cores does not always yield the best performance, so identifying the proper number of thread blocks per GPU core is a major challenge. Despite these studies, most existing architectural techniques cannot be applied directly to current GPU hardware. Furthermore, the optimal number of thread blocks can vary significantly depending on the target hardware and application characteristics. To solve these problems, this study proposes the CTA Limiter, a just-in-time thread block number adjustment system that modifies CUDA binaries using the LLVM compiler framework, in order to dynamically maximize GPU performance on real GPUs without reprogramming. The framework gradually reduces the number of concurrent thread blocks of a target CUDA workload by allocating extra shared memory (because each resident thread block must reserve its shared memory on the core, a larger per-block allocation leaves room for fewer concurrent blocks), and compares each version's execution time with that of the previous version to automatically identify the optimal number of thread blocks. The results show meaningful performance improvements, averaging 30%, 40%, and 44% on the GTX 960, GTX 1050, and GTX 1080 Ti, respectively.
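The shared-memory mechanism described in the abstract can be illustrated with a small occupancy model. The Python sketch below is not the CTA Limiter implementation. It assumes illustrative per-SM limits (96 KiB of shared memory, 256-byte allocation granularity, a 48 KiB per-block cap, at most 32 resident blocks), treats shared memory as the only resource being modulated, and uses a hypothetical `run_with_padding` callback standing in for launching and timing the kernel with extra shared memory per block (injected into the binary, as the abstract describes, or requested as dynamic shared memory at launch). It computes the padding needed to cap resident blocks per SM, then lowers that cap step by step while the measured runtime keeps improving.

```python
"""Hypothetical sketch of TLP modulation via shared-memory padding.

Not the CTA Limiter implementation. All hardware limits below are
illustrative assumptions, and shared memory is treated as the only
resource that limits how many thread blocks reside on one SM.
"""
from typing import Callable, Optional, Tuple

SMEM_PER_SM = 96 * 1024          # assumed shared memory per SM (bytes)
SMEM_PER_BLOCK_MAX = 48 * 1024   # assumed per-block shared memory cap (bytes)
ALLOC_GRANULARITY = 256          # assumed allocation rounding unit (bytes)
MAX_BLOCKS_PER_SM = 32           # assumed hardware cap on resident blocks


def round_up(x: int, g: int) -> int:
    """Round x up to the next multiple of g."""
    return -(-x // g) * g


def resident_blocks(smem_per_block: int) -> int:
    """Thread blocks that fit on one SM for a given per-block shared memory use."""
    if smem_per_block <= 0:
        return MAX_BLOCKS_PER_SM
    alloc = round_up(smem_per_block, ALLOC_GRANULARITY)
    return min(MAX_BLOCKS_PER_SM, SMEM_PER_SM // alloc)


def padding_for_limit(base_smem: int, target: int) -> Optional[int]:
    """Extra bytes per block so that at most `target` blocks fit on one SM.

    Returns None if the required allocation exceeds the per-block cap.
    """
    # floor(SMEM_PER_SM / alloc) <= target  <=>  alloc > SMEM_PER_SM / (target + 1)
    needed = round_up(SMEM_PER_SM // (target + 1) + 1, ALLOC_GRANULARITY)
    if needed > SMEM_PER_BLOCK_MAX:
        return None
    return max(0, needed - base_smem)


def tune_block_limit(base_smem: int,
                     run_with_padding: Callable[[int], float]) -> Tuple[int, int, float]:
    """Greedily lower the per-SM block limit while the measured runtime improves.

    `run_with_padding(pad)` is a hypothetical callback that runs the kernel
    with `pad` extra bytes of shared memory per block and returns its time.
    Returns (blocks_per_sm, padding_bytes, time) of the fastest version seen.
    """
    best_blocks = resident_blocks(base_smem)
    best_pad = 0
    best_time = run_with_padding(0)
    while best_blocks > 1:
        pad = padding_for_limit(base_smem, best_blocks - 1)
        if pad is None:          # cannot pad further within the per-block cap
            break
        t = run_with_padding(pad)
        if t >= best_time:       # no improvement: keep the previous version
            break
        best_blocks = resident_blocks(base_smem + pad)
        best_pad, best_time = pad, t
    return best_blocks, best_pad, best_time


if __name__ == "__main__":
    base = 2048  # hypothetical per-block shared memory of the original kernel

    def fake_run(pad: int) -> float:
        # Synthetic timing model (not measured data): fastest at 6 blocks/SM.
        return abs(resident_blocks(base + pad) - 6) + 1.0

    print(tune_block_limit(base, fake_run))  # prints (6, 12032, 1.0)
```

Stopping at the first version that is not faster mirrors the "compare with the previous version" description, but a greedy search like this can settle on a local optimum if runtime is not unimodal in the number of resident blocks. Because of allocation granularity, one padding step may also lower the block count by more than one.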

Yongseung Yu, Hanyang University
Seokwon Kang, Hanyang University
Yongjun Park, Hanyang University

Session Program

Sun 23 Jun
Displayed time zone: Tijuana, Baja California

16:00 - 16:45	Session 4: Benchmarking and In-progress Works, LCTES 2019 at 105A. Chair(s): Hyunok Oh (Hanyang Univ)

16:00 (15 min, full paper) BitBench: A Benchmark for Bitstream Computing. Kyle Daruwalla (University of Wisconsin - Madison), Heng Zhuo (University of Wisconsin - Madison), Carly Schulz (University of Wisconsin - Madison), Mikko H. Lipasti
16:15 (5 min, short paper) PANDORA: A Parallelizing Approximation-Discovery Framework (Work in progress). Greg Stitt (University of Florida), David Campbell (University of Florida)
16:20 (5 min, short paper) On Intermittence Bugs in the Battery-less Internet of Things (Work in progress). Andrea Maioli (Politecnico di Milano, Italy), Luca Mottola (Politecnico di Milano, Italy and RI.Se SICS, Sweden), Muhammad Hamad Alizai (LUMS, Pakistan), Junaid Haroon Siddiqui
16:25 (5 min, short paper) Raising Binaries to LLVM IR with MCTOLL (Work in progress). S. Bharadwaj Yadavalli (Microsoft), Aaron Smith
16:30 (5 min, short paper) A Compiler-based Approach for GPGPU Performance Calibration using TLP Modulation (Work in progress). Yongseung Yu (Hanyang University), Seokwon Kang (Hanyang University), Yongjun Park (Hanyang University)
16:35 (5 min, short paper) An Empirical Comparison between Monkey Testing and Human Testing (Work in progress). Mostafa Mohammed (Virginia Tech), Haipeng Cai (Washington State University, USA), Na Meng (Virginia Tech)