It has been widely observed that data movement is emerging as the primary bottleneck to scalability and energy efficiency in future hardware, especially for applications and algorithms that are not cache-friendly and achieve below 1% of peak performance on today’s systems. The idea of “moving compute to data” has been suggested as one approach to address this challenge. While there are approaches that can achieve this migration in software, hardware support is a promising direction from the perspectives of lower overheads and programmer productivity. Migratory thread architectures migrate lightweight hardware thread contexts to the location of the data instead of transferring data to the requesting processor. However, while transporting thread contexts is cheaper than moving data, thread migrations still incur energy and bandwidth overheads and can be particularly expensive if threads frequently migrate in a ping-pong manner between processors due to poor locality of access. In this paper, we propose Memory Access Scheduling, a new compiler optimization that aims to reduce the number of overall thread migrations when executing a program on migratory thread architectures. Our experiments show performance improvements with a geometric mean speedup of 1.23× for a set of 7 explicitly-parallelized kernels, and of 1.10× for a set of 15 automatically-parallelized kernels. We believe that memory access scheduling will also be an important optimization for other locality-centric architectures that benefit from software thread migrations, such as multi-threaded NUMA architectures.
Wed 6 AprDisplayed time zone: Eastern Time (US & Canada) change
11:20 - 11:50
|Memory Access Scheduling to Reduce Thread Migrations
CC Research Papers
Sana Damani Georgia Institute of Technology, Prithayan Barua Georgia Institute of Technology, USA, Vivek Sarkar Georgia Institute of TechnologyDOI
|Performant Portable OpenMP
CC Research PapersDOI