Enabling Pipeline Parallelism in Heterogeneous Managed Runtime Environments via Batch Processing
During the last decade, managed runtime systems have been constantly evolving to become capable of exploiting underlying hardware accelerators, such as GPUs and FPGAs. Regardless of the programming language and their corresponding runtime systems, the majority of the work has been focusing on the compiler front trying to tackle the challenging task of how to enable just-in-time compilation and execution of arbitrary code segments on various accelerators. Besides this challenging task, another important aspect that defines both functional correctness and performance of managed runtime systems is that of automatic memory management. Although automatic memory management improves productivity by abstracting away memory allocation and maintenance, it hinders the capability of using specific memory regions, such as pinned memory, in order to perform data transfer times between the CPU and hardware accelerators.
In this paper, we introduce and evaluate a series of memory optimizations specifically tailored for heterogeneous managed runtime systems. In particular, we propose: (i) transparent and automatic “parallel batch processing” for overlapping data transfers and computation between the host and hardware accelerators in order to enable pipeline parallelism, and (ii) “off-heap pinned memory” in combination with parallel batch processing in order to increase the performance of data transfers without posing any on-heap overheads. These two techniques have been implemented in the context of the state-of-the-art open-source TornadoVM and their combination can lead up to 2.5x end-to-end performance speedup against sequential batch processing.
Tue 1 MarDisplayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna change
15:30 - 16:30 | Session-2: Runtime VirtualizationResearch Papers at Online Chair(s): Mingyu Wu Shanghai Jiao Tong University | ||
15:30 20mTalk | Enabling Pipeline Parallelism in Heterogeneous Managed Runtime Environments via Batch Processing Research Papers Florin Blanaru The University of Manchester, Athanasios Stratikopoulos The University of Manchester, Juan Fumero University of Manchester, UK, Christos Kotselidis KTM Innovation / The University of Manchester DOI Pre-print | ||
15:50 20mTalk | Transparent and Lightweight Object Placement for Managed Workloads atop Hybrid Memories Research Papers | ||
16:10 20mTalk | Capability Boehm: Challenges and Opportunities for Garbage Collection with Capability Hardware Research Papers Link to publication DOI Pre-print |
The Zoom room for Session 2 is at https://rochester.zoom.us/j/95639573724?pwd=Q3Fscitpd3VIcnVTaEMwRTFUS2hRdz09.