Designing Locality and NUMA Aware MPI Runtime for Nested Virtualization based HPC Cloud with SR-IOV Enabled InfiniBand
Hypervisor-based virtualization solutions provide good security and isolation, while container-based solutions make applications and workloads more portable and let them be distributed in an effective, standardized, and repeatable way. Therefore, nested virtualization based computing environments (e.g., containers over virtual machines), which inherit the capabilities of both solutions, are becoming increasingly attractive in clouds (e.g., running Docker on Amazon EC2 VMs). Recent studies have shown that running applications in either VMs or containers still incurs significant overhead, especially for I/O intensive workloads. This motivates us to investigate whether the nested virtualization based solution can be adopted to build high-performance computing (HPC) clouds that run MPI applications efficiently, and where the bottlenecks lie. To eliminate performance bottlenecks, we propose a high-performance two-layer locality and NUMA aware MPI library, which dynamically detects co-resident containers inside one VM, as well as co-resident VMs inside one host, at MPI runtime. MPI processes in different containers and VMs can thus communicate with each other through shared memory or Cross Memory Attach (CMA) channels instead of the network channel when they are co-resident. We further propose an enhanced NUMA aware hybrid design that uses an InfiniBand loopback based channel to optimize large message transfers across containers running on different sockets. Performance evaluations show that, compared with the state-of-the-art design, our proposed enhanced-hybrid design brings up to 184%, 81%, and 12% benefit on point-to-point operations, collective operations, and end applications, respectively. Compared with the default performance, our enhanced-hybrid design delivers up to 184%, 85%, and 16% performance improvement.
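The channel choice described in the abstract can be illustrated with a minimal Python sketch. It is not the library's implementation: the `Placement` record, the `select_channel` function, and both size thresholds are hypothetical, and the abstract gives neither the actual cutoffs nor the rule for choosing between shared memory and CMA. The sketch assumes that small messages use shared memory, that CMA is used only inside one VM (it needs a shared kernel), and that co-resident VMs have some shared-memory region available to them.

```python
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    SHARED_MEMORY = "shared-memory"
    CMA = "cma"
    IB_LOOPBACK = "ib-loopback"
    NETWORK = "network"


@dataclass(frozen=True)
class Placement:
    """Where one MPI process runs (hypothetical record for illustration)."""
    host: str
    vm: str
    container: str
    socket: int


# Hypothetical cutoffs; the abstract does not state the real values.
CMA_THRESHOLD = 8 * 1024        # bytes: above this, prefer CMA inside one VM
LOOPBACK_THRESHOLD = 64 * 1024  # bytes: "large" cross-socket message


def select_channel(src: Placement, dst: Placement, nbytes: int) -> Channel:
    """Pick a channel from locality, NUMA placement, and message size."""
    # Not co-resident on one host: only the network channel can be used.
    if src.host != dst.host:
        return Channel.NETWORK

    different_container = (src.vm, src.container) != (dst.vm, dst.container)

    # Large messages between containers on different sockets: IB loopback.
    if (different_container and src.socket != dst.socket
            and nbytes >= LOOPBACK_THRESHOLD):
        return Channel.IB_LOOPBACK

    # Assumption: CMA only within one VM, and only for larger messages.
    if src.vm == dst.vm and nbytes >= CMA_THRESHOLD:
        return Channel.CMA

    # Everything else that is co-resident goes through shared memory.
    return Channel.SHARED_MEMORY


if __name__ == "__main__":
    a = Placement("host1", "vm1", "c1", socket=0)
    b = Placement("host1", "vm1", "c2", socket=1)
    print(select_channel(a, b, 1 << 20))
```

In this sketch the decision is made per message, so the same pair of processes may use shared memory for small messages and a different channel for large ones.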
Sun 9 Apr
Displayed time zone: Azores
14:00 - 15:30 | Session 5
14:00 | 30m | Talk | One Process to Reap Them All: Garbage Collection As A Service | Ahmed Hussein (Purdue University / Huawei, USA); Mathias Payer (Purdue University); Tony Hosking (Australian National University, Data61, and Purdue University); Christopher A. Vick (Qualcomm)
14:30 | 30m | Talk | Designing Locality and NUMA Aware MPI Runtime for Nested Virtualization based HPC Cloud with SR-IOV Enabled InfiniBand |
15:00 | 30m | Talk | Flexible Page-level Memory Access Monitoring Based on Virtualization Hardware | Kai Lu (College of Computer, National University of Defense Technology, Changsha, PR China); Wenzhe Zhang (College of Computer, National University of Defense Technology, Changsha, PR China); Xiaoping Wang (College of Computer, National University of Defense Technology, Changsha, PR China); Mikel Luján; Andrew Nisbet (The University of Manchester)