Scaling Up Parallel GC Work-Stealing in Many-Core Environments
Parallel copying garbage collection (GC) is widely used in de facto standard Java virtual machines such as OpenJDK and OpenJ9. OpenJDK uses work-stealing for copying objects in the Parallel GC and Garbage-First (G1) GC policies to balance the copying tasks among GC threads. When a thread has no task in its own queue, it acts as a thief and tries to steal a task from another thread's queue. When a thief succeeds in stealing a task, it processes the task and enqueues the task's children into its own queue, which is accessible to other thieves.

Unfortunately, the overhead of the work-stealing framework becomes non-negligible when we try to minimize the GC pause time by increasing the number of GC threads. Because the number of tasks processed per thread decreases, thieves frequently attempt to steal tasks from others with a low success rate. When a thief repeatedly fails to steal, it must wait in a spin loop in the termination protocol of the work-stealing framework. Frequent spinning in the loop results in high CPU utilization, which is not acceptable in large-scale data centers where strict power management is required.

This paper proposes two approaches, named steal-best-of-many selection and spin-less termination, to reduce the overhead of the work-stealing framework. Steal-best-of-many selection reduces steal failures by adjusting the number of victim queues examined per steal attempt in accordance with the number of GC threads. Spin-less termination moves part of the object copying into the spin loop by changing the procedure of the copying GC; this reduces the portion of the GC pause time spent copying objects as well as the CPU utilization of the spin loop. We developed a prototype on OpenJDK8 and evaluated it using the SPECjbb2015 and SPECjvm2008 benchmarks. The critical-jOPS performance of SPECjbb2015 improved by up to 18%, and the scores of the SPECjvm2008 benchmarks improved by 1-5%.
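To make the stealing mechanism concrete, the following is a minimal Java sketch of the steal-best-of-many idea described in the abstract. All names here (StealBestOfManySketch, stealBestOfMany, the candidates parameter) are illustrative assumptions, not the authors' code: the prototype described in the paper modifies OpenJDK8's C++ work-stealing framework, whose stock victim-selection policy steals from the fuller of two randomly chosen queues.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch only. HotSpot's real work-stealing framework is C++
// (GenericTaskQueue and friends), not Java; ConcurrentLinkedDeque stands in
// for its lock-free per-thread task queues.
final class StealBestOfManySketch {

    // One task deque per GC thread: the owner pushes and pops at the tail,
    // thieves steal from the head.
    private final ConcurrentLinkedDeque<Runnable>[] queues;

    @SuppressWarnings("unchecked")
    StealBestOfManySketch(int gcThreads) {
        queues = new ConcurrentLinkedDeque[gcThreads];
        for (int i = 0; i < gcThreads; i++) {
            queues[i] = new ConcurrentLinkedDeque<>();
        }
    }

    // Owner operations: after processing a task, the owner enqueues the
    // task's children locally, where they become visible to thieves.
    void pushLocal(int ownerId, Runnable task) { queues[ownerId].addLast(task); }
    Runnable popLocal(int ownerId) { return queues[ownerId].pollLast(); }

    // Steal-best-of-many: instead of comparing a fixed two victims, sample
    // `candidates` queues (a count assumed to grow with the number of GC
    // threads) and steal from the fullest one, so a steal attempt is less
    // likely to hit an empty queue when tasks per thread are scarce.
    Runnable stealBestOfMany(int thiefId, int candidates) {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        int bestVictim = -1;
        int bestSize = 0;
        for (int i = 0; i < candidates; i++) {
            int victim = rnd.nextInt(queues.length);
            if (victim == thiefId) continue;
            // size() is O(n) and approximate under concurrency; a real task
            // queue would keep a cheaply readable element count instead.
            int size = queues[victim].size();
            if (size > bestSize) {
                bestVictim = victim;
                bestSize = size;
            }
        }
        return bestVictim < 0 ? null : queues[bestVictim].pollFirst();
    }
}
```

A thief that repeatedly gets null back from a steal attempt would then enter the framework's termination protocol, which is where the paper's second technique, spin-less termination, applies.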
Sun 23 Jun (displayed time zone: Tijuana, Baja California)

09:00 - 11:00 (ISMM 2019)
09:00 | 5m  | Day opening | Welcome from the chairs
09:05 | 40m | Talk | Keynote 1: Relaxed memory ordering needs a better specification | Hans-J. Boehm (Google)
09:45 | 25m | Talk | Automatic GPU Memory Management for Large Neural Models in TensorFlow | Tung D. Le, Haruki Imai, Yasushi Negishi, Kiyokuni Kawachiya (IBM Research - Tokyo)
10:10 | 25m | Talk | Massively Parallel GPU Memory Compaction
10:35 | 25m | Talk | Scaling Up Parallel GC Work-Stealing in Many-Core Environments | Michihiro Horie (IBM Research - Tokyo), Kazunori Ogata (IBM Research, Japan), Mikio Takeuchi (IBM Research - Tokyo), Hiroshi Horii (IBM Research, Japan)