Exploration of Memory Hybridization for RDD Caching in Spark (ISMM 2019)

Who

Md Muhib Khan, Muhammad Ahad Ul Alam, Amit Kumar Nath, Weikuan Yu

Track

ISMM 2019

Time Zone

The program is currently displayed in (GMT-07:00) Tijuana, Baja California.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Sun 23 Jun 2019 11:20 - 11:45 at 106A - Exotica

Abstract

Apache Spark is a popular cluster computing framework for iterative analytics workloads due to its use of Resilient Distributed Datasets (RDDs) to cache data for in-memory processing. We have revealed that the performance of the Spark RDD cache can be severely limited if its capacity falls short of the needs of the workloads. In this paper, we have explored different memory hybridization strategies to leverage emergent Non-Volatile Memory (NVM) devices for Spark's RDD cache. We have found that a simple layered hybridization approach does not offer an effective solution. Therefore, we have designed a flat hybridization scheme to leverage NVM for caching RDD blocks, along with several architectural optimizations such as dynamic memory allocation for block unrolling, asynchronous migration with preemption, and opportunistic eviction to disk. We have performed an extensive set of experiments to evaluate the performance of our proposed flat hybridization strategy and found it to be robust in handling different system and NVM characteristics. Our proposed approach uses DRAM for only a fraction of the hybrid memory system and yet manages to keep the increase in execution time within 10% on average. Moreover, our opportunistic eviction of blocks to disk improves performance by up to 7.5% when utilized alongside the current mechanism.

Md Muhib Khan

Florida State University

Muhammad Ahad Ul Alam

Florida State University, USA

Amit Kumar Nath

Florida State University, USA

Weikuan Yu

Florida State University, USA

Session Program

Sun 23 Jun
Displayed time zone: Tijuana, Baja California

11:20 - 12:35	Exotica (ISMM 2019) at 106A

11:20 25m Talk	Exploration of Memory Hybridization for RDD Caching in Spark (ISMM 2019). Md Muhib Khan (Florida State University), Muhammad Ahad Ul Alam (Florida State University, USA), Amit Kumar Nath (Florida State University, USA), Weikuan Yu (Florida State University, USA)
11:45 25m Talk	Learning When to Garbage Collect with Random Forests (ISMM 2019). Nicholas Jacek (UMass Amherst), Eliot Moss (University of Massachusetts Amherst)
12:10 25m Talk	Timescale Functions for Parallel Memory Allocation (ISMM 2019). Pengcheng Li (Google, Inc), Hao Luo (University of Rochester), Chen Ding (University of Rochester)