Exploration of Memory Hybridization for RDD Caching in Spark
Apache Spark is a popular cluster computing framework for iterative analytics workloads because it uses Resilient Distributed Datasets (RDDs) to cache data for in-memory processing. We show that the performance of Spark's RDD cache can be severely limited when its capacity falls short of the needs of the workloads. In this paper, we explore different memory hybridization strategies that leverage emerging Non-Volatile Memory (NVM) devices for Spark's RDD cache. We find that a simple layered hybridization approach does not offer an effective solution. We therefore design a flat hybridization scheme that uses NVM for caching RDD blocks, along with several architectural optimizations: dynamic memory allocation for block unrolling, asynchronous migration with preemption, and opportunistic eviction to disk. An extensive set of experiments evaluating the proposed flat hybridization strategy shows it to be robust across different system and NVM characteristics. Our approach uses DRAM for only a fraction of the hybrid memory system, yet keeps the increase in execution time within 10% on average. Moreover, our opportunistic eviction of blocks to disk improves performance by up to 7.5% when used alongside the current mechanism.
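To make the idea of a flat (as opposed to layered) hybrid cache concrete, the following is a minimal sketch and not the paper's implementation. It assumes that "flat" means DRAM and NVM sit side by side as one pool, so a block lives in exactly one of them, and that blocks which fit in neither are spilled to disk. The class name `FlatHybridCache`, the byte capacities, the DRAM-first placement, and the LRU victim choice are all hypothetical; block unrolling and asynchronous migration are not modeled.

```python
from collections import OrderedDict


class FlatHybridCache:
    """Toy block cache: DRAM and NVM form one flat pool (hypothetical design)."""

    def __init__(self, dram_bytes, nvm_bytes):
        self.capacity = {"dram": dram_bytes, "nvm": nvm_bytes}
        self.used = {"dram": 0, "nvm": 0}
        # block_id -> (tier, size); the ordering doubles as LRU order.
        self.blocks = OrderedDict()
        # block_id -> size; stands in for blocks spilled to disk.
        self.disk = {}

    def _fits(self, tier, size):
        return self.used[tier] + size <= self.capacity[tier]

    def _evict_lru_to_disk(self):
        # Assumed policy: the least recently used block, in either tier, is spilled.
        block_id, (tier, size) = self.blocks.popitem(last=False)
        self.used[tier] -= size
        self.disk[block_id] = size

    def put(self, block_id, size):
        """Cache a block, trying DRAM first, then NVM, spilling LRU blocks to disk."""
        if block_id in self.blocks:
            return self.get(block_id)
        if size > max(self.capacity.values()):
            self.disk[block_id] = size  # too large for either memory tier
            return "disk"
        self.disk.pop(block_id, None)
        while True:
            for tier in ("dram", "nvm"):
                if self._fits(tier, size):
                    self.blocks[block_id] = (tier, size)
                    self.used[tier] += size
                    return tier
            self._evict_lru_to_disk()

    def get(self, block_id):
        """Return where the block lives and refresh its LRU position."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)
            return self.blocks[block_id][0]
        return "disk" if block_id in self.disk else None


# Usage: DRAM is only a fraction of the hybrid pool.
cache = FlatHybridCache(dram_bytes=100, nvm_bytes=200)
placements = [cache.put(f"rdd_0_{i}", 80) for i in range(4)]
print(placements)             # ['dram', 'nvm', 'nvm', 'dram']
print(cache.get("rdd_0_0"))   # 'disk' (spilled to make room for rdd_0_3)
```

In a layered design, by contrast, DRAM would act as a cache in front of NVM and a block could exist in both; the sketch above keeps a single copy per block, which is the property the word "flat" is assumed to capture here.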
Sun 23 Jun (displayed time zone: Tijuana, Baja California)
11:20 - 12:35
11:20 (25m, Talk): Exploration of Memory Hybridization for RDD Caching in Spark. ISMM 2019. Md Muhib Khan (Florida State University), Muhammad Ahad Ul Alam (Florida State University, USA), Amit Kumar Nath (Florida State University, USA), Weikuan Yu (Florida State University, USA)
11:45 (25m, Talk): Learning When to Garbage Collect with Random Forests. ISMM 2019.
12:10 (25m, Talk): Timescale Functions for Parallel Memory Allocation. ISMM 2019.