Sun 23 Jun 2019 11:20 - 11:45 at 106A - Exotica

Apache Spark is a popular cluster computing framework for iterative analytics workloads due to its use of Resilient Distributed Datasets (RDDs) to cache data for in-memory processing. We show that the performance of the Spark RDD cache can be severely limited if its capacity falls short of the needs of the workloads. In this paper, we explore different memory hybridization strategies to leverage emergent Non-Volatile Memory (NVM) devices for Spark's RDD cache. We find that a simple layered hybridization approach does not offer an effective solution. Therefore, we design a flat hybridization scheme to leverage NVM for caching RDD blocks, along with several architectural optimizations such as dynamic memory allocation for block unrolling, asynchronous migration with preemption, and opportunistic eviction to disk. We perform an extensive set of experiments to evaluate the performance of our proposed flat hybridization strategy and find it to be robust in handling different system and NVM characteristics. Our proposed approach uses DRAM for only a fraction of the hybrid memory system and yet keeps the increase in execution time within 10% on average. Moreover, our opportunistic eviction of blocks to disk improves performance by up to 7.5% when used alongside the current mechanism.
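
To make the idea of a flat hybrid cache concrete, the following is a minimal toy sketch, not the implementation described in the paper. It assumes that DRAM and NVM sit side by side as one pool of block slots, that a new block goes to DRAM first and then to NVM, and that the least recently used block is evicted to disk when both are full. The class name `FlatHybridCache`, its methods, and the placement and eviction policy are all hypothetical and chosen only for illustration; the paper's asynchronous migration and dynamic unrolling are not modeled.

```python
from collections import OrderedDict


class FlatHybridCache:
    """Toy model (hypothetical): DRAM and NVM form one flat pool of block slots."""

    def __init__(self, dram_slots, nvm_slots):
        if dram_slots + nvm_slots < 1:
            raise ValueError("the cache needs at least one slot")
        self.capacity = {"dram": dram_slots, "nvm": nvm_slots}
        # block_id -> tier, ordered from least to most recently used
        self.blocks = OrderedDict()
        # block ids that were evicted from memory to disk
        self.disk = set()

    def _used(self, tier):
        return sum(1 for t in self.blocks.values() if t == tier)

    def _free_tier(self):
        # Assumed policy: prefer DRAM, fall back to NVM.
        for tier in ("dram", "nvm"):
            if self._used(tier) < self.capacity[tier]:
                return tier
        return None

    def put(self, block_id):
        """Cache a block and return the tier it ends up in."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)
            return self.blocks[block_id]
        tier = self._free_tier()
        if tier is None:
            # Both tiers are full: evict the least recently used block
            # to disk and reuse the slot it occupied.
            victim, tier = self.blocks.popitem(last=False)
            self.disk.add(victim)
        self.disk.discard(block_id)
        self.blocks[block_id] = tier
        return tier

    def locate(self, block_id):
        """Return 'dram', 'nvm', 'disk', or None for an unknown block."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)
            return self.blocks[block_id]
        return "disk" if block_id in self.disk else None
```

With one DRAM slot and two NVM slots, caching a fourth block pushes the oldest block to disk and reuses its slot. In a layered design, by contrast, DRAM would act as a cache in front of NVM rather than as a peer.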

Sun 23 Jun
Times are displayed in time zone: (GMT-07:00) Tijuana, Baja California

11:20 - 12:35: ISMM 2019 - Exotica at 106A
ismm-2019-papers 11:20 - 11:45
Md Muhib Khan (Florida State University), Muhammad Ahad Ul Alam (Florida State University, USA), Amit Kumar Nath (Florida State University, USA), Weikuan Yu (Florida State University, USA)
ismm-2019-papers 11:45 - 12:10
Nicholas Jacek (UMass Amherst), Eliot Moss (University of Massachusetts Amherst)
ismm-2019-papers 12:10 - 12:35
Pengcheng Li (Google, Inc), Hao Luo (University of Rochester), Chen Ding (University of Rochester)