Apache Spark has been widely used to build big data applications. Spark uses the Resilient Distributed Dataset (RDD) abstraction to store and retrieve large-scale data. To avoid recomputing an RDD, Spark can cache it in memory and reuse it later, thus improving performance. Spark relies on application developers to enforce caching decisions through the persist() and unpersist() APIs, i.e., which RDDs are persisted and when each RDD is persisted or unpersisted. Incorrect RDD caching decisions can cause duplicate computation or waste precious memory resources, thus introducing serious performance degradation in Spark applications. In this paper, we propose CacheCheck to automatically detect cache-related bugs in Spark applications. We summarize six cache-related bug patterns in Spark applications, and then dynamically detect cache-related bugs by analyzing the execution traces of Spark applications. We evaluate CacheCheck on six real-world Spark applications. The experimental results show that CacheCheck detects 72 previously unknown cache-related bugs, and 28 of them have been fixed by developers.
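To illustrate the kind of problem described above, the following is a minimal sketch in plain Python of how a missing or premature caching decision could be spotted in a simplified execution trace. It is not CacheCheck's algorithm and does not use Spark: the trace format, the `LINEAGE` table, and the function `find_recomputed_rdds` are all hypothetical and introduced only for illustration. The one Spark behavior it assumes is that persist() only marks an RDD for caching, and the data is materialized when the first action computes it.

```python
from collections import Counter

# Hypothetical lineage: each RDD name maps to the RDDs it is derived from.
LINEAGE = {
    "words": ["lines"],
    "counts": ["words"],
    "lengths": ["words"],
}


def find_recomputed_rdds(lineage, events):
    """Replay a simplified trace and return the RDDs computed more than once.

    events is a list of (kind, rdd) tuples, where kind is one of
    "persist", "unpersist", or "action" (illustrative format only).
    """
    marked = set()        # RDDs on which persist() has been called
    materialized = set()  # marked RDDs whose data is already in the cache
    computed = Counter()  # how many times each RDD was computed

    def compute(rdd):
        if rdd in materialized:
            return  # served from the cache, no recomputation
        computed[rdd] += 1
        for parent in lineage.get(rdd, []):
            compute(parent)
        if rdd in marked:
            materialized.add(rdd)  # caching takes effect on first computation

    for kind, rdd in events:
        if kind == "persist":
            marked.add(rdd)
        elif kind == "unpersist":
            marked.discard(rdd)
            materialized.discard(rdd)
        elif kind == "action":
            compute(rdd)
        else:
            raise ValueError(f"unknown event kind: {kind}")

    return {rdd for rdd, n in computed.items() if n > 1}


# Two actions share "words" but it is never persisted.
missing_persist = find_recomputed_rdds(
    LINEAGE, [("action", "counts"), ("action", "lengths")]
)

# Persisting "words" before the first action avoids the recomputation.
with_persist = find_recomputed_rdds(
    LINEAGE, [("persist", "words"), ("action", "counts"), ("action", "lengths")]
)

# Unpersisting "words" before its last use brings the recomputation back.
premature_unpersist = find_recomputed_rdds(
    LINEAGE,
    [("persist", "words"), ("action", "counts"),
     ("unpersist", "words"), ("action", "lengths")],
)
```

In this sketch the upstream RDD "lines" is also reported whenever "words" is recomputed, because recomputing an uncached RDD replays its whole lineage. A real detector would need to decide which RDD in the chain is the one that should have been persisted.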
Tue 21 Jul (displayed time zone: Tijuana, Baja California)
16:10 - 17:10 | CHALLENGING DOMAINS | Technical Papers at Zoom | Chair(s): Yi Li (Nanyang Technological University) | Public Live Stream/Recording. Registered participants should join via the Zoom link distributed in Slack.
16:10 | 20m Talk | Intermittently Failing Tests in the Embedded Systems Domain | Technical Papers | Per Erik Strandberg (Westermo Network Technologies AB), Thomas Ostrand, Elaine Weyuker (Mälardalen University), Wasif Afzal (Mälardalen University), Daniel Sundmark (Mälardalen University)
16:30 | 20m Talk | Feasible and Stressful Trajectory Generation for Mobile Robots | Technical Papers | Carl Hildebrandt (University of Virginia), Sebastian Elbaum (University of Virginia, USA), Nicola Bezzo (University of Virginia), Matthew B Dwyer (University of Virginia)
16:50 | 20m Talk | Detecting Cache-Related Bugs in Spark Applications | Technical Papers | Hui Li, Dong Wang (Institute of Software, Chinese Academy of Sciences), Tianze Huang, Yu Gao (Institute of Software, Chinese Academy of Sciences, China), Wensheng Dou (Institute of Software, Chinese Academy of Sciences), Lijie Xu (Institute of Software, Chinese Academy of Sciences), Wei Wang, Jun Wei (State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences & University of Chinese Academy of Sciences), Hua Zhong