BigFuzz: Efficient Fuzz Testing for Data Analytics using Framework Abstraction (ASE 2020 - Research Papers)

Who

Qian Zhang, Jiyuan Wang, Muhammad Ali Gulzar, Rohan Padhye, Miryung Kim

Track

ASE 2020 Research Papers

Time Zone

The program is currently displayed in (UTC) Coordinated Universal Time.

When

Wed 23 Sep 2020 16:20 - 16:40 at Kangaroo - Testing (2) Chair(s): Alex Groce

Abstract

As big data analytics becomes increasingly popular, data-intensive scalable computing (DISC) systems help address the scalability issue of handling large data. However, automated testing techniques for such data-centric applications are lacking, because data is often incomplete, continuously evolving, and hard to know a priori. Fuzz testing has proven highly effective in other domains such as security; however, applying traditional fuzzing directly to big data analytics is nontrivial for three reasons: (1) the long latency of DISC systems makes fuzzing impractical: naïve fuzzing would spend 98% of the time setting up a test environment; (2) conventional branch coverage is unlikely to scale to DISC applications because most binary code comes from the framework implementation, such as Apache Spark; and (3) random bit- or byte-level mutations can hardly generate meaningful data and therefore fail to reveal real-world application bugs.

We propose BigFuzz, a novel coverage-guided fuzz testing tool for big data analytics. Our approach rests on two ideas: (a) we focus on exercising application logic rather than increasing framework code coverage, by abstracting the DISC framework using specifications; BigFuzz performs automated source-to-source transformations to construct an equivalent DISC application suitable for fast test generation; and (b) we design schema-aware data mutation operators based on our in-depth study of DISC application error types. BigFuzz speeds up fuzzing by 78X-1477X compared to random fuzzing, improves application code coverage by 20%-271%, and achieves a 33%-157% improvement in detecting application errors. Compared to the state of the art that uses symbolic execution to test big data analytics, BigFuzz is applicable to twice as many programs and can find 80.6% more bugs.
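The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not BigFuzz's implementation: the `zip_code,age` schema, the `count_adults_by_zip` application, and the four mutation operators are all hypothetical, chosen only to show (a) application logic running against a plain in-memory stand-in for a dataflow operator instead of a cluster, and (b) mutations that respect the row schema rather than flipping arbitrary bytes. The loop below is a plain random fuzzer; coverage guidance is omitted for brevity.

```python
import random

# Hypothetical input schema (an assumption, not from the paper): "zip_code,age"

# (a) Framework abstraction: an in-memory stand-in for a reduceByKey-style
# operator, so the application logic can run without starting a cluster.
def reduce_by_key(pairs, fn):
    acc = {}
    for key, value in pairs:
        acc[key] = fn(acc[key], value) if key in acc else value
    return acc

def count_adults_by_zip(lines):
    """Hypothetical application logic: count rows with age >= 18 per zip code."""
    rows = [line.split(",") for line in lines]                    # map
    adults = [(row[0], 1) for row in rows if int(row[1]) >= 18]   # filter + map
    return reduce_by_key(adults, lambda a, b: a + b)              # reduce

# (b) Schema-aware mutation operators (illustrative only).
def mutate_value(line, rng):
    """Replace the age column with a boundary value of the same type."""
    cols = line.split(",")
    cols[1] = str(rng.choice([-1, 0, 17, 18, 2**31]))
    return ",".join(cols)

def mutate_type(line, rng):
    """Turn the integer column into a non-numeric string."""
    cols = line.split(",")
    cols[1] = cols[1] + "x"
    return ",".join(cols)

def drop_column(line, rng):
    """Remove the last column so the row has too few fields."""
    return ",".join(line.split(",")[:-1])

def mutate_delimiter(line, rng):
    """Swap the expected delimiter for a different one."""
    return line.replace(",", ";")

MUTATORS = [mutate_value, mutate_type, drop_column, mutate_delimiter]

def fuzz(seed_lines, iterations, rng):
    """Mutate one row per iteration and collect the error types triggered."""
    errors = set()
    for _ in range(iterations):
        lines = list(seed_lines)
        i = rng.randrange(len(lines))
        lines[i] = rng.choice(MUTATORS)(lines[i], rng)
        try:
            count_adults_by_zip(lines)
        except (ValueError, IndexError) as exc:
            errors.add(type(exc).__name__)
    return errors
```

Calling `fuzz(["90095,20", "15213,30"], 200, random.Random(1))` returns the set of exception type names that the mutated inputs triggered in the application logic. Each iteration costs only a local function call, which is the point of replacing the framework with a lightweight equivalent.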

Qian Zhang

University of California, Los Angeles

Jiyuan Wang

University of California, Los Angeles

Muhammad Ali Gulzar

University of California at Los Angeles, USA

United States

Rohan Padhye

Carnegie Mellon University

United States

Miryung Kim

University of California at Los Angeles, USA

United States

Session Program

Wed 23 Sep
Displayed time zone: (UTC) Coordinated Universal Time

16:00 - 17:00	Testing (2) Research Papers at Kangaroo Chair(s): Alex Groce, Northern Arizona University

16:00 20m Talk		TestMC: Testing Model Counters using Differential and Metamorphic Testing (Experience) Research Papers Muhammad Usman University of Texas at Austin, USA, Wenxi Wang University of Texas at Austin, USA, Sarfraz Khurshid University of Texas at Austin, USA
16:20 20m Talk		BigFuzz: Efficient Fuzz Testing for Data Analytics using Framework Abstraction Research Papers Qian Zhang University of California, Los Angeles, Jiyuan Wang University of California, Los Angeles, Muhammad Ali Gulzar University of California at Los Angeles, USA, Rohan Padhye Carnegie Mellon University, Miryung Kim University of California at Los Angeles, USA
16:40 20m Talk		Scaling Client-Specific Equivalence Checking via Impact Boundary Search Research Papers Nick Feng University of Toronto, Vincent Hui University of Toronto, Federico Mora University of California, Berkeley, Marsha Chechik University of Toronto