Fuzzing applies input mutations iteratively with the sole goal of finding more bugs, resulting in synthetic tests that tend to lack realism. Big data analytics are expected to ingest real-world data as input. Therefore, when synthetic test data are not easily comprehensible, they are less likely to facilitate the downstream task of fixing errors. Our position is that fuzzing in this domain must achieve both high naturalness and high code coverage. We propose a new natural synthetic test generation tool for big data analytics, called NaturalFuzz. It generates unstructured, semi-structured, and structured data with corresponding semantics such as ‘zipcode’ and ‘age.’ The key insights behind NaturalFuzz are two-fold. First, though existing test data may be small and lack coverage, we can grow this data to increase code coverage. Second, we can strategically mix constituent parts across different rows and columns to construct new realistic synthetic data by leveraging fine-grained data provenance. On commercial big data application benchmarks, NaturalFuzz achieves an additional 19.9% coverage and detects 1.9× more faults than a machine learning-based synthetic data generator (SDV) when generating comparably sized inputs. This is because an ML-based synthetic data generator does not consider which code branches are exercised by which input rows from which tables, while NaturalFuzz is able to select input rows that have a high potential to increase code coverage and mutate the selected data towards unseen, new program behavior. NaturalFuzz’s test data are more realistic than the test data generated by two baseline fuzzers (BigFuzz and Jazzer), while increasing code coverage and fault detection potential. NaturalFuzz is the first fuzzing methodology with three benefits: it (1) exclusively generates natural inputs, (2) fuzzes multiple input sources simultaneously, and (3) finds deeper semantic faults.
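The two insights above (growing a small seed dataset, and mixing values across rows and columns) can be illustrated with a toy sketch. This is not NaturalFuzz's implementation: it uses no data provenance, and `toy_program`, `column_swaps`, `grow_inputs`, and the seed rows are all hypothetical names and data invented for illustration. It only shows how recombining existing column values can reach new branches while every value stays one that already occurred in the seed data.

```python
from itertools import permutations


def toy_program(row):
    """Hypothetical program under test; returns a label for the branch taken."""
    if row["age"] >= 65 and row["zipcode"].startswith("9"):
        return "senior_west"
    if row["age"] >= 65:
        return "senior_other"
    if row["zipcode"].startswith("9"):
        return "west"
    return "default"


def column_swaps(rows):
    """Yield copies of each row with one column's value taken from another row."""
    for a, b in permutations(rows, 2):
        for col in a:
            if a[col] != b[col]:
                mixed = dict(a)
                mixed[col] = b[col]
                yield mixed


def grow_inputs(seed_rows):
    """Keep only mixed rows that exercise a branch not covered so far."""
    corpus = list(seed_rows)
    covered = {toy_program(r) for r in corpus}
    for candidate in column_swaps(seed_rows):
        branch = toy_program(candidate)
        if branch not in covered:
            covered.add(branch)
            corpus.append(candidate)
    return corpus, covered


# Hypothetical seed data: two rows that cover only two of the four branches.
seeds = [
    {"name": "Ana", "age": 70, "zipcode": "10001"},
    {"name": "Bo", "age": 30, "zipcode": "90210"},
]
corpus, covered = grow_inputs(seeds)
print(len(corpus), sorted(covered))
```

In this toy setting the two seed rows reach only the `senior_other` and `west` branches; swapping a single column between them yields two further rows that reach the remaining branches, and every cell in the grown corpus is a value taken from the seeds.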
Thu 14 Sep (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
15:30 - 17:00 | Fuzzing (NIER Track / Journal-first Papers / Research Papers / Tool Demonstrations) at Plenary Room 2 | Chair(s): Lars Grunske, Humboldt-Universität zu Berlin
15:30 | 12m | Talk | Fine-Grained Coverage-Based Fuzzing | Journal-first Papers | Wei-Cheng Wu (University of Southern California, USA), Bernard Nongpoh (CEA LIST, University Paris-Saclay), Marwan Nour (CEA, LIST, Université Paris Saclay), Michaël Marcozzi (CEA, LIST, Université Paris Saclay), Sébastien Bardin (CEA LIST, University Paris-Saclay), Christophe Hauser (Dartmouth College)
15:42 | 12m | Talk | MLIRSmith: Random Program Generation for Fuzzing MLIR Compiler Infrastructure | Research Papers | Haoyu Wang (College of Intelligence and Computing, Tianjin University), Junjie Chen (Tianjin University), Chuyue Xie (College of Intelligence and Computing, Tianjin University), Shuang Liu (Tianjin University), Zan Wang (Tianjin University), Qingchao Shen (Tianjin University), Yingquan Zhao (Tianjin University)
15:54 | 12m | Talk | Thunderkaller: Profiling and Improving the Performance of Syzkaller | Research Papers | Yang Lan (Institute for Network Science and Cyberspace of Tsinghua University), Di Jin (Brown University), Zhun Wang (Institute for Network Science and Cyberspace of Tsinghua University), Wende Tan (Tsinghua University), Zheyu Ma (Tsinghua University), Chao Zhang (Tsinghua University)
16:06 | 12m | Talk | PHYFU: Fuzzing Modern Physics Simulation Engines | Research Papers | Dongwei Xiao (Hong Kong University of Science and Technology), Zhibo Liu (Hong Kong University of Science and Technology), Shuai Wang (Hong Kong University of Science and Technology)
16:18 | 12m | Talk | NaturalFuzz: Natural Input Generation for Big Data Analytics | Research Papers | Ahmad Humayun (Virginia Tech), Yaoxuan Wu (UCLA), Miryung Kim (University of California at Los Angeles, USA), Muhammad Ali Gulzar (Virginia Tech)
16:30 | 12m | Talk | SpecFuzzer: A Tool for Inferring Class Specifications via Grammar-based Fuzzing | Tool Demonstrations | Facundo Molina (IMDEA Software Institute), Marcelo d'Amorim (North Carolina State University), Nazareno Aguirre (University of Rio Cuarto and CONICET, Argentina)
16:42 | 12m | Talk | Scalable Industrial Control System Analysis via XAI-based Gray-Box Fuzzing | NIER Track | Justin Kur (Oakland University), Jingshu Chen (Oakland University), Jun Huang (City University of Hong Kong)