Learning Input Tokens for Effective Fuzzing (ISSTA 2020 - Technical Papers)

Who

Björn Mathis, Rahul Gopinath, Andreas Zeller

Track

ISSTA 2020 Technical Papers

Time Zone

The program is currently displayed in (GMT-07:00) Tijuana, Baja California.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Mon 20 Jul 2020 11:30 - 11:50 at Zoom - FUZZING Chair(s): Rody Kersten

Abstract

Modern fuzzing tools like AFL operate at a lexical level: they explore the input space of tested programs one byte after another. For inputs with complex syntactical properties, this is very inefficient, as keywords and other tokens have to be composed one character at a time. Fuzzers therefore let users specify dictionaries listing the tokens from which inputs can be composed; such dictionaries speed up fuzzers dramatically. Fuzzers also use dynamic tainting to track input tokens and infer the values expected in the input validation phase. Unfortunately, such tokens are usually implicitly converted to program-specific values, which causes a loss of the taints attached to the input data in the lexical phase.
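To illustrate why dictionaries matter, here is a minimal Python sketch contrasting a byte-level mutation with a dictionary-based one. It is not code from AFL or from the paper; the function names `mutate_byte`, `mutate_with_dictionary`, and `random_guess_probability` are hypothetical, and the probability estimate assumes each byte is drawn uniformly at random.

```python
import random


def mutate_byte(data: bytes, rng: random.Random) -> bytes:
    """Byte-level mutation: overwrite one random position with a random byte."""
    if not data:
        return bytes([rng.randrange(256)])
    pos = rng.randrange(len(data))
    return data[:pos] + bytes([rng.randrange(256)]) + data[pos + 1:]


def mutate_with_dictionary(data: bytes, tokens: list, rng: random.Random) -> bytes:
    """Dictionary mutation: insert one whole token at a random position."""
    pos = rng.randrange(len(data) + 1)
    return data[:pos] + rng.choice(tokens) + data[pos:]


def random_guess_probability(token: bytes) -> float:
    """Chance that len(token) uniformly random bytes spell out the token."""
    return 256.0 ** -len(token)


rng = random.Random(0)
seed = b"{}"
tokens = [b"true", b"false", b"null"]  # illustrative JSON keywords

byte_mutant = mutate_byte(seed, rng)
dict_mutant = mutate_with_dictionary(seed, tokens, rng)
```

A single dictionary mutation places a complete keyword in the input, whereas uniformly random bytes would spell a four-byte keyword such as `true` with probability 256^-4 per attempt. Real fuzzers use coverage feedback to do better than blind guessing, but the gap shows why whole-token mutations help.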

In this paper, we present a technique that extends dynamic tainting to track not only explicit data flows but also implicitly converted data, without suffering from taint explosion. This extension makes it possible to augment existing techniques and automatically infer a set of tokens and seed inputs for the input language of a program given nothing but the source code. Specifically targeting the lexical analysis of an input processor, our lFuzzer test generator systematically explores branches of the lexical analysis, producing a set of tokens that fully cover all decisions seen. The resulting set of tokens can be directly used as a dictionary for fuzzing. Along with the token extraction, seed inputs are generated, which give further fuzzing processes a head start. In our experiments, the lFuzzer-AFL combination achieves up to 17% more coverage on complex input formats like JSON, LISP, tinyC, and JavaScript compared to AFL.
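The idea of harvesting tokens from a lexer and reusing them as a fuzzing dictionary can be sketched in Python as follows. This is a deliberately simplified stand-in, not lFuzzer's algorithm: it only logs the strings that a tracked input is compared against, and does not track taints through implicit conversions. `TrackedStr`, `toy_lexer`, and `to_afl_dictionary` are hypothetical names, and the output assumes the `name="value"` line format used by AFL dictionary files.

```python
class TrackedStr(str):
    """A str that logs every string it is compared against (hypothetical helper)."""

    def __new__(cls, value, log):
        obj = super().__new__(cls, value)
        obj.log = log  # shared set that collects the comparison operands
        return obj

    def __eq__(self, other):
        if isinstance(other, str):
            self.log.add(str(other))
        return super().__eq__(other)

    # Defining __eq__ clears __hash__, so restore the str behaviour.
    __hash__ = str.__hash__


def toy_lexer(word):
    """Stand-in for the lexical phase of a program under test."""
    if word == "true":
        return "BOOL"
    if word == "false":
        return "BOOL"
    if word == "null":
        return "NULL"
    return "IDENT"


def to_afl_dictionary(tokens):
    """Render tokens as name="value" lines, hex-escaping unsafe bytes."""
    lines = []
    for i, tok in enumerate(sorted(tokens)):
        escaped = "".join(
            chr(b) if 32 <= b < 127 and chr(b) not in '"\\' else f"\\x{b:02x}"
            for b in tok.encode("utf-8")
        )
        lines.append(f'token_{i}="{escaped}"')
    return lines


log = set()
kind = toy_lexer(TrackedStr("xyz", log))  # matches nothing, so every branch is tried
dictionary_lines = to_afl_dictionary(log)
```

Because the probe input `xyz` matches no keyword, the lexer compares it against every keyword in turn, and the log ends up holding all three. Writing `dictionary_lines` to a file would give a dictionary that could be passed to AFL with its `-x` option.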

Link to Publication

https://dl.acm.org/doi/10.1145/3395363.3397348

DOI

https://doi.org/10.1145/3395363.3397348

Björn Mathis

CISPA Helmholtz Center for Information Security

Germany

Rahul Gopinath

CISPA Helmholtz Center for Information Security

Germany

Andreas Zeller

CISPA Helmholtz Center for Information Security

Germany

Session Program

Mon 20 Jul
Displayed time zone: Tijuana, Baja California

10:50 - 11:50	FUZZING (Technical Papers) at Zoom
Chair(s): Rody Kersten (Synopsys, Inc.)
Public Live Stream/Recording. Registered participants should join via the Zoom link distributed in Slack.

10:50 20m Talk	WEIZZ: Automatic Grey-Box Fuzzing for Structured Binary Formats
Andrea Fioraldi (Sapienza University Rome), Daniele Cono D'Elia (Sapienza University of Rome), Emilio Coppa (Sapienza University of Rome, Italy)

11:10 20m Talk	Active Fuzzing for Testing and Securing Cyber-Physical Systems
Yuqi Chen (Singapore Management University), Bohan Xuan, Chris Poskitt (Singapore Management University), Jun Sun (Singapore Management University), Fan Zhang

11:30 20m Talk	Learning Input Tokens for Effective Fuzzing
Björn Mathis (CISPA Helmholtz Center for Information Security), Rahul Gopinath (CISPA Helmholtz Center for Information Security), Andreas Zeller (CISPA Helmholtz Center for Information Security)