Large Language Models are Edge-Case Generators: Crafting Unusual Programs for Fuzzing Deep Learning Libraries (ICSE 2024 - Research Track)

Fri 12 - Sun 21 April 2024 Lisbon, Portugal

Who

Yinlin Deng, Chunqiu Steven Xia, Chenyuan Yang, Shizhuo Zhang, Shujing Yang, Lingming Zhang

Track

ICSE 2024 Research Track

Time Zone

The program is currently displayed in (GMT+01:00) Lisbon.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Fri 19 Apr 2024 14:00 - 14:15 at Sophia de Mello Breyner Andresen - Testing with and for AI 2 Chair(s): João Pascoal Faria

Abstract

Bugs in Deep Learning (DL) libraries may affect almost all downstream DL applications, so it is crucial to ensure the quality of such systems. Generating valid input programs for fuzzing DL libraries is challenging, since the input programs need to satisfy both the syntax/semantics of the supported languages (e.g., Python) and the tensor/operator constraints for constructing valid computational graphs. Recently, the TitanFuzz work demonstrated that modern Large Language Models (LLMs) can be directly leveraged to implicitly learn all the language and DL computation constraints to generate valid programs for fuzzing DL libraries. However, LLMs tend to generate ordinary programs whose patterns/tokens resemble those of typical programs seen in their massive training corpora (e.g., GitHub), while fuzzing favors unusual inputs that cover edge cases or are unlikely to be manually produced. To fill this gap, this paper proposes AtlasFuzz, the first technique to prime LLMs to synthesize unusual programs for fuzzing. AtlasFuzz is built on the well-known hypothesis that historical bug-triggering programs may include rare/valuable code ingredients important for bug finding. While traditional techniques leveraging such historical information require intensive human effort to both design dedicated generators and ensure the syntactic/semantic validity of generated programs, AtlasFuzz demonstrates that this process can be fully automated via the intrinsic capabilities of LLMs (including fine-tuning and in-context learning), while being generalizable and applicable to challenging domains. Moreover, AtlasFuzz also shows the potential of directly leveraging the instruction-following capability of the recent ChatGPT for effective fuzzing. The experimental study on two popular DL libraries (PyTorch and TensorFlow) shows that AtlasFuzz can substantially outperform TitanFuzz, detecting 76 bugs, including 48 already confirmed as previously unknown bugs.
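The in-context learning idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record format (`BugExample`), the function name `build_few_shot_prompt`, the prompt wording, and the example snippet are all assumptions made for illustration. The sketch only assembles a few-shot prompt from historical bug-triggering snippets and leaves the last entry open for a model to complete; it does not call any LLM or DL library.

```python
from dataclasses import dataclass


@dataclass
class BugExample:
    """A historical bug-triggering snippet (hypothetical record format)."""
    api: str          # API the bug report was filed against
    description: str  # short summary of the reported bug
    code: str         # snippet attached to the bug report


def build_few_shot_prompt(examples, target_api, library="PyTorch"):
    """Assemble a few-shot prompt from past bug-triggering code.

    The prompt wording is an assumption, not the format used in the paper.
    """
    header = f"# The following code reveals a bug in {library}"
    parts = []
    for ex in examples:
        parts.append(
            f"{header}\n"
            f"# API: {ex.api}\n"
            f"# Bug description: {ex.description}\n"
            f"{ex.code.strip()}\n"
        )
    # Leave the final entry open so a model would continue it with a
    # description and a program exercising target_api.
    parts.append(f"{header}\n# API: {target_api}\n# Bug description:")
    return "\n".join(parts)


# Placeholder example: the snippet and description are illustrative only
# and are not taken from a real bug report.
examples = [
    BugExample(
        api="torch.cumsum",
        description="placeholder summary of an edge case with an empty tensor",
        code="import torch\nx = torch.zeros(0)\ny = torch.cumsum(x, dim=0)",
    )
]
prompt = build_few_shot_prompt(examples, target_api="torch.nn.functional.pad")
print(prompt)
```

The resulting string would be passed to a code-generating model, and the completed programs would then be executed against the library under test. Both of those steps are outside this sketch.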

Yinlin Deng

University of Illinois at Urbana-Champaign

United States

Chunqiu Steven Xia

University of Illinois at Urbana-Champaign

United States

Chenyuan Yang

University of Illinois at Urbana-Champaign

United States

Shizhuo Zhang

University of Illinois Urbana-Champaign

Shujing Yang

University of Illinois Urbana-Champaign

Lingming Zhang

University of Illinois at Urbana-Champaign

United States

Session Program

Fri 19 Apr
Displayed time zone: Lisbon

14:00 - 15:30	Testing with and for AI 2 (Journal-first Papers / Research Track / Demonstrations) at Sophia de Mello Breyner Andresen. Chair(s): João Pascoal Faria, Faculty of Engineering, University of Porto and INESC TEC

14:00 15m Talk		Large Language Models are Edge-Case Generators: Crafting Unusual Programs for Fuzzing Deep Learning Libraries Research Track Yinlin Deng University of Illinois at Urbana-Champaign, Chunqiu Steven Xia University of Illinois at Urbana-Champaign, Chenyuan Yang University of Illinois at Urbana-Champaign, Shizhuo Zhang University of Illinois Urbana-Champaign, Shujing Yang University of Illinois Urbana-Champaign, Lingming Zhang University of Illinois at Urbana-Champaign
14:15 15m Talk		Deeply Reinforcing Android GUI Testing with Deep Reinforcement Learning Research Track Yuanhong Lan Nanjing University, Yifei Lu Nanjing University, Zhong Li, Minxue Pan Nanjing University, Wenhua Yang Nanjing University of Aeronautics and Astronautics, Tian Zhang Nanjing University, Xuandong Li Nanjing University
14:30 7m Talk		Black-Box Testing of Deep Neural Networks through Test Case Diversity Journal-first Papers Zohreh Aghababaeyan University of Ottawa Ottawa, Ontario, Canada, Manel Abdellatif Software and Information Technology Engineering Department, École de Technologie Supérieure, Lionel Briand University of Ottawa, Canada; Lero centre, University of Limerick, Ireland, Ramesh S, Mojtaba Bagherzadeh Cisco
14:37 7m Talk		scenoRITA: Generating Diverse, Fully Mutable, Test Scenarios for Autonomous Vehicle Planning Journal-first Papers Yuqi Huai University of California, Irvine, Sumaya Almanee University of California, Irvine, Yuntianyi Chen University of California, Irvine, Xiafa Wu University of California, Irvine, Alfred Chen University of California, Irvine, Joshua Garcia University of California, Irvine
14:44 7m Talk		InterEvo-TR: Interactive Evolutionary Test Generation with Readability Assessment Journal-first Papers Pedro Delgado-Pérez Universidad de Cádiz, Aurora Ramírez University of Córdoba, Kevin Jesús Valle-Gómez Universidad de Cádiz, Inmaculada Medina-Bulo Universidad de Cádiz, José Raúl Romero University of Cordoba, Spain
14:51 7m Talk		Differential testing for machine learning: an analysis for classification algorithms beyond deep learning Journal-first Papers Steffen Herbold University of Passau, Steffen Tunkel
14:58 7m Talk		Journal First Article: "Syntactic Vs. Semantic similarity of Artificial and Real Faults in Mutation Testing Studies" Journal-first Papers Milos Ojdanic University of Luxembourg, Aayush Garg Luxembourg Institute of Science and Technology, Ahmed Khanfir University of Luxembourg, Renzo Degiovanni Luxembourg Institute of Science and Technology, Mike Papadakis University of Luxembourg, Yves Le Traon University of Luxembourg, Luxembourg
15:05 7m Talk		Causality-driven Testing of Autonomous Driving Systems Journal-first Papers Luca Giamattei Università di Napoli Federico II, Antonio Guerriero Università di Napoli Federico II, Roberto Pietrantuono Università di Napoli Federico II, Stefano Russo Università di Napoli Federico II
15:12 7m Talk		When Less is More: On the Value of ''Co-training'' for Semi-Supervised Software Defect Predictors Journal-first Papers Suvodeep Majumder North Carolina State University, Joymallya Chakraborty Amazon.com, Tim Menzies North Carolina State University
15:19 7m Talk		OpenSBT: A Modular Framework for Search-based Testing of Automated Driving Systems Demonstrations Lev Sorokin fortiss, Tiziano Munaro fortiss, Damir Safin fortiss, Brian Hsuan-Cheng Liao DENSO AUTOMOTIVE, Adam Molin DENSO AUTOMOTIVE