ICSE 2024
Fri 12 - Sun 21 April 2024 Lisbon, Portugal

Bugs in Deep Learning (DL) libraries may affect almost all downstream DL applications, so ensuring the quality of such systems is crucial. Generating valid input programs for fuzzing DL libraries is challenging, since the programs must satisfy both the syntax/semantics of the supported languages (e.g., Python) and the tensor/operator constraints for constructing valid computational graphs. Recently, TitanFuzz demonstrated that modern Large Language Models (LLMs) can be directly leveraged to implicitly learn all the language and DL computation constraints needed to generate valid programs for fuzzing DL libraries. However, LLMs tend to generate ordinary programs that follow patterns/tokens similar to those of typical programs seen in their massive training corpora (e.g., GitHub), while fuzzing favors unusual inputs that cover edge cases or are unlikely to be manually produced. To fill this gap, this paper proposes AtlasFuzz, the first technique to prime LLMs to synthesize unusual programs for fuzzing. AtlasFuzz builds on the well-known hypothesis that historical bug-triggering programs may include rare/valuable code ingredients important for bug finding. Traditional techniques leveraging such historical information require intensive human effort both to design dedicated generators and to ensure the syntactic/semantic validity of generated programs; AtlasFuzz demonstrates that this process can be fully automated via the intrinsic capabilities of LLMs (including fine-tuning and in-context learning), while being generalizable and applicable to challenging domains. Moreover, AtlasFuzz shows the potential of directly leveraging the instruction-following capability of the recent ChatGPT for effective fuzzing. The experimental study on two popular DL libraries (PyTorch and TensorFlow) shows that AtlasFuzz can substantially outperform TitanFuzz, detecting 76 bugs, including 48 already confirmed as previously unknown bugs.

Fri 19 Apr

Displayed time zone: Lisbon

14:00 - 15:30
Testing with and for AI 2
Journal-first Papers / Research Track / Demonstrations at Sophia de Mello Breyner Andresen
Chair(s): João Pascoal Faria Faculty of Engineering, University of Porto and INESC TEC
14:00
15m
Talk
Large Language Models are Edge-Case Generators: Crafting Unusual Programs for Fuzzing Deep Learning Libraries
Research Track
Yinlin Deng University of Illinois at Urbana-Champaign, Chunqiu Steven Xia University of Illinois at Urbana-Champaign, Chenyuan Yang University of Illinois at Urbana-Champaign, Shizhuo Zhang University of Illinois Urbana-Champaign, Shujing Yang University of Illinois Urbana-Champaign, Lingming Zhang University of Illinois at Urbana-Champaign
14:15
15m
Talk
Deeply Reinforcing Android GUI Testing with Deep Reinforcement Learning
Research Track
Yuanhong Lan Nanjing University, Yifei Lu Nanjing University, Zhong Li, Minxue Pan Nanjing University, Wenhua Yang Nanjing University of Aeronautics and Astronautics, Tian Zhang Nanjing University, Xuandong Li Nanjing University
14:30
7m
Talk
Black-Box Testing of Deep Neural Networks through Test Case Diversity
Journal-first Papers
Zohreh Aghababaeyan University of Ottawa Ottawa, Ontario, Canada, Manel Abdellatif Software and Information Technology Engineering Department, École de Technologie Supérieure, Lionel Briand University of Ottawa, Canada; Lero centre, University of Limerick, Ireland, Ramesh S, Mojtaba Bagherzadeh Cisco
14:37
7m
Talk
scenoRITA: Generating Diverse, Fully Mutable, Test Scenarios for Autonomous Vehicle Planning
Journal-first Papers
Yuqi Huai University of California, Irvine, Sumaya Almanee University of California, Irvine, Yuntianyi Chen University of California, Irvine, Xiafa Wu University of California, Irvine, Qi Alfred Chen University of California, Irvine, Joshua Garcia University of California, Irvine
14:44
7m
Talk
InterEvo-TR: Interactive Evolutionary Test Generation with Readability Assessment
Journal-first Papers
Pedro Delgado-Pérez Universidad de Cádiz, Aurora Ramírez University of Córdoba, Kevin Jesús Valle-Gómez Universidad de Cádiz, Inmaculada Medina-Bulo Universidad de Cádiz, José Raúl Romero University of Cordoba, Spain
14:51
7m
Talk
Differential testing for machine learning: an analysis for classification algorithms beyond deep learning
Journal-first Papers
Steffen Herbold University of Passau, Steffen Tunkel
14:58
7m
Talk
Journal First Article: "Syntactic Vs. Semantic similarity of Artificial and Real Faults in Mutation Testing Studies"
Journal-first Papers
Milos Ojdanic University of Luxembourg, Aayush Garg Luxembourg Institute of Science and Technology, Ahmed Khanfir University of Luxembourg, Renzo Degiovanni Luxembourg Institute of Science and Technology, Mike Papadakis University of Luxembourg, Yves Le Traon University of Luxembourg, Luxembourg
15:05
7m
Talk
Causality-driven Testing of Autonomous Driving Systems
Journal-first Papers
Luca Giamattei Università di Napoli Federico II, Antonio Guerriero Università di Napoli Federico II, Roberto Pietrantuono Università di Napoli Federico II, Stefano Russo Università di Napoli Federico II
15:12
7m
Talk
When Less is More: On the Value of "Co-training" for Semi-Supervised Software Defect Predictors
Journal-first Papers
Suvodeep Majumder North Carolina State University, Joymallya Chakraborty Amazon.com, Tim Menzies North Carolina State University
15:19
7m
Talk
OpenSBT: A Modular Framework for Search-based Testing of Automated Driving Systems
Demonstrations
Lev Sorokin fortiss, Tiziano Munaro fortiss, Damir Safin fortiss, Brian Hsuan-Cheng Liao DENSO AUTOMOTIVE, Adam Molin DENSO AUTOMOTIVE