ICSE 2025
Sat 26 April - Sun 4 May 2025 Ottawa, Ontario, Canada

Deep learning (DL) libraries are widely used to form the basis of various AI applications in computer vision, natural language processing, and software engineering domains. Despite their popularity, DL libraries are known to have vulnerabilities, such as buffer overflows, use-after-free, and integer overflows, that can be exploited to compromise the security or effectiveness of the underlying libraries. While traditional fuzzing techniques have been used to find bugs in software, they are not well-suited for DL libraries. In general, the complexity of DL libraries and the diversity of their APIs make it challenging to test them thoroughly. To date, mainstream DL libraries like TensorFlow and PyTorch have featured over 1,000 APIs, and the number of APIs is still growing. Fuzzing all these APIs is a daunting task, especially when considering the complexity of the input data and the diversity of the API usage patterns.

Recent advances in large language models (LLMs) have illustrated the high potential of LLMs in understanding and synthesizing human-like code. Despite their high potential, we find that emerging LLM-based fuzzers are less optimal for DL library API fuzzing, given their lack of in-depth knowledge on API input edge cases and inefficiency in generating test inputs. In this paper, we propose DFUZZ, a LLM-driven DL library fuzzing approach. We have two key insights: (1) With high reasoning ability, LLMs can replace human experts to reason edge cases (likely error-triggering inputs) from checks in an API’s code, and transfer the extracted knowledge to test other (new or rarely-tested) APIs. (2) With high generation ability, LLMs can synthesize initial test programs with high accuracy that automates API testing. DFUZZ provides LLMs with a novel ‘‘white-box view’’ of DL library APIs, and therefore, can leverage LLMs’ reasoning and generation abilities to achieve comprehensive fuzzing. Our experimental results on popular DL libraries demonstrate that DFUZZ is able to cover more APIs than SOTA (LLM-based) fuzzers on TensorFlow and PyTorch, respectively. Moreover, DFUZZ successfully detected 37 bugs, with 17 already confirmed as previously unknown bugs.

Thu 1 May

Displayed time zone: Eastern Time (US & Canada) change

10:30 - 11:00
10:30
30m
Poster
Pattern-based Generation and Adaptation of Quantum WorkflowsQuantum
Research Track
Martin Beisel Institute of Architecture of Application Systems (IAAS), University of Stuttgart, Johanna Barzen University of Stuttgart, Frank Leymann University of Stuttgart, Lavinia Stiliadou Institute of Architecture of Application Systems (IAAS), University of Stuttgart, Daniel Vietz University of Stuttgart, Benjamin Weder Institute of Architecture of Application Systems (IAAS), University of Stuttgart
10:30
30m
Talk
A Unit Proofing Framework for Code-level Verification: A Research AgendaFormal Methods
New Ideas and Emerging Results (NIER)
Paschal Amusuo Purdue University, Parth Vinod Patil Purdue University, Owen Cochell Michigan State University, Taylor Le Lievre Purdue University, James C. Davis Purdue University
Pre-print
10:30
30m
Talk
SolSearch: An LLM-Driven Framework for Efficient SAT-Solving Code GenerationFormal Methods
New Ideas and Emerging Results (NIER)
Junjie Sheng East China Normal University, Yanqiu Lin East China Normal University, Jiehao Wu East China Normal University, Yanhong Huang East China Normal University, Jianqi Shi East China Normal University, Min Zhang East China Normal University, Xiangfeng Wang East China Normal University
10:30
30m
Talk
Listening to the Firehose: Sonifying Z3’s BehaviorArtifact-FunctionalArtifact-ReusableArtifact-AvailableFormal Methods
New Ideas and Emerging Results (NIER)
Finn Hackett University of British Columbia, Ivan Beschastnikh University of British Columbia
10:30
30m
Poster
HyperCRX 2.0: A Comprehensive and Automated Tool for Empowering GitHub Insights
Demonstrations
Yantong Wang East China Normal University, Shengyu Zhao Tongji University, will wang , Fenglin Bi East China Normal University
10:30
30m
Talk
Using ML filters to help automated vulnerability repairs: when it helps and when it doesn’tSecurity
New Ideas and Emerging Results (NIER)
Maria Camporese University of Trento, Fabio Massacci University of Trento; Vrije Universiteit Amsterdam
Pre-print
10:30
30m
Talk
Automated Testing Linguistic Capabilities of NLP Models
Journal-first Papers
Jaeseong Lee The University of Texas at Dallas, Simin Chen University of Texas at Dallas, Austin Mordahl University of Illinois Chicago, Cong Liu University of California, Riverside, Wei Yang UT Dallas, Shiyi Wei University of Texas at Dallas
10:30
30m
Poster
Your Fix Is My Exploit: Enabling Comprehensive DL Library API Fuzzing with Large Language Models
Research Track
Kunpeng Zhang The Hong Kong University of Science and Technology, Shuai Wang Hong Kong University of Science and Technology, Jitao Han Central University of Finance and Economics, Xiaogang Zhu The University of Adelaide, Xian Li Swinburne University of Technology, Shaohua Wang Central University of Finance and Economics, Sheng Wen Swinburne University of Technology

Fri 2 May

Displayed time zone: Eastern Time (US & Canada) change

13:00 - 13:30
13:00
30m
Talk
Strategies to Embed Human Values in Mobile Apps: What do End-Users and Practitioners Think?
SE in Society (SEIS)
Rifat Ara Shams CSIRO's Data61, Mojtaba Shahin RMIT University, Gillian Oliver Monash University, Jon Whittle CSIRO's Data61 and Monash University, Waqar Hussain Data61, CSIRO, Harsha Perera CSIRO's Data61, Arif Nurwidyantoro Universitas Gadjah Mada
13:00
30m
Talk
Best ends by the best means: ethical concerns in app reviews
Journal-first Papers
Neelam Tjikhoeri Vrije Universiteit Amsterdam, Lauren Olson Vrije Universiteit Amsterdam, Emitzá Guzmán Vrije Universiteit Amsterdam
13:00
30m
Poster
HyperCRX 2.0: A Comprehensive and Automated Tool for Empowering GitHub Insights
Demonstrations
Yantong Wang East China Normal University, Shengyu Zhao Tongji University, will wang , Fenglin Bi East China Normal University
13:00
30m
Poster
Your Fix Is My Exploit: Enabling Comprehensive DL Library API Fuzzing with Large Language Models
Research Track
Kunpeng Zhang The Hong Kong University of Science and Technology, Shuai Wang Hong Kong University of Science and Technology, Jitao Han Central University of Finance and Economics, Xiaogang Zhu The University of Adelaide, Xian Li Swinburne University of Technology, Shaohua Wang Central University of Finance and Economics, Sheng Wen Swinburne University of Technology
13:00
30m
Talk
Using ML filters to help automated vulnerability repairs: when it helps and when it doesn’tSecurity
New Ideas and Emerging Results (NIER)
Maria Camporese University of Trento, Fabio Massacci University of Trento; Vrije Universiteit Amsterdam
Pre-print
13:00
30m
Talk
Shaken, Not Stirred. How Developers Like Their Amplified Tests
Journal-first Papers
Carolin Brandt TU Delft, Ali Khatami Delft University of Technology, Mairieli Wessel Radboud University, Andy Zaidman TU Delft
Pre-print
13:00
30m
Talk
Exploring User Privacy Awareness on GitHub: An Empirical Study
Journal-first Papers
Costanza Alfieri Università degli Studi dell'Aquila, Juri Di Rocco University of L'Aquila, Paola Inverardi Gran Sasso Science Institute, Phuong T. Nguyen University of L’Aquila