HFUZZER: Testing Large Language Models for Package Hallucinations via Phrase-based Fuzzing
This program is tentative and subject to change.
Large Language Models (LLMs) are widely used for code generation, but they pose critical security risks in production because of package hallucinations, in which an LLM recommends packages that do not exist. Attackers can exploit these hallucinations in software supply chain attacks by registering malicious packages under the hallucinated names. Testing LLMs for package hallucinations is therefore essential to mitigating them and defending against such attacks. Although researchers have proposed testing frameworks for fact-conflicting hallucinations in natural language generation, research on testing for package hallucinations is lacking. To fill this gap, we propose HFUZZER, a novel phrase-based fuzzing framework for testing LLMs for package hallucinations. HFUZZER applies fuzzing and guides the model to infer a wider range of reasonable information from phrases, thereby generating sufficient and diverse coding tasks. Furthermore, HFUZZER extracts the phrases from package information or coding tasks, which keeps the phrases relevant to code and in turn improves the relevance of the generated tasks and code. We evaluate HFUZZER on multiple LLMs and find that it triggers package hallucinations in every selected model. Compared to the mutational fuzzing framework, HFUZZER identifies 2.36× more unique hallucinated packages. Additionally, when testing GPT-4o, HFUZZER finds 46 unique hallucinated packages. Further analysis shows that LLMs are prone to package hallucinations not only when generating code but also when assisting with environment configuration.
Mon 17 Nov (displayed time zone: Seoul)
11:00 - 12:30

| Time | Duration | Type | Title | Track | Authors |
| --- | --- | --- | --- | --- | --- |
| 11:00 | 10m | Talk | TensorGuard: Gradient-Based Model Fingerprinting for LLM Similarity Detection and Family Classification | Research Papers | Zehao Wu (Huazhong University of Science and Technology), Yanjie Zhao (Huazhong University of Science and Technology), Haoyu Wang (Huazhong University of Science and Technology) |
| 11:10 | 10m | Talk | Root Cause Analysis of RISC-V Build Failures via LLM and MCTS Reasoning | Research Papers | Weipeng Shuai (Institute of Software, Chinese Academy of Sciences), Jie Liu (Institute of Software, Chinese Academy of Sciences), Zhirou Ma (Institute of Software, Chinese Academy of Sciences), Liangyi Kang (Institute of Software, Chinese Academy of Sciences), Zehua Wang (Institute of Software, Chinese Academy of Sciences), Shuai Wang (Institute of Software, Chinese Academy of Sciences), Dan Ye (Institute of Software at Chinese Academy of Sciences), Hui Li, Wei Wang (Institute of Software at Chinese Academy of Sciences), Jiaxin Zhu (Institute of Software at Chinese Academy of Sciences) |
| 11:20 | 10m | Talk | An Empirical Study of Knowledge Transfer in AI Pair Programming | Research Papers | Alisa Carla Welter (Saarland University), Niklas Schneider (Saarland University), Tobias Dick (Saarland University), Kallistos Weis (Saarland University), Christof Tinnes (Saarland University), Marvin Wyrich (Saarland University), Sven Apel (Saarland University) |
| 11:30 | 10m | Talk | Efficient Understanding of Machine Learning Model Mispredictions | Research Papers | Martin Eberlein (Humboldt-Universität zu Berlin), Jürgen Cito (TU Wien), Lars Grunske (Humboldt-Universität zu Berlin) |
| 11:40 | 10m | Talk | Can Mamba Be Better? An Experimental Evaluation of Mamba in Code Intelligence | Research Papers | Shuo Liu (City University of Hong Kong), Jacky Keung (City University of Hong Kong), Zhen Yang (Shandong University), Zhenyu Mao (City University of Hong Kong), Yicheng Sun (City University of Hong Kong) |
| 11:50 | 10m | Talk | "My productivity is boosted, but ..." Demystifying Users’ Perception on AI Coding Assistants | Research Papers | |
| 12:00 | 10m | Talk | HFUZZER: Testing Large Language Models for Package Hallucinations via Phrase-based Fuzzing | Research Papers | Yukai Zhao, Menghan Wu (Zhejiang University), Xing Hu (Zhejiang University), Xin Xia (Zhejiang University) |
| 12:10 | 10m | Talk | Provable Fairness Repair for Deep Neural Networks | Research Papers | Jianan Ma (Hangzhou Dianzi University, China; Zhejiang University, Hangzhou, China), Jingyi Wang (Zhejiang University), Qi Xuan (Zhejiang University of Technology; Binjiang Institute of Artificial Intelligence), Zhen Wang (Hangzhou Dianzi University, China) |
| 12:20 | 10m | Talk | AutoAdapt: On the Application of AutoML for Parameter-Efficient Fine-Tuning of Pre-Trained Code Models | Journal-First Track | Amal Akli (University of Luxembourg), Maxime Cordy (University of Luxembourg, Luxembourg), Mike Papadakis (University of Luxembourg), Yves Le Traon (University of Luxembourg, Luxembourg) |