A Comprehensive Study of Bug Characteristics on Foundation Language Models
General Large Language Models (LLMs), such as ChatGPT, have achieved prominent advances in various tasks. In parallel, to better adapt to specific domains, researchers have proposed multiple domain-specific LLMs by fine-tuning open-source foundation language models (FLMs) (e.g., LLaMA and T5) on immense domain-specific corpora. However, as in many other development processes, practitioners may encounter various bugs along the way. The nature of these bugs, such as their underlying causes, fixing patterns, and symptoms, remains unclear. Understanding the characteristics of bugs encountered during the utilization of FLMs can help practitioners understand, locate, and fix FLM-related bugs, which is a fundamental step towards enhancing the quality of FLMs and of LLMs fine-tuned upon them. Hence, in this work, we collected 469 real bugs from seven FLMs (T5, OPT, LLaMA, GLM, GPT-NeoX, GPT-J, and Pythia) on GitHub and Stack Overflow. We then manually labeled their root causes, fixing patterns, and symptoms, and explored the relationship between root causes and fixing patterns. Consequently, we derived 7 significant findings; key findings include: ❶ the majority of bugs are caused by Dependency-related, API-related, and Code-related issues; ❷ the three most common fixing patterns for FLM bugs are Update Outdated Dependencies, Modify Configuration, and Add/Remove/Modify Conditional Expression; ❸ most root causes admit multiple fixing patterns; among them, most bugs caused by API Misuse or API Change can be resolved by Modify Parameter and Replace API; ❹ the majority of these bugs lead to complete program crashes. Based on these findings, we shed light on the characteristics of bugs practitioners encounter while utilizing FLMs, pave future research directions, and provide practical suggestions to FLM developers, users, and software engineering researchers.
Mon 28 Apr (displayed time zone: Eastern Time, US & Canada)
16:00 - 17:30 | FORGE 2025 Tutorial & Session 5: FM Evaluation (Keynotes / Tutorials / Research Papers) at Room 207. Chair(s): Xin Xia (Huawei)
16:00 (12m) Long-paper | Cyber-Attack Detection and Localization for SCADA system of CPSs (Research Papers). Dan Li (Sun Yat-sen University), Junnan Tang (Sun Yat-sen University), Shunyu Wu (Sun Yat-sen University), Zibin Zheng (Sun Yat-sen University), See-Kiong Ng (National University of Singapore)
16:12 (12m) Long-paper | A Comprehensive Study of Bug Characteristics on Foundation Language Models (Research Papers). Junxiao Han (Hangzhou City University), Guanqi Wang (Zhejiang University), Jiakun Liu (Singapore Management University), Lingfeng Bao (Zhejiang University), Xing Hu (Zhejiang University), Jinling Wei (Hangzhou City University), Shuiguang Deng (Zhejiang University; Alibaba-Zhejiang University Joint Institute of Frontier Technologies)
16:24 (12m) Long-paper | Testing Refactoring Engine via Historical Bug Report driven LLM (Research Papers). Haibo Wang (Concordia University), Zhuolin Xu (Concordia University), Shin Hwei Tan (Concordia University). Pre-print available.
16:36 (45m) Tutorial | Beyond Code Generation: Evaluating and Improving LLMs for Code Intelligence (Tutorials). Fatemeh Hendijani Fard (Department of Computer Science, Mathematics, Physics and Statistics, University of British Columbia, Okanagan Campus)
17:21 (9m) Keynote | Industry Keynote: Enhancing Software Engineering with Large Language Models: Insights, Challenges, and Future Directions (Keynotes). Dong Qiu (Waterloo Research Center, Huawei Canada)