A Comprehensive Study of Bug Characteristics on Foundation Language Models
General Large Language Models (LLMs), such as ChatGPT, have achieved prominent advances in various tasks. In parallel, to better adapt to specific domains, researchers have proposed multiple domain-specific LLMs by fine-tuning open-source foundation language models (FLMs) (e.g., LLaMA and T5) on immense domain-specific corpora. However, as in many other development processes, practitioners may encounter various bugs along the way. The nature of these bugs, such as their underlying causes, fixing patterns, and symptoms, remains unclear. Understanding the characteristics of bugs encountered during the utilization of FLMs can help practitioners understand, locate, and fix FLM-related bugs, which is a fundamental step towards enhancing the quality of FLMs and of LLMs fine-tuned upon them. Hence, in this work, we collected 469 real bugs from seven FLMs (T5, OPT, LLaMA, GLM, GPT-NeoX, GPT-J, and Pythia) on GitHub and Stack Overflow. We then manually labeled their root causes, fixing patterns, and symptoms, and explored the relationship between root causes and fixing patterns. Consequently, we derived 7 significant findings; key findings include: ❶ the majority of bugs are caused by Dependency-related, API-related, and Code-related issues; ❷ the three most common fixing patterns for FLM bugs are Update Outdated Dependencies, Modify Configuration, and Add/Remove/Modify Conditional Expression; ❸ most root causes admit multiple fixing patterns; among them, most bugs caused by API Misuse or API Change can be resolved by Modify Parameter and Replace API; ❹ the majority of these bugs lead to complete program crashes. Based on these findings, we shed light on the characteristics of bugs practitioners encounter while utilizing FLMs, pave future research directions, and provide practical suggestions to FLM developers, users, and software engineering researchers.
Mon 28 Apr (displayed time zone: Eastern Time, US & Canada)
16:00 - 17:30 | FORGE 2025 Tutorial & Session 5: FM Evaluation (Keynotes / Tutorials / Research Papers) at Room 207. Chair(s): Xin Xia (Huawei)
16:00 (12m) Long-paper | Cyber-Attack Detection and Localization for SCADA system of CPSs (Research Papers). Dan Li (Sun Yat-sen University), Junnan Tang (Sun Yat-sen University), Shunyu Wu (Sun Yat-sen University), Zibin Zheng (Sun Yat-sen University), See-Kiong Ng (National University of Singapore)
16:12 (12m) Long-paper | A Comprehensive Study of Bug Characteristics on Foundation Language Models (Research Papers). Junxiao Han (Hangzhou City University), Guanqi Wang (Zhejiang University), Jiakun Liu (Singapore Management University), Lingfeng Bao (Zhejiang University), Xing Hu (Zhejiang University), Jinling Wei (Hangzhou City University), Shuiguang Deng (Zhejiang University; Alibaba-Zhejiang University Joint Institute of Frontier Technologies)
16:24 (12m) Long-paper | Testing Refactoring Engine via Historical Bug Report driven LLM (Research Papers). Haibo Wang (Concordia University), Zhuolin Xu (Concordia University), Shin Hwei Tan (Concordia University). Pre-print available.
16:36 (45m) Tutorial | Beyond Code Generation: Evaluating and Improving LLMs for Code Intelligence (Tutorials). Fatemeh Hendijani Fard (Department of Computer Science, Mathematics, Physics and Statistics, University of British Columbia, Okanagan Campus)
17:21 (9m) Keynote | Industry Keynote: Enhancing Software Engineering with Large Language Models: Insights, Challenges, and Future Directions (Keynotes). Dong Qiu (Waterloo Research Center, Huawei Canada)