FORGE 2025
Sun 27 - Mon 28 April 2025 Ottawa, Ontario, Canada
co-located with ICSE 2025
Mon 28 Apr 2025 16:12 - 16:24 at 207 - FORGE2025 Tutorial & Session5: FM Evaluation Chair(s): Xin Xia

General Large Language Models (LLMs), such as ChatGPT, have achieved remarkable advances in various tasks. In parallel, to better adapt to specific domains, researchers have proposed multiple domain-specific LLMs by fine-tuning open-source foundation language models (FLMs) (e.g., LLaMA and T5) on immense domain-specific corpora. However, as in many other development processes, practitioners may encounter various bugs. The nature of these bugs, such as their underlying causes, fixing patterns, and symptoms, remains unclear. Understanding the characteristics of bugs encountered while utilizing FLMs can help practitioners understand, locate, and fix FLM-related bugs, which is a fundamental step towards enhancing the quality of FLMs and the LLMs fine-tuned upon them. Hence, in this work, we collected 469 real bugs related to seven FLMs (T5, OPT, LLaMA, GLM, GPT-NeoX, GPT-J, and Pythia) from GitHub and Stack Overflow. We then manually labeled their root causes, fixing patterns, and symptoms, and explored the relationship between root causes and fixing patterns. Consequently, we derived seven significant findings; key findings include: ❶ the majority of bugs are caused by Dependency-related, API-related, and Code-related issues; ❷ the top three most common fixing patterns for FLM bugs are Update Outdated Dependencies, Modify Configuration, and Add/Remove/Modify Conditional Expression; ❸ most root causes are associated with multiple fixing patterns; among them, most bugs caused by API Misuse or API Change can be resolved by Modify Parameter and Replace API; ❹ the majority of these bugs lead to complete program crashes. Based on these findings, we shed light on the characteristics of bugs practitioners encounter while utilizing FLMs, outline future research directions, and provide practical suggestions for FLM developers, users, and software engineering researchers.
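For illustration only (not an example drawn from the paper's dataset), the sketch below shows the kind of API Change bug and Replace API fix described in finding ❸: fine-tuning code written against an older Hugging Face transformers release crashes after a dependency upgrade because the bundled AdamW optimizer was deprecated and later removed, and the fix replaces it with the equivalent torch.optim.AdamW. The Pythia checkpoint name is an assumed example model id.

```python
# Illustrative sketch of an "API Change" bug and a "Replace API" fix in
# FLM fine-tuning code. Assumes the transformers and torch packages are installed;
# the model id "EleutherAI/pythia-70m" is used only as a small example FLM.

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")

# Buggy version (crashes on newer transformers releases, where the bundled
# AdamW optimizer was deprecated and eventually removed):
#
#   from transformers import AdamW          # ImportError on recent versions
#   optimizer = AdamW(model.parameters(), lr=5e-5)
#
# Fix ("Replace API"): use the equivalent optimizer shipped with torch instead.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
```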

Mon 28 Apr

Displayed time zone: Eastern Time (US & Canada)

16:00 - 17:30
FORGE2025 Tutorial & Session5: FM Evaluation (Keynotes / Tutorials / Research Papers) at 207
Chair(s): Xin Xia Huawei
16:00
12m
Long-paper
Cyber-Attack Detection and Localization for SCADA system of CPSs
Research Papers
Dan Li Sun Yat-sen University, Junnan Tang Sun Yat-Sen University, Shunyu Wu Sun Yat-Sen University, Zibin Zheng Sun Yat-sen University, See-Kiong Ng National University of Singapore
16:12
12m
Long-paper
A Comprehensive Study of Bug Characteristics on Foundation Language Models
Research Papers
Junxiao Han Hangzhou City University, Guanqi Wang Zhejiang University, Jiakun Liu Singapore Management University, Lingfeng Bao Zhejiang University, Xing Hu Zhejiang University, Jinling Wei Hangzhou City University, Shuiguang Deng Zhejiang University; Alibaba-Zhejiang University Joint Institute of Frontier Technologies
16:24
12m
Long-paper
Testing Refactoring Engine via Historical Bug Report driven LLM
Research Papers
Haibo Wang Concordia University, Zhuolin Xu Concordia University, Shin Hwei Tan Concordia University
Pre-print
16:36
45m
Tutorial
Beyond Code Generation: Evaluating and Improving LLMs for Code Intelligence
Tutorials
Fatemeh Hendijani Fard Department of Computer Science, Mathematics, Physics and Statistics, University of British Columbia, Okanagan Campus
17:21
9m
Keynote
Industry Keynote: Enhancing Software Engineering with Large Language Models: Insights, Challenges, and Future Directions
Keynotes
Dong Qiu Waterloo Research Center, Huawei Canada