
Large Language Models (LLMs) have rapidly gained popularity, transforming both research and industry. To support their adoption, LLM agent workflow orchestration frameworks (hereafter, LLM agent frameworks) such as LangChain have become essential for building advanced LLM applications. However, their complexity makes bugs inevitable, and these bugs can propagate to downstream applications, causing severe failures or unintended behaviors. In this paper, we first present an abstraction of the structure of mainstream LLM agent frameworks, identifying four key architectural components: data preprocessing, core schema, agent construction, and featured modules. Building on this abstraction, we conduct the first empirical study of LLM agent framework bugs, analyzing 1,026 bug instances extracted from 1,577 real-world bug-related GitHub pull requests (PRs) in three popular LLM agent frameworks: LangChain, LlamaIndex, and Haystack. For each bug, we examine its root cause, its symptom, and the framework component it affects, yielding a systematic taxonomy of nine root causes and six symptom categories. Finally, leveraging the structural abstraction and the large-scale empirical study, we perform a detailed statistical analysis of how bugs are distributed across the three frameworks and their components, and of the relationship between root causes and symptoms. The analysis reveals challenge patterns unique to LLM agent frameworks compared with traditional software, providing actionable quality-assurance guidance for practitioners.