
Large Language Models (LLMs) have rapidly gained popularity, transforming both research and industry. To support their adoption, LLM agent workflow orchestration frameworks (hereafter, LLM agent frameworks) such as LangChain have become essential for building advanced LLM applications. However, their complexity makes bugs inevitable, and these bugs can propagate to downstream applications, causing severe failures or unintended behaviors. In this paper, we first present an abstraction of the structure of mainstream LLM agent frameworks, identifying four key architectural components: data preprocessing, core schema, agent construction, and featured modules. Building on this abstraction, we conduct the first empirical study of LLM agent framework bugs, analyzing 1,026 bug instances extracted from 1,577 real-world bug-related GitHub pull requests (PRs) in three popular LLM agent frameworks: LangChain, LlamaIndex, and Haystack. For each bug, we examine its root cause, its symptom, and the framework component it affects, yielding a systematic taxonomy of nine root causes and six symptom categories. Finally, leveraging the structural abstraction and the large-scale empirical study, we perform a detailed statistical analysis of how bugs are distributed across the three frameworks and their components, and of the relationship between root causes and symptoms. The analysis reveals challenge patterns unique to LLM agent frameworks compared with traditional software, providing actionable quality-assurance guidance for practitioners.