ASE 2025
Sun 16 - Thu 20 November 2025 Seoul, South Korea

This program is tentative and subject to change.

Mon 17 Nov 2025 16:50 - 17:00 at Grand Hall 2 - Bug Understanding 2

Autonomous agent systems powered by Large Language Models (LLMs) have demonstrated promising capabilities in automating complex tasks. However, current evaluations largely rely on success rates without systematically analyzing the interactions, communication mechanisms, and failure causes within these systems. To bridge this gap, we present a benchmark of 34 representative programmable tasks designed to rigorously assess autonomous agents. Using this benchmark, we evaluate three popular open-source agent frameworks combined with two LLM backbones, observing a task completion rate of approximately 50% on average. Through in-depth failure analysis, we develop a three-tier taxonomy of failure causes aligned with task phases, highlighting planning errors, task execution issues, and incorrect response generation. Based on these insights, we propose actionable improvements to enhance agent planning and self-diagnosis capabilities. Our failure taxonomy, together with mitigation advice, provides an empirical foundation for developing more robust and effective autonomous agent systems in the future.

Mon 17 Nov

Displayed time zone: Seoul

16:00 - 17:00
Bug Understanding 2 - Industry Showcase / NIER Track at Grand Hall 2
16:00
10m
Talk
A Characterization Study of Bugs in LLM Agent Workflow Orchestration Frameworks
Industry Showcase
Ziluo Xue Huazhong University of Science and Technology, Yanjie Zhao Huazhong University of Science and Technology, Shenao Wang Huazhong University of Science and Technology, Kai Chen Huazhong University of Science and Technology, Haoyu Wang Huazhong University of Science and Technology
16:10
10m
Talk
Debugging the Undebuggable: Why Multi-Fault Programs Break Debugging and Repair Tools
NIER Track
Omar I. Al-Bataineh Gran Sasso Science Institute (GSSI)
16:20
10m
Talk
ErrorPrism: Reconstructing Error Propagation Paths in Cloud Service Systems
Industry Showcase
Junsong Pu School of Software Engineering, Sun Yat-sen University, Yichen LI ByteDance, Zhuangbin Chen Sun Yat-sen University, Jinyang Liu ByteDance, Zhihan Jiang The Chinese University of Hong Kong, Jianjun Chen ByteDance, Rui Shi ByteDance, Zibin Zheng Sun Yat-sen University, Tieying Zhang ByteDance
16:30
10m
Talk
Fault Injection for Simulink-based CPS Models: Insights and Future Directions
NIER Track
Drishti Yadav University of Luxembourg, Luxembourg, Claudio Mandrioli University of Luxembourg, Ezio Bartocci TU Wien, Domenico Bianculli University of Luxembourg
16:40
10m
Talk
How Does ChatGPT Make Assumptions When Creating Erroneous Programs?
NIER Track
Sadia Jahan University of Texas at San Antonio, Xiaoyin Wang University of Texas at San Antonio
16:50
10m
Talk
Exploring Autonomous Agents: A Closer Look at Why They Fail When Completing Tasks
NIER Track
Ruofan Lu The Chinese University of Hong Kong, Yichen LI ByteDance, Yintong Huo Singapore Management University, Singapore