TraceCoder: A Trace-Driven Multi-Agent Framework for Automated Debugging of LLM-Generated Code
Large Language Models (LLMs) often generate code with subtle yet critical bugs, particularly for complex tasks. Existing automated methods for repairing LLM-generated code rely on superficial execution outcomes, such as binary pass/fail test results. This \enquote{black-box} view offers little insight into a program's internal dynamics and hinders precise error localization. Moreover, the absence of any mechanism for learning from past failures leads to inefficient repair cycles that repeat the same mistakes. To address these limitations, we introduce TraceCoder, a collaborative multi-agent framework that mimics the observe-analyze-repair workflow of human experts. The framework first instruments the code with diagnostic print statements to capture fine-grained runtime traces, providing deep visibility into internal execution. It then performs causal analysis on these traces to pinpoint the root cause of each error. A novel Historical Lesson Learning Mechanism further distills insights from prior failed repair attempts to inform subsequent correction strategies and prevent recurrence of similar mistakes. To ensure stable convergence, a Rollback Mechanism guarantees that each accepted repair iteration constitutes a strict improvement toward a correct solution. Comprehensive empirical evaluations show that TraceCoder achieves up to a 34.43% relative improvement in Pass@1 accuracy over state-of-the-art baselines. Ablation studies confirm the contribution of each component, with the iterative repair process alone yielding a 65.61% relative gain in accuracy. TraceCoder also significantly outperforms leading iterative methods in both accuracy and cost-efficiency.
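To make the observe-analyze-repair loop described above concrete, the following is a minimal, hypothetical Python sketch. All names here (instrument, analyze, repair, run_tests, repair_loop) are illustrative stand-ins, not TraceCoder's published API: instrumentation, causal analysis, and repair are modeled as caller-supplied functions wrapping LLM calls, the Historical Lesson Learning Mechanism is reduced to a list of notes passed to the repair step, and the Rollback Mechanism to a strict test-score comparison.

\begin{verbatim}
# Hypothetical sketch of an observe-analyze-repair loop with lessons and
# rollback. instrument/analyze/repair are assumed to be LLM-backed callables;
# none of these names come from the TraceCoder paper.

import subprocess
import sys
import tempfile

def run_tests(code: str, tests: list[str]) -> tuple[int, str]:
    """Run each test against the (instrumented) code; return (#passed, trace)."""
    passed, trace = 0, ""
    for test in tests:
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code + "\n" + test)
            path = f.name
        proc = subprocess.run([sys.executable, path],
                              capture_output=True, text=True, timeout=10)
        passed += proc.returncode == 0
        trace += proc.stdout + proc.stderr  # diagnostic prints land here

    return passed, trace

def repair_loop(code, tests, instrument, analyze, repair, max_iters=5):
    """Iterate observe -> analyze -> repair with lessons and rollback."""
    lessons: list[str] = []   # distilled insights from failed repair attempts
    best_code = code
    best_score, trace = run_tests(instrument(best_code), tests)

    for _ in range(max_iters):
        if best_score == len(tests):
            break                           # all tests pass; done
        diagnosis = analyze(trace)          # causal analysis of the runtime trace
        candidate = repair(best_code, diagnosis, lessons)
        score, cand_trace = run_tests(instrument(candidate), tests)
        if score > best_score:              # strict improvement: accept candidate
            best_code, best_score, trace = candidate, score, cand_trace
        else:                               # rollback: keep best version, record lesson
            lessons.append(f"Fix attempted for '{diagnosis[:60]}' did not help.")

    return best_code, best_score == len(tests)
\end{verbatim}

In this sketch, rollback is implemented as simply refusing any candidate that does not strictly increase the number of passing tests, and lessons are plain-text notes appended to the repair prompt on later iterations; the paper's agents are presumably richer, but the control flow matches the loop the abstract describes.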