LinkAnchor: An Autonomous LLM-Based Agent for Issue-to-Commit Link Recovery
Issue-to-commit link recovery plays a central role in software traceability and project management, yet it remains a challenging task. Prior studies show that only about 42.2% of issues on GitHub are correctly linked to their commits, highlighting the need for more effective solutions. Existing work has explored a range of ML/DL approaches, and more recently, large language models (LLMs) have been applied to this problem. However, these methods face two major limitations. First, LLMs are restricted by limited context windows and cannot simultaneously process all available data sources, such as long commit histories, extensive issue discussions, and large code repositories. Second, most approaches operate on individual issue–commit pairs, where the model determines for each pair whether the commit is related to the issue. While straightforward in design, this strategy quickly becomes impractical in repositories with tens of thousands of commits, as it requires exhaustively evaluating an enormous number of candidate pairs. To address these challenges, we present LinkAnchor, the first autonomous LLM-based agent designed specifically for issue-to-commit link recovery. LinkAnchor introduces a lazy-access architecture that allows the underlying LLM to dynamically retrieve only the most relevant contextual data, such as commits, issue comments, and code files, without exceeding token limits. Unlike prior approaches, LinkAnchor does not exhaustively score every possible pair but instead efficiently identifies the last resolving commit of an issue, enabling the complete reconstruction of the resolving commit chain and retrieval of all relevant commits. Our evaluations show that LinkAnchor outperforms state-of-the-art baselines by 41–714% in Hit@1 across six large-scale open-source projects, while costing only about 0.01 US dollars per issue. 
Finally, LinkAnchor is designed and tested for both GitHub and Jira, and its modular architecture makes it straightforward to extend to other platforms.
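The lazy-access idea can be illustrated with a minimal sketch. This is not LinkAnchor's implementation or API: `Commit`, `LazyRepo`, `search_commits`, and `find_last_resolving_commit` are hypothetical names, and a simple keyword loop stands in for the LLM that would decide which tool to call next. The sketch only shows the shape of the approach: the caller requests small, bounded slices of the history on demand and selects the latest matching commit, rather than scoring every issue–commit pair.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Commit:
    sha: str
    message: str
    timestamp: int  # used only to order commits in time


@dataclass
class LazyRepo:
    """Hypothetical data source that exposes bounded slices of the history."""
    commits: List[Commit]
    calls: List[str] = field(default_factory=list)  # log of tool calls made

    def search_commits(self, keyword: str, limit: int = 5) -> List[Commit]:
        # Return at most `limit` commits whose message mentions the keyword,
        # so a single call never returns the full commit history.
        self.calls.append(f"search_commits({keyword!r})")
        hits = [c for c in self.commits if keyword.lower() in c.message.lower()]
        return hits[:limit]


def find_last_resolving_commit(
    repo: LazyRepo, issue_keywords: List[str]
) -> Optional[Commit]:
    # Stand-in for the agent loop (assumption): a real agent would let the
    # LLM choose which tool to call; here each keyword triggers one search.
    candidates: Dict[str, Commit] = {}
    for keyword in issue_keywords:
        for commit in repo.search_commits(keyword):
            candidates[commit.sha] = commit
    if not candidates:
        return None
    # The most recent candidate is treated as the last resolving commit.
    return max(candidates.values(), key=lambda c: c.timestamp)


repo = LazyRepo([
    Commit("a1", "Fix null pointer in parser", 100),
    Commit("b2", "Update docs", 200),
    Commit("c3", "Parser: add regression test for null input", 300),
    Commit("d4", "Bump version", 400),
])
result = find_last_resolving_commit(repo, ["parser", "null"])
```

In this toy setup the number of data requests grows with the number of queries issued (two here), not with the number of commits in the repository, which is the property that motivates avoiding pairwise evaluation.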