MCTS-Refined CoT: High-Quality Fine-Tuning Data for LLM-Based Repository Issue Resolution
This program is tentative and subject to change.
LLMs demonstrate strong performance in automated software engineering, particularly in code generation and issue resolution. While proprietary models like GPT-4o achieve high benchmark scores on SWE-bench, their API dependence, cost, and privacy concerns limit adoption. Open-source alternatives offer transparency but underperform on complex tasks, especially models below 100B parameters. Although high-quality Chain-of-Thought (CoT) data can enhance reasoning, current methods suffer from two critical flaws: (1) weak rejection sampling reduces data quality, and (2) inadequate step validation causes errors to accumulate. These limitations produce flawed reasoning chains that impair LLMs' ability to learn reliable issue resolution.
The paper proposes MCTS-Refine, an enhanced Monte Carlo Tree Search (MCTS)-based algorithm that dynamically validates and optimizes intermediate reasoning steps through a rigorous rejection sampling strategy, generating high-quality CoT data to improve LLM performance on issue resolution tasks. Key innovations include: (1) augmenting MCTS with a reflection mechanism that corrects errors via rejection sampling and refinement, (2) decomposing issue resolution into three subtasks (File Localization, Fault Localization, and Patch Generation), each with clear ground-truth criteria, and (3) enforcing a strict sampling protocol under which intermediate outputs must exactly match verified developer patches, ensuring correctness across reasoning paths.
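The abstract describes MCTS-Refine only at a high level. The Python sketch below illustrates one plausible reading of its core loop: MCTS selection and expansion, strict rejection sampling that accepts an intermediate step only when it exactly matches developer-verified ground truth, and a reflection pass that lets the model revise a rejected step before it is discarded. All names here (Node, propose_step, validate, reflect) are hypothetical stand-ins, not the authors' actual API; the real framework applies this loop to the File Localization, Fault Localization, and Patch Generation subtasks.

```python
# Minimal sketch of an MCTS loop with reflection-based rejection sampling,
# as suggested by the abstract. Hypothetical illustration only: propose_step,
# validate, and reflect stand in for LLM calls and exact-match checks against
# verified developer patches; they are not part of the authors' released code.
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    state: list[str]                 # partial reasoning chain (CoT steps so far)
    parent: "Node | None" = None
    children: list["Node"] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0

def ucb1(node: Node, c: float = 1.4) -> float:
    """Standard UCT score used to pick the most promising child."""
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits
    )

def mcts_refine(root: Node, propose_step, validate, reflect, budget: int = 100):
    """Grow a tree of reasoning steps, keeping only steps that survive
    strict rejection sampling (exact match with the ground truth)."""
    for _ in range(budget):
        # 1. Selection: descend via UCT to a leaf node.
        node = root
        while node.children:
            node = max(node.children, key=ucb1)
        # 2. Expansion: sample a candidate intermediate step from the LLM.
        step = propose_step(node.state)
        # 3. Rejection sampling with reflection: if the step does not exactly
        #    match the verified ground truth, ask the model to critique and
        #    revise it once; if it still fails, discard this rollout entirely.
        if not validate(node.state, step):
            step = reflect(node.state, step)   # self-correction pass
            if not validate(node.state, step):
                continue
        child = Node(state=node.state + [step], parent=node)
        node.children.append(child)
        # 4. Backpropagation: reward the validated path up to the root.
        reward = 1.0
        while child is not None:
            child.visits += 1
            child.value += reward
            child = child.parent
    return root
```

The design point this sketch tries to capture is that reflection runs inside the rejection-sampling check, so only steps that exactly match verified ground truth ever enter the tree; that would be how the abstract's "correctness across reasoning paths" guarantee is enforced.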
Experiments on SWE-bench Lite and SWE-bench Verified demonstrate that LLMs fine-tuned with our CoT dataset achieve substantial improvements over baselines. Notably, Qwen2.5-72B-Instruct achieves resolution rates of 28.3% (Lite) and 35.0% (Verified), surpassing the SOTA baseline SWE-Fixer-Qwen-72B at the same parameter scale, which reaches only 24.7% (Lite) and 32.8% (Verified). Given precise issue locations as input, our fine-tuned Qwen2.5-72B-Instruct model achieves an issue resolution rate of 43.8% (Verified), comparable to the performance of Deepseek-v3. We open-source our MCTS-Refine framework, CoT dataset, and fine-tuned models to advance research in AI-driven software engineering.
Tue 18 Nov (displayed time zone: Seoul)
14:00 - 15:30
14:00 | 10m | Talk | Enhancing LLMs with Staged Grouping and Dehallucination for Header File Decomposition | Research Papers | Yue Wang (Peking University), Jiaxuan Sun (Peking University), Yanzhen Zou (Peking University), Bing Xie (Peking University)
14:10 | 10m | Research paper | Speculative Automated Refactoring of Imperative Deep Learning Programs to Graph Execution | Research Papers | Raffi Khatchadourian (CUNY Hunter College), Tatiana Castro Vélez (University of Puerto Rico, Rio Piedras Campus), Mehdi Bagherzadeh (Oakland University), Nan Jia (City University of New York (CUNY) Graduate Center), Anita Raja (City University of New York (CUNY) Hunter College) | Pre-print; Media Attached
14:20 | 10m | Talk | An Empirical Study of Python Library Migration Using Large Language Models | Research Papers | Mohayeminul Islam (University of Alberta), Ajay Jha (North Dakota State University), May Mahmoud (New York University Abu Dhabi), Ildar Akhmetov (Northeastern University), Sarah Nadi (New York University Abu Dhabi)
14:30 | 10m | Talk | Measuring the Impact of Predictive Models on the Software Project: A Cost, Service Time, and Risk Evaluation of a Metric-based Defect Severity Prediction Model | Journal-First Track | Umamaheswara Sharma B (National Institute of Technology, Calicut), Ravichandra Sadam (National Institute of Technology Warangal)
14:40 | 10m | Talk | Demystifying the Evolution of Neural Networks with BOM Analysis: Insights from a Large-Scale Study of 55,997 GitHub Repositories | Research Papers | Xiaoning Ren, Yuhang Ye (University of Science and Technology of China), Xiongfei Wu (University of Luxembourg), Yueming Wu (Huazhong University of Science and Technology), Yinxing Xue (Institute of AI for Industries, Chinese Academy of Sciences)
14:50 | 10m | Talk | Fact-Aligned and Template-Constrained Static Analyzer Rule Enhancement with LLMs | Research Papers | Zongze Jiang (Huazhong University of Science and Technology), Ming Wen (Huazhong University of Science and Technology), Ge Wen (Huazhong University of Science and Technology), Hai Jin (Huazhong University of Science and Technology)
15:00 | 10m | Talk | MCTS-Refined CoT: High-Quality Fine-Tuning Data for LLM-Based Repository Issue Resolution | Research Papers | Yibo Wang (Northeastern University), Zhihao Peng (Northeastern University), Ying Wang (Northeastern University), Zhao Wei (Tencent), Hai Yu (Northeastern University, China), Zhiliang Zhu (Northeastern University, China)
15:10 | 10m | Talk | Software Reconfiguration in Robotics | Journal-First Track | Patrizio Pelliccione (Gran Sasso Science Institute, L'Aquila, Italy), Sven Peldszus (IT University of Copenhagen), Davide Brugali (University of Bergamo, Italy), Daniel Strüber (Chalmers / University of Gothenburg / Radboud University), Thorsten Berger (Ruhr University Bochum)
15:20 | 10m | Talk | CROSS2OH: Enabling Seamless Porting of C/C++ Software Libraries to OpenHarmony | Research Papers | Qian Zhang (University of California at Riverside), Li Tsz On (The Hong Kong University of Science and Technology), Ying Wang (Northeastern University), Li Li (Beihang University), Shing-Chi Cheung (Hong Kong University of Science and Technology)