MCTS-Refined CoT: High-Quality Fine-Tuning Data for LLM-Based Repository Issue Resolution
This program is tentative and subject to change.
LLMs demonstrate strong performance in automated software engineering, particularly in code generation and issue resolution. While proprietary models like GPT-4o achieve high benchmark scores on SWE-bench, their API dependence, cost, and privacy concerns limit adoption. Open-source alternatives offer transparency but underperform on complex tasks, especially models below 100B parameters. Although high-quality Chain-of-Thought (CoT) data can enhance reasoning, current methods suffer from two critical flaws: (1) weak rejection sampling reduces data quality, and (2) inadequate step validation causes errors to accumulate. These limitations produce flawed reasoning chains that impair LLMs' ability to learn reliable issue resolution.
The paper proposes MCTS-Refine, an enhanced Monte Carlo Tree Search (MCTS)-based algorithm that dynamically validates and optimizes intermediate reasoning steps through a rigorous rejection sampling strategy, generating high-quality CoT data to improve LLM performance on issue resolution tasks. Key innovations include: (1) augmenting MCTS with a reflection mechanism that corrects errors via rejection sampling and refinement, (2) decomposing issue resolution into three subtasks (File Localization, Fault Localization, and Patch Generation), each with clear ground-truth criteria, and (3) enforcing a strict sampling protocol under which intermediate outputs must exactly match verified developer patches, ensuring correctness across reasoning paths.
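The abstract describes MCTS-Refine only at a high level. The Python sketch below illustrates one plausible reading of its core loop: MCTS selection and expansion, strict rejection sampling that accepts an intermediate step only when it exactly matches developer-verified ground truth, and a reflection pass that lets the model revise a rejected step before it is discarded. All names here (Node, propose_step, validate, reflect) are hypothetical stand-ins, not the authors' actual API; the real framework applies this loop to the File Localization, Fault Localization, and Patch Generation subtasks.

```python
# Minimal sketch of an MCTS loop with reflection-based rejection sampling,
# as suggested by the abstract. Hypothetical illustration only: propose_step,
# validate, and reflect stand in for LLM calls and exact-match checks against
# verified developer patches; they are not part of the authors' released code.
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    state: list[str]                 # partial reasoning chain (CoT steps so far)
    parent: "Node | None" = None
    children: list["Node"] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0

def ucb1(node: Node, c: float = 1.4) -> float:
    """Standard UCT score used to pick the most promising child."""
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits
    )

def mcts_refine(root: Node, propose_step, validate, reflect, budget: int = 100):
    """Grow a tree of reasoning steps, keeping only steps that survive
    strict rejection sampling (exact match with the ground truth)."""
    for _ in range(budget):
        # 1. Selection: descend via UCT to a leaf node.
        node = root
        while node.children:
            node = max(node.children, key=ucb1)
        # 2. Expansion: sample a candidate intermediate step from the LLM.
        step = propose_step(node.state)
        # 3. Rejection sampling with reflection: if the step does not exactly
        #    match the verified ground truth, ask the model to critique and
        #    revise it once; if it still fails, discard this rollout entirely.
        if not validate(node.state, step):
            step = reflect(node.state, step)   # self-correction pass
            if not validate(node.state, step):
                continue
        child = Node(state=node.state + [step], parent=node)
        node.children.append(child)
        # 4. Backpropagation: reward the validated path up to the root.
        reward = 1.0
        while child is not None:
            child.visits += 1
            child.value += reward
            child = child.parent
    return root
```

The design point this sketch tries to capture is that reflection runs inside the rejection-sampling check, so only steps that exactly match verified ground truth ever enter the tree; that would be how the abstract's "correctness across reasoning paths" guarantee is enforced.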
Experiments on SWE-bench Lite and SWE-bench Verified demonstrate that LLMs fine-tuned with our CoT dataset achieve substantial improvements over baselines. Notably, Qwen2.5-72B-Instruct achieves resolution rates of 28.3% (Lite) and 35.0% (Verified), surpassing the SOTA baseline SWE-Fixer-Qwen-72B at the same parameter scale, which reaches only 24.7% (Lite) and 32.8% (Verified). Given precise issue locations as input, our fine-tuned Qwen2.5-72B-Instruct model achieves an issue resolution rate of 43.8% (Verified), comparable to the performance of Deepseek-v3. We open-source our MCTS-Refine framework, CoT dataset, and fine-tuned models to advance research in AI-driven software engineering.
Tue 18 Nov (displayed time zone: Seoul)
14:00 - 15:30
14:00 | 10m | Talk | Enhancing LLMs with Staged Grouping and Dehallucination for Header File Decomposition | Research Papers | Yue Wang (Peking University), Jiaxuan Sun (Peking University), Yanzhen Zou (Peking University), Bing Xie (Peking University)
14:10 | 10m | Research paper | Speculative Automated Refactoring of Imperative Deep Learning Programs to Graph Execution | Research Papers | Raffi Khatchadourian (CUNY Hunter College), Tatiana Castro Vélez (University of Puerto Rico, Rio Piedras Campus), Mehdi Bagherzadeh (Oakland University), Nan Jia (City University of New York (CUNY) Graduate Center), Anita Raja (City University of New York (CUNY) Hunter College) | Pre-print; Media Attached
14:20 | 10m | Talk | An Empirical Study of Python Library Migration Using Large Language Models | Research Papers | Mohayeminul Islam (University of Alberta), Ajay Jha (North Dakota State University), May Mahmoud (New York University Abu Dhabi), Ildar Akhmetov (Northeastern University), Sarah Nadi (New York University Abu Dhabi)
14:30 | 10m | Talk | Measuring the Impact of Predictive Models on the Software Project: A Cost, Service Time, and Risk Evaluation of a Metric-based Defect Severity Prediction Model | Journal-First Track | Umamaheswara Sharma B (National Institute of Technology, Calicut), Ravichandra Sadam (National Institute of Technology Warangal)
14:40 | 10m | Talk | Demystifying the Evolution of Neural Networks with BOM Analysis: Insights from a Large-Scale Study of 55,997 GitHub Repositories | Research Papers | Xiaoning Ren, Yuhang Ye (University of Science and Technology of China), Xiongfei Wu (University of Luxembourg), Yueming Wu (Huazhong University of Science and Technology), Yinxing Xue (Institute of AI for Industries, Chinese Academy of Sciences)
14:50 | 10m | Talk | Fact-Aligned and Template-Constrained Static Analyzer Rule Enhancement with LLMs | Research Papers | Zongze Jiang (Huazhong University of Science and Technology), Ming Wen (Huazhong University of Science and Technology), Ge Wen (Huazhong University of Science and Technology), Hai Jin (Huazhong University of Science and Technology)
15:00 | 10m | Talk | MCTS-Refined CoT: High-Quality Fine-Tuning Data for LLM-Based Repository Issue Resolution | Research Papers | Yibo Wang (Northeastern University), Zhihao Peng (Northeastern University), Ying Wang (Northeastern University), Zhao Wei (Tencent), Hai Yu (Northeastern University, China), Zhiliang Zhu (Northeastern University, China)
15:10 | 10m | Talk | Software Reconfiguration in Robotics | Journal-First Track | Patrizio Pelliccione (Gran Sasso Science Institute, L'Aquila, Italy), Sven Peldszus (IT University of Copenhagen), Davide Brugali (University of Bergamo, Italy), Daniel Strüber (Chalmers / University of Gothenburg / Radboud University), Thorsten Berger (Ruhr University Bochum)
15:20 | 10m | Talk | CROSS2OH: Enabling Seamless Porting of C/C++ Software Libraries to OpenHarmony | Research Papers | Qian Zhang (University of California at Riverside), Li Tsz On (The Hong Kong University of Science and Technology), Ying Wang (Northeastern University), Li Li (Beihang University), Shing-Chi Cheung (Hong Kong University of Science and Technology)