MORepair: Teaching LLMs to Repair Code via Multi-Objective Fine-Tuning
Program repair requires reasoning about why a change is correct, not merely recognizing edit patterns. Fine-tuning approaches proposed in the literature for adapting LLMs to program repair generally overlook the need to reason about the logic behind code changes, beyond syntactic patterns in the data. High-performing fine-tuning experiments also typically incur substantial computational costs. With MORepair, we propose a novel perspective on the learning focus of LLM fine-tuning for program repair: we not only adapt the LLM parameters to the syntactic nuances of the code transformation task (objective ①), but also specifically fine-tune the LLM on the logical reason behind the code change in the training data (objective ②). This multi-objective fine-tuning guides LLMs toward generating high-quality patches. The workflow of MORepair consists of three phases: preparing guidance from paired buggy and fixed code, fine-tuning lightweight adapters with QLoRA under a joint loss, and inference, which produces candidate patches verified by tests.
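The joint loss over the two objectives can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the token-level negative log-likelihood formulation, and the mixing weight `lam` are all assumptions made for exposition.

```python
import math

def token_nll(logits, target_ids):
    """Average negative log-likelihood of target_ids under per-token logits.

    `logits` is a list of per-step score vectors; `target_ids` the gold tokens.
    """
    total = 0.0
    for step_logits, tgt in zip(logits, target_ids):
        # Numerically stable log-sum-exp for the softmax normalizer.
        z = max(step_logits)
        log_norm = z + math.log(sum(math.exp(s - z) for s in step_logits))
        total += log_norm - step_logits[tgt]
    return total / len(target_ids)

def joint_loss(patch_logits, patch_ids, guidance_logits, guidance_ids, lam=0.5):
    # Objective ①: learn the code transformation (the patch tokens).
    l_patch = token_nll(patch_logits, patch_ids)
    # Objective ②: learn the rationale behind the change (the guidance tokens).
    l_guidance = token_nll(guidance_logits, guidance_ids)
    # Joint objective: a weighted sum; `lam` is a hypothetical mixing weight.
    return l_patch + lam * l_guidance
```

In an actual QLoRA setup, both terms would be computed from the same adapter-augmented model over concatenated prompt/target sequences; the sketch only shows how the two supervision signals combine into one scalar loss.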
We apply MORepair to fine-tune four open-source LLMs of different sizes and architectures. Experimental results on function-level and repository-level repair benchmarks demonstrate that MORepair improves LLM repair performance by 11.4% to 56.0%. We further show that our fine-tuning strategy outperforms state-of-the-art approaches, including standard fine-tuning, Fine-tune-CoT, and RepairLLaMA.
Overall, this paper makes the following contributions:
[Approach]: MORepair uses multi-objective fine-tuning to couple patch generation with guidance that captures the repair rationale, enabling higher quality patches.
[Benchmarks]: We build EvalRepair-C++ and EvalRepair-Java with 164 and 163 items, derived from HumanEval-X and HumanEval-Java, plus augmented tests to reduce patch overfitting. We also provide D4J-Repair (371 Java bugs derived from Defects4J) and SWE-Repair (204 Python bugs derived from SWE-bench).
[Experiments and insights]: Across models and languages, MORepair consistently exceeds prior methods. Ablations show that each objective alone is suboptimal; the joint objective yields the best accuracy and more logically consistent fixes.
[Artifacts]: Our research artifacts, including code and the reproduction data, are publicly available at: https://github.com/buaabarty/morepair.