Exploring the Potential of Large Language Models in Automatic Pull Request Title Generation: An Empirical Study
Pull Requests (PRs) are a collaborative mechanism on GitHub that allows developers to merge their code changes into another branch of a software repository. The PR title serves as a summary of the PR and needs to describe the specific changes accurately and concisely, which helps reviewers and other developers review and understand them. Many methods exist for automatically generating PR titles, most of them based on pre-trained models. Although these methods are effective, pre-trained models often require extensive fine-tuning for specific tasks. Compared to pre-trained models, large language models (LLMs) possess superior semantic understanding capabilities. As foundation models, they can solve most tasks directly without fine-tuning, providing an alternative solution for PR title generation. However, the capabilities of LLMs in automatic PR title generation have not been fully explored. To fill this gap, we conducted an empirical study of the capabilities of LLMs in PR title generation. Initially, directly applying LLMs to generate PR titles did not yield satisfactory results. We found that using similar PRs from the dataset as auxiliary information can effectively enhance the title generation capability of LLMs: when the number of most similar PRs used as input increased from 0 to 5, the ROUGE-L F1 score of the titles generated by LLMs increased by an average of 23.48%, with improvements in other metrics as well. In further experiments, we discovered that setting a lower temperature for the LLMs yields better performance. We then selected the best parameter configuration and compared it with existing state-of-the-art methods. Our experimental results show that LLMs outperform the state-of-the-art methods in Precision, Recall, and METEOR on the PRTiger dataset. Additionally, human evaluation results indicate that PR titles generated by LLMs receive higher scores in Correctness, Naturalness, and Comprehensibility.
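The abstract's core idea is to retrieve the most similar PRs from the dataset and supply them to the LLM as few-shot examples, decoding at a low temperature. Below is a minimal sketch of this retrieval-augmented prompting approach, assuming an OpenAI-style chat API and TF-IDF similarity; the function names (`retrieve_similar_prs`, `generate_title`), the prompt wording, and the model choice are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch: few-shot PR title generation with retrieved similar PRs.
# Assumes `corpus` is a list of {"description": ..., "title": ...} dicts;
# the retriever and prompt wording are illustrative, not the paper's setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from openai import OpenAI


def retrieve_similar_prs(query_desc, corpus, k=5):
    """Return the k PRs whose descriptions are most similar to the query."""
    texts = [query_desc] + [pr["description"] for pr in corpus]
    matrix = TfidfVectorizer().fit_transform(texts)
    scores = cosine_similarity(matrix[0:1], matrix[1:]).flatten()
    return [corpus[i] for i in scores.argsort()[::-1][:k]]


def generate_title(pr_description, corpus, k=5):
    """Generate a PR title using the k most similar PRs as few-shot examples."""
    examples = retrieve_similar_prs(pr_description, corpus, k)
    shots = "\n\n".join(
        f"Description: {ex['description']}\nTitle: {ex['title']}" for ex in examples
    )
    prompt = (
        "Write a concise, accurate pull request title for the last description.\n\n"
        f"{shots}\n\nDescription: {pr_description}\nTitle:"
    )
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,  # the study reports better results at lower temperatures
    )
    return resp.choices[0].message.content.strip()
```

Varying `k` from 0 to 5 in `generate_title` corresponds to the abstract's experiment on the number of similar PRs used as input.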
Thu 5 Dec (displayed time zone: Beijing, Chongqing, Hong Kong, Urumqi)
14:00 - 15:30 | Session (8), Technical Track at Room 1 (Zunhui Room) | Chair(s): Zhou Yang (Singapore Management University)
14:00 (30m, Talk) | DupLLM: Duplicate Pull Requests Detection Based on Large Language Model | Technical Track | Zhifang Liao (Central South University), Pei Liu (Monash University), Peng Lan (School of Computer Science and Engineering, Central South University, Changsha, China), Ke Sun (Central South University)
14:30 (30m, Talk) | Exploring the Potential of Large Language Models in Automatic Pull Request Title Generation: An Empirical Study | Technical Track | YiTao Zuo (School of Computer Science and Engineering, Central South University, Changsha, China), Peng Lan (School of Computer Science and Engineering, Central South University, Changsha, China), Zhifang Liao (Central South University)
15:00 (30m, Talk) | ModelCS: A Two-Stage Framework for Model Search | Technical Track | Lingjun Zhao (National University of Defense Technology), Zhouyang Jia (National University of Defense Technology), Jiaying Li (National University of Defense Technology), Haoran Liu (National University of Defense Technology), Linxiao Bai (National University of Defense Technology), Shanshan Li (National University of Defense Technology)