RLCoder: Reinforcement Learning for Repository-Level Code Completion
Repository-level code completion aims to generate code for unfinished code snippets within the context of a specified repository. Existing approaches mainly rely on retrieval-augmented generation strategies due to limitations in input sequence length. However, traditional lexical-based retrieval methods like BM25 struggle to capture code semantics, while model-based retrieval methods face challenges due to the lack of labeled data for training. Therefore, we propose RLCoder, a novel reinforcement learning framework, which can enable the retriever to learn to retrieve useful content for code completion without the need for labeled data. Specifically, we iteratively evaluate the usefulness of retrieved content based on the perplexity of the target code when provided with the retrieved content as additional context, and provide feedback to update the retriever parameters. This iterative process enables the retriever to learn from its successes and failures, gradually improving its ability to retrieve relevant and high-quality content. Considering that not all situations require information beyond code files and not all retrieved context is helpful for generation, we also introduce a stop signal mechanism, allowing the retriever to decide when to retrieve and which candidates to retain autonomously. Extensive experimental results demonstrate that RLCoder consistently outperforms state-of-the-art methods on CrossCodeEval and RepoEval, achieving 12.2% EM improvement over previous methods. Moreover, experiments show that our framework can generalize across different programming languages and further improve previous methods like RepoCoder.
Wed 30 AprDisplayed time zone: Eastern Time (US & Canada) change
16:00 - 17:30 | AI for SE 2Research Track / Journal-first Papers at Canada Hall 1 and 2 Chair(s): Tingting Yu University of Connecticut | ||
16:00 15mTalk | Large Language Models for Safe Minimization Research Track Aashish Yadavally University of Texas at Dallas, Xiaokai Rong The University of Texas at Dallas, Phat Nguyen The University of Texas at Dallas, Tien N. Nguyen University of Texas at Dallas | ||
16:15 15mTalk | LUNA: A Model-Based Universal Analysis Framework for Large Language Models Journal-first Papers Da Song University of Alberta, Xuan Xie University of Alberta, Jiayang Song University of Alberta, Derui Zhu Technical University of Munich, Yuheng Huang University of Alberta, Canada, Felix Juefei-Xu New York University, Lei Ma The University of Tokyo & University of Alberta, Yuheng Huang University of Alberta, Canada | ||
16:30 15mTalk | Intention is All You Need: Refining Your Code from Your Intention Research Track Qi Guo Tianjin University, Xiaofei Xie Singapore Management University, Shangqing Liu Nanyang Technological University, Ming Hu Nanyang Technological University, Xiaohong Li Tianjin University, Lei Bu Nanjing University | ||
16:45 15mTalk | RLCoder: Reinforcement Learning for Repository-Level Code Completion Research Track Yanlin Wang Sun Yat-sen University, Yanli Wang Sun Yat-sen University, Daya Guo , Jiachi Chen Sun Yat-sen University, Ruikai Zhang Huawei Cloud Computing Technologies, Yuchi Ma Huawei Cloud Computing Technologies, Zibin Zheng Sun Yat-sen University | ||
17:00 15mTalk | InterTrans: Leveraging Transitive Intermediate Translations to Enhance LLM-based Code Translation Research Track Marcos Macedo Queen's University, Yuan Tian Queen's University, Kingston, Ontario, Pengyu Nie University of Waterloo, Filipe Cogo Centre for Software Excellence, Huawei Canada, Bram Adams Queen's University | ||
17:15 15mTalk | Toward a Theory of Causation for Interpreting Neural Code Models Journal-first Papers David Nader Palacio William & Mary, Alejandro Velasco William & Mary, Nathan Cooper William & Mary, Alvaro Rodriguez Universidad Nacional de Colombia, Kevin Moran University of Central Florida, Denys Poshyvanyk William & Mary |