RepoSim: Evaluating Prompt Strategies for Code Completion via User Behavior Simulation
Large language models (LLMs) have revolutionized code completion tasks. IDE plugins such as Copilot can generate code recommendations, saving developers significant time and effort. However, current evaluation methods for code completion are limited by their reliance on static code benchmarks, which do not consider human interactions and evolving repositories. This paper proposes RepoSim, a novel benchmark designed to evaluate code completion tasks by simulating the evolution of repositories and incorporating user behaviors. RepoSim records user behaviors through an IDE plugin and replays them to provide a realistic programming context for evaluation. This allows for the assessment of more complex prompt strategies, such as utilizing recently visited files and incorporating user editing history. Additionally, RepoSim proposes a new metric based on users’ acceptance or rejection of predictions, offering a user-centric evaluation criterion. Our preliminary evaluation demonstrates that incorporating users’ recent edit history into prompts significantly improves the quality of LLM-generated code, highlighting the importance of temporal context in code completion. RepoSim represents a significant advancement in benchmarking tools, offering a realistic and user-focused framework for evaluating code completion performance.
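The replay idea in the abstract can be illustrated with a minimal sketch. The abstract does not describe RepoSim's data format, prompt layout, or metric definition, so everything here is an assumption made for illustration: the `Event` schema, the `build_prompt` and `replay` helpers, the `# recent edit:` prompt prefix, and the acceptance rate computed as accepted completions divided by all completions with a recorded decision.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, List, Optional, Tuple


@dataclass
class Event:
    """One recorded IDE event (hypothetical schema, not RepoSim's)."""
    kind: str                        # "edit" or "completion"
    path: str                        # file the event happened in
    text: str = ""                   # edited text, or the code before the cursor
    accepted: Optional[bool] = None  # recorded user decision, completions only


def build_prompt(prefix: str, recent_edits: Iterable[str]) -> str:
    """Prepend recent edit history to the code before the cursor."""
    history = "\n".join(f"# recent edit: {e}" for e in recent_edits)
    return f"{history}\n{prefix}" if history else prefix


def replay(events: List[Event], history_size: int = 3) -> Tuple[List[str], float]:
    """Replay events in recorded order.

    Returns the prompt built at each completion request and the share of
    completion requests whose recorded decision was "accepted".
    """
    recent: Deque[str] = deque(maxlen=history_size)  # sliding edit history
    prompts: List[str] = []
    decisions: List[bool] = []
    for ev in events:
        if ev.kind == "edit":
            recent.append(f"{ev.path}: {ev.text!r}")
        elif ev.kind == "completion":
            prompts.append(build_prompt(ev.text, recent))
            if ev.accepted is not None:
                decisions.append(ev.accepted)
    rate = sum(decisions) / len(decisions) if decisions else 0.0
    return prompts, rate
```

In a fuller harness, each prompt would be sent to the model under test, and the acceptance signal would be tied to the new prediction rather than read directly from the log. How RepoSim makes that link is not stated in the abstract.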
Thu 31 Oct. Displayed time zone: Pacific Time (US & Canada)
15:30 - 16:30 | Code completion | Research Papers / NIER Track at Compagno | Chair(s): Baishakhi Ray (Columbia University, New York; AWS AI Lab)
15:30 | 15m | Talk | Attribution-guided Adversarial Code Prompt Generation for Code Completion Models | Research Papers | Xueyang Li (Institute of Information Engineering, Chinese Academy of Sciences, China), Guozhu Meng (Institute of Information Engineering, Chinese Academy of Sciences), Shangqing Liu (Nanyang Technological University), Lu Xiang (SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences, China), Kun Sun (Institute of Information Engineering, Chinese Academy of Sciences), Kai Chen (Institute of Information Engineering at Chinese Academy of Sciences; University of Chinese Academy of Sciences), Xiapu Luo (Hong Kong Polytechnic University), Yang Liu (Nanyang Technological University)
15:45 | 15m | Talk | DroidCoder: Enhanced Android Code Completion with Context-Enriched Retrieval-Augmented Generation | Research Papers | Xinran Yu (Nanjing University), Chun Li (Nanjing University), Minxue Pan (Nanjing University), Xuandong Li (Nanjing University)
16:00 | 15m | Talk | GraphCoder: Enhancing Repository-Level Code Completion via Coarse-to-fine Retrieval Based on Code Context Graph | Research Papers | Wei Liu (Nanjing University), Ailun Yu (Peking University), Daoguang Zan (Institute of Software, Chinese Academy of Sciences), Bo Shen (Huawei Cloud Computing Technologies Co., Ltd.), Wei Zhang (Peking University), Haiyan Zhao (Peking University), Zhi Jin (Peking University), Qianxiang Wang (Huawei Technologies Co., Ltd)
16:15 | 10m | Talk | RepoSim: Evaluating Prompt Strategies for Code Completion via User Behavior Simulation | NIER Track | Chao Peng (ByteDance), Qinyun Wu (Bytedance Ltd.), Jiangchao Liu (ByteDance), Jierui Liu (ByteDance), Bo Jiang (Bytedance Network Technology), Mengqian Xu (East China Normal University), Yinghao Wang (ByteDance), Xia Liu (ByteDance), Ping Yang (Bytedance Network Technology)