RepoMasterEval: Evaluating Code Completion via Real-World Repositories
This program is tentative and subject to change.
With the growing reliance on automated code completion tools in software development, the need for comprehensive evaluation benchmarks has become critical. Existing benchmarks mostly target code completion at the function and class level, prompting the model with textual descriptions. In real development, however, such descriptive prompts are commonly unavailable, and code completion can occur in a wider range of situations, such as in the middle of a function or a code block. These limitations make existing benchmarks poorly aligned with the practical scenarios of code completion tools. In this paper, we propose RepoMasterEval, a novel benchmark for evaluating code completion models constructed from real-world repositories. Each benchmark datum is generated by masking a code snippet (the ground truth) in a source code file that is covered by an existing test suite. To improve the accuracy of testing model-generated code, we employ mutation testing to measure the effectiveness of the test cases, and we manually craft new test cases for those test suites with low mutation scores. Our empirical evaluation of 10 state-of-the-art models shows that test augmentation is critical to improving the accuracy of the benchmark and that RepoMasterEval is able to reveal variance in model performance in real-world scenarios. The deployment of RepoMasterEval also showed that the benchmark provides accurate feedback during model training and that its score correlates highly with the model's performance in practice.
Tue 18 Nov (displayed time zone: Seoul)
16:00 - 17:00

16:00 10m Talk | Automated Prompt Generation for Code Intelligence: An Empirical Study and Experience in WeChat (Industry Showcase)
Kexing Ji, Shiyun Fu (The Chinese University of Hong Kong); Cuiyun Gao (Harbin Institute of Technology, Shenzhen); Yujia Chen (The Chinese University of Hong Kong); Zezhou Yang (Tencent Inc.); Chaozheng Wang (The Chinese University of Hong Kong); Yuetang Deng (Tencent)

16:10 10m Talk | Evaluating Large Language Models for Functional and Maintainable Code in Industrial Settings: A Case Study at ASML (Industry Showcase)
Yash Mundhra (Delft University of Technology); Max Valk (ASML); Maliheh Izadi (Delft University of Technology)

16:20 10m Talk | IntelliTopo: An IaC Generation Service for Industrial Network Topology Construction (Industry Showcase)
Mingyu Shao (Harbin Institute of Technology, Shenzhen; PengCheng Laboratory); Zhao Liu (PengCheng Laboratory); Weihong Han (Peng Cheng Laboratory); Cuiyun Gao (Harbin Institute of Technology, Shenzhen); Jiachen Liu (Harbin Institute of Technology, Shenzhen); Qing Liao (Harbin Institute of Technology)

16:30 10m Talk | RepoMasterEval: Evaluating Code Completion via Real-World Repositories (Industry Showcase)
Qinyun Wu (Bytedance Ltd.); Chao Peng, Pengfei Gao (ByteDance); Ruida Hu (Harbin Institute of Technology, Shenzhen); Haoyu Gan (ByteDance); Bo Jiang (Bytedance Network Technology); Jinhe Tang, Zhiwen Deng, Zhanming Guan (ByteDance); Cuiyun Gao (Harbin Institute of Technology, Shenzhen); Xia Liu (ByteDance); Ping Yang (Bytedance Network Technology)

16:40 10m Talk | Multiple Schema-Conformant Declarative Code Generation (NIER Track)

16:50 10m Talk | Tuning LLM-based Code Optimization via Meta-Prompting: An Industrial Perspective (Industry Showcase)
Jingzhi Gong (University of Leeds); Rafail Giavrimis (Turing Intelligence Technology); Paul Brookes, Vardan Voskanyan, Fan Wu (TurinTech AI); Mari Ashiga (University of West London/TurinTech AI); Matthew Truscott (TurinTech AI); Michail Basios (Turing Intelligence Technology); Leslie Kanthan (TurinTech AI); Jie Xu, Zheng Wang (University of Leeds)