RAG or Fine-tuning? A Comparative Study on LCMs-based Code Completion in Industry
Code completion, a crucial practice in industrial settings, helps developers program more efficiently by automatically suggesting code snippets during development. With the emergence of Large Code Models (LCMs), this field has advanced significantly. Because open-source and industrial codebases differ in coding patterns and in their unique internal dependencies, it is common practice to perform domain adaptation when adopting LCMs in industry. Although prior studies have proposed adaptation approaches, among which RAG and fine-tuning are the two most popular paradigms, none has explored the trade-offs between the two approaches in industrial scenarios.
To fill this gap, this paper comprehensively compares the two paradigms, Retrieval-Augmented Generation (RAG) and fine-tuning (FT), for industrial code completion. In collaboration with Tencent’s WXG department, we collect over 160,000 internal C++ files as our codebase. We then compare the two types of approaches, using six LCMs, along three dimensions that matter to industrial practitioners: effectiveness, efficiency, and parameter sensitivity. Our findings reveal that RAG, when implemented with appropriate embedding models that map code snippets into dense vector representations, can achieve higher accuracy than fine-tuning alone. Among the RAG methods studied, BM25 shows superior retrieval effectiveness and efficiency. Moreover, RAG and fine-tuning are orthogonal, and combining them leads to superior performance. We also observe that RAG scales better than FT, showing more sustained performance gains as the codebase grows. Our findings provide actionable guidance for choosing and implementing appropriate methods to adopt LCMs based on specific industrial scenarios and requirements.
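As a rough illustration of the BM25-based retrieval step mentioned in the abstract, the sketch below ranks codebase snippets against unfinished code and prepends the best matches to a completion prompt. It is a minimal sketch, not the authors' pipeline: it assumes the standard Okapi BM25 scoring formula with commonly used defaults (`k1=1.2`, `b=0.75`), and the tokenizer, the `BM25Index` class, the `build_prompt` helper, the prompt layout, and the C++ snippets are all hypothetical names and data introduced here for illustration.

```python
import math
import re
from collections import Counter


def tokenize(code):
    # Naive tokenizer (an assumption): lowercase identifiers and integer literals.
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*|\d+", code.lower())


class BM25Index:
    """Okapi BM25 over a small in-memory list of code snippets."""

    def __init__(self, snippets, k1=1.2, b=0.75):
        self.snippets = snippets
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(s)) for s in snippets]
        self.lengths = [sum(d.values()) for d in self.docs]
        self.avg_len = sum(self.lengths) / len(self.docs)
        # Document frequency: number of snippets containing each token.
        df = Counter(t for d in self.docs for t in d)
        n = len(self.docs)
        # Smoothed IDF variant that is always positive.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query_tokens, i):
        doc, length = self.docs[i], self.lengths[i]
        total = 0.0
        for t in set(query_tokens):
            tf = doc.get(t, 0)
            if tf == 0:
                continue
            # Length-normalized term-frequency saturation.
            norm = tf + self.k1 * (1 - self.b + self.b * length / self.avg_len)
            total += self.idf[t] * tf * (self.k1 + 1) / norm
        return total

    def top_k(self, query, k=2):
        q = tokenize(query)
        ranked = sorted(range(len(self.docs)), key=lambda i: self.score(q, i), reverse=True)
        return [self.snippets[i] for i in ranked[:k]]


def build_prompt(index, unfinished_code, k=2):
    # Hypothetical prompt layout: retrieved snippets first, then the code to complete.
    retrieved = index.top_k(unfinished_code, k)
    context = "\n".join("// Retrieved snippet:\n" + s for s in retrieved)
    return context + "\n// Complete the following code:\n" + unfinished_code


# Hypothetical C++ snippets standing in for an internal codebase.
CODEBASE = [
    "int ParseHeader(const std::string& buf) { return HeaderCodec::Decode(buf); }",
    "void LogError(const std::string& msg) { Logger::Instance().Write(msg); }",
    "bool SendPacket(Socket& s, const Packet& p) { return s.Write(p.Serialize()); }",
]
index = BM25Index(CODEBASE)
print(build_prompt(index, "int status = ParseHeader(buf", k=1))
```

BM25 needs no trained model and works from token counts alone, which is one plausible reason a lexical retriever can be cheap to run over a large codebase; a dense-embedding retriever would replace `score` with a vector similarity.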
Mon 23 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
16:00 - 17:50 | Code Generation 1 (Industry Papers / Demonstrations / Research Papers / Journal First) at Cosmos 3C | Chair(s): Zhongxin Liu (Zhejiang University)
16:00 | 20m | Talk | How Do Programming Students Use Generative AI? | Research Papers
16:20 | 20m | Talk | Towards Mitigating API Hallucination in Code Generated by LLMs with Hierarchical Dependency Aware | Industry Papers | Yujia Chen (Harbin Institute of Technology, Shenzhen); Mingyu Chen (Harbin Institute of Technology, Shenzhen); Cuiyun Gao (Harbin Institute of Technology, Shenzhen); Zhihan Jiang (Huawei Cloud Computing Technologies Co., Ltd.); Zhongqi Li (Huawei Cloud Computing Technologies Co., Ltd.); Yuchi Ma (Huawei Cloud Computing Technologies)
16:40 | 10m | Talk | CoSEFA: An LLM-Based Programming Assistant for Secure Code Generation via Supervised Co-Decoding | Demonstrations | Xuan He (Chongqing University); Dong Li (Chongqing University); Hao Wen (CloudWalk Technology Co., Ltd); Yueheng Zhu (Chongqing University); Chao Liu (Chongqing University); Meng Yan (Chongqing University); Hongyu Zhang (Chongqing University)
16:50 | 20m | Talk | DeclarUI: Bridging Design and Development with Automated Declarative UI Code Generation | Research Papers | Ting Zhou (Huazhong University of Science and Technology); Yanjie Zhao (Huazhong University of Science and Technology); Xinyi Hou (Huazhong University of Science and Technology); Xiaoyu Sun (Australian National University, Australia); Kai Chen (Huazhong University of Science and Technology); Haoyu Wang (Huazhong University of Science and Technology)
17:10 | 20m | Talk | RAG or Fine-tuning? A Comparative Study on LCMs-based Code Completion in Industry | Industry Papers | Chaozheng Wang (The Chinese University of Hong Kong); Zezhou Yang (Tencent Inc.); Shuzheng Gao (Chinese University of Hong Kong); Cuiyun Gao (Harbin Institute of Technology, Shenzhen); Ting Peng (Tencent Inc.); Hailiang Huang (Tencent Inc.); Yuetang Deng (Tencent); Michael Lyu (Chinese University of Hong Kong)
17:30 | 20m | Talk | Automated Code Editing with Search-Generate-Modify | Journal First | Changshu Liu (Columbia University); Pelin Cetin (Columbia University); Yogesh Patodia (Columbia University); Baishakhi Ray (Columbia University); Saikat Chakraborty (Microsoft Research); Yangruibo Ding (Columbia University)
Cosmos 3C is the third room in the Cosmos 3 wing.
When facing the main Cosmos Hall, access to the Cosmos 3 wing is on the left, close to the stairs. The area is accessed through a large door with the number “3”, which will stay open during the event.