Enhancing Code Generation through Retrieval of Cross-Lingual Semantic Graphs
In the field of software engineering automation, code language models have made significant strides in code generation tasks. However, due to the cost of updating their knowledge and the problem of hallucinations, code language models (CLMs) face challenges in practical code generation scenarios, making retrieval-augmented code generation a mainstream approach. Existing retrieval-augmented methods build codebases for only a single programming language, which cannot compensate for knowledge gaps within that language. To address this, we propose CodeRCSG, a novel cross-lingual retrieval-augmented code generation method. CodeRCSG constructs a multilingual codebase and builds a unified cross-lingual code semantic graph to capture deep semantic information across different programming languages. By encoding the retrieved code semantic graph with a graph neural network (GNN) and combining it with input text embeddings, code language models can effectively exploit the transferred cross-lingual programming knowledge to improve the quality of generated code. Experimental results show that CodeRCSG significantly enhances the code generation capabilities of code language models.
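To make the pipeline concrete, the fusion step described above (encode the retrieved semantic graph with a GNN, then combine the graph representation with the input text embedding before it reaches the CLM) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the mean-aggregation message passing stands in for a learned GNN layer, the node names and feature vectors are invented toy data, and real systems would use a trained encoder and higher-dimensional embeddings.

```python
from typing import Dict, List, Tuple

def message_pass(node_feats: Dict[str, List[float]],
                 edges: List[Tuple[str, str]],
                 rounds: int = 2) -> Dict[str, List[float]]:
    """Toy GNN encoder: each round, a node's vector becomes the mean of
    itself and its neighbours (a stand-in for a learned GNN layer)."""
    for _ in range(rounds):
        updated = {}
        for node, feat in node_feats.items():
            neigh = [node_feats[v] for u, v in edges if u == node]
            neigh += [node_feats[u] for u, v in edges if v == node]
            vecs = [feat] + neigh
            updated[node] = [sum(dim) / len(vecs) for dim in zip(*vecs)]
        node_feats = updated
    return node_feats

def graph_readout(node_feats: Dict[str, List[float]]) -> List[float]:
    """Pool node vectors into one graph-level embedding (mean pooling)."""
    vecs = list(node_feats.values())
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def fuse(graph_vec: List[float], text_vec: List[float]) -> List[float]:
    """Concatenate the graph embedding with the input text embedding;
    the fused vector would condition the code language model."""
    return graph_vec + text_vec

# Hypothetical cross-lingual semantic graph: semantic nodes shared by
# equivalent Python and Java snippets (toy 2-d features).
feats = {"loop": [1.0, 0.0], "acc": [0.0, 1.0], "ret": [1.0, 1.0]}
edges = [("loop", "acc"), ("acc", "ret")]

graph_vec = graph_readout(message_pass(feats, edges))
fused = fuse(graph_vec, [0.5, 0.5])  # toy 2-d text embedding
print(len(fused))
```

In a real system the concatenation would typically be followed by a learned projection so the fused vector matches the CLM's input dimension; the sketch only shows where the two modalities meet.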
Fri 6 Dec (times shown in the Beijing, Chongqing, Hong Kong, Urumqi time zone)

Session: 09:30 - 10:30

- 09:30 (30m, Talk): Enhancing Code Generation through Retrieval of Cross-Lingual Semantic Graphs. Technical Track. Zhijie Jiang (National University of Defense Technology), Zejian Shi (Fudan University), Xinyu Gao, Yun Xiong (Fudan University)
- 10:00 (30m, Talk): Optimizing LLMs for Code Generation: Which Hyperparameter Settings Yield the Best Results? Technical Track. Chetan Arora (Monash University), Ahnaf Ibn Sayeed (Monash University), Sherlock A. Licorish (University of Otago), Fanyu Wang (Monash University), Christoph Treude (Singapore Management University)