APSEC 2024
Tue 3 - Fri 6 December 2024 China

In the field of software engineering automation, code language models (CLMs) have made significant strides in code generation tasks. However, the cost of updating their knowledge and the issue of hallucination limit CLMs in practical code generation scenarios, making retrieval-augmented code generation a mainstream approach. Existing retrieval-augmented methods build codebases for only a single programming language, which cannot compensate for gaps in that language's knowledge. To address this, we propose CodeRCSG, a novel cross-lingual retrieval-augmented code generation method. CodeRCSG constructs a multilingual codebase and builds a unified cross-lingual code semantic graph to capture deep semantic information across programming languages. By encoding the retrieved code semantic graph with a graph neural network (GNN) and combining it with the input text embeddings, CLMs can effectively exploit the transferred cross-lingual programming knowledge to improve the quality of generated code. Experimental results show that CodeRCSG significantly enhances the code generation capabilities of CLMs.
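The abstract does not specify how the GNN output is combined with the input text embeddings; the sketch below illustrates one plausible reading, in which a retrieved semantic graph is encoded by a small message-passing GNN, pooled into a graph-level vector, and prepended to the code LM's token embeddings as a soft prompt. All names (`SemanticGraphEncoder`, `fuse_graph_with_text`), dimensions, and the fusion strategy are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch, assuming a dense adjacency matrix for the retrieved
# code semantic graph and soft-prompt-style fusion with the LM input.
import torch
import torch.nn as nn


class SemanticGraphEncoder(nn.Module):
    """Two rounds of mean-aggregation message passing, then mean pooling."""

    def __init__(self, node_dim: int, hidden_dim: int):
        super().__init__()
        self.lin1 = nn.Linear(node_dim, hidden_dim)
        self.lin2 = nn.Linear(hidden_dim, hidden_dim)

    @staticmethod
    def propagate(x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Mean-aggregate neighbor features; adj is an (N, N) adjacency
        # matrix that already contains self-loops.
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        return (adj @ x) / deg

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.lin1(self.propagate(x, adj)))
        h = torch.relu(self.lin2(self.propagate(h, adj)))
        return h.mean(dim=0)  # graph-level embedding


def fuse_graph_with_text(graph_vec: torch.Tensor,
                         text_embeds: torch.Tensor,
                         proj: nn.Linear) -> torch.Tensor:
    """Project the graph embedding into the LM embedding space and
    prepend it to the token embeddings (assumed fusion strategy)."""
    prefix = proj(graph_vec).unsqueeze(0)           # (1, lm_dim)
    return torch.cat([prefix, text_embeds], dim=0)  # (1 + seq_len, lm_dim)


if __name__ == "__main__":
    node_dim, hidden_dim, lm_dim, seq_len = 128, 256, 768, 32
    # Toy semantic graph: 5 nodes with random features and edges.
    x = torch.randn(5, node_dim)
    adj = (torch.rand(5, 5) > 0.5).float()
    adj.fill_diagonal_(1.0)

    encoder = SemanticGraphEncoder(node_dim, hidden_dim)
    proj = nn.Linear(hidden_dim, lm_dim)
    text_embeds = torch.randn(seq_len, lm_dim)  # stand-in for LM token embeddings

    fused = fuse_graph_with_text(encoder(x, adj), text_embeds, proj)
    print(fused.shape)  # torch.Size([33, 768])
```

Under this reading, the fused sequence would be fed to the code LM in place of its plain token embeddings, so the generator conditions on both the prompt and the retrieved cross-lingual graph; the actual CodeRCSG architecture may differ.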