Integrating Rules and Semantics for LLM-Based C-to-Rust Translation (ICSME 2025 - Industry Track) - ICSME 2025 - International Conference on Software Maintenance and Evolution

Who

Feng Luo, Kexing Ji, Cuiyun Gao, Shuzheng Gao, jiafeng, Kui Liu, Xin Xia, Michael Lyu

Track

ICSME 2025 Industry Track

Time Zone

The program is currently displayed in (GMT+12:00) Auckland, Wellington.

When

Fri 12 Sep 2025 14:30 - 14:45 at Case Room 3 260-055 - Session 15 - Reuse 2 Chair(s): Elliott Wen

Abstract

Automated translation of legacy C code into Rust aims to ensure memory safety while reducing the burden of manual migration. Early approaches to code translation rely on static rule-based methods, but they suffer from limited coverage because they depend on predefined rule patterns. Recent works treat the task as a sequence-to-sequence problem and leverage large language models (LLMs). Although these LLM-based methods can reduce unsafe code blocks, the translated code often fails to follow Rust rules or to maintain semantic consistency. On one hand, existing methods adopt a direct prompting strategy to translate the C code, which struggles to accommodate the differences in syntactic rules between C and Rust. On the other hand, this strategy makes it difficult for LLMs to accurately capture the semantics of complex code. To address these challenges, we propose IRENE, an LLM-based framework that Integrates RulEs aNd sEmantics to enhance translation. IRENE consists of three modules: 1) a rule-augmented retrieval module that selects relevant translation examples based on rules generated from a static analyzer developed by us, thereby improving the handling of Rust rules; 2) a structured summarization module that produces a structured summary to guide LLMs toward a better semantic understanding of the C code; 3) an error-driven translation module that leverages compiler diagnostics to iteratively refine translations. We evaluate IRENE on two datasets (xCodeEval, a public dataset, and HW-Bench, an industrial dataset provided by Huawei) and eight LLMs, focusing on translation accuracy and safety. On xCodeEval, IRENE consistently outperforms the strongest baseline method across all LLMs, achieving average improvements of 8.06% and 12.74% in computational accuracy (CA) and compilation success rate (CSR), respectively. It also enhances the safety of the translated code, reducing the Unsafe Rate (UR) to 1.70% on average. On HW-Bench, compared to the strongest baseline, IRENE improves CSR and reduces UR by an average of 0.33% and 26%, respectively.
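The error-driven translation module described in the abstract can be pictured as a translate, compile, retry loop. The following is only a minimal sketch of that general idea, not IRENE's actual implementation: the function names (refine_translation, fake_translate, fake_compile), the CompileResult type, and the max_rounds limit are all assumptions made for illustration, and the LLM and the Rust compiler are replaced by stubs so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class CompileResult:
    ok: bool
    diagnostics: str  # compiler error text; empty when ok is True


def refine_translation(
    c_source: str,
    translate: Callable[[str, Optional[str], Optional[str]], str],
    compile_rust: Callable[[str], CompileResult],
    max_rounds: int = 3,  # hypothetical retry budget
) -> Tuple[str, bool, int]:
    """Translate, compile, and retry with compiler feedback.

    Returns (rust_code, compiled_ok, rounds_used).
    """
    rust_code = translate(c_source, None, None)  # first attempt, no feedback
    for round_no in range(1, max_rounds + 1):
        result = compile_rust(rust_code)
        if result.ok:
            return rust_code, True, round_no
        if round_no < max_rounds:
            # Feed the failing code and its diagnostics into the next attempt.
            rust_code = translate(c_source, rust_code, result.diagnostics)
    return rust_code, False, max_rounds


def fake_translate(c_source, previous, diagnostics):
    # Stand-in for an LLM call: the first draft has a type error,
    # and the retry "fixes" it once diagnostics are supplied.
    if diagnostics is None:
        return 'fn add(a: i32, b: i32) -> i32 { "oops" }'
    return "fn add(a: i32, b: i32) -> i32 { a + b }"


def fake_compile(rust_code):
    # Stand-in for invoking the Rust compiler and capturing its output.
    if '"oops"' in rust_code:
        return CompileResult(False, "error[E0308]: mismatched types")
    return CompileResult(True, "")


code, ok, rounds = refine_translation(
    "int add(int a, int b) { return a + b; }", fake_translate, fake_compile
)
print(ok, rounds)  # prints: True 2
```

In a real pipeline the two stubs would be replaced by an actual model call and an actual compiler invocation; the loop structure and the retry budget are the only parts this sketch is meant to convey.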

Feng Luo

Harbin Institute of Technology (Shenzhen)

China

Kexing Ji

Harbin Institute of Technology (Shenzhen)

China

Cuiyun Gao

Harbin Institute of Technology, Shenzhen

China

Shuzheng Gao

Chinese University of Hong Kong

China

jiafeng

Harbin Institute of Technology (Shenzhen)

China

Kui Liu

Huawei

China

Xin Xia

Zhejiang University

China

Michael Lyu

The Chinese University of Hong Kong

China

Session Program

Fri 12 Sep
Displayed time zone: Auckland, Wellington

13:30 - 15:00	Session 15 - Reuse 2 (NIER Track / Industry Track / Research Papers Track) at Case Room 3 260-055. Chair(s): Elliott Wen, The University of Auckland

13:30 15m		AST-Enhanced or AST-Overloaded? The Surprising Impact of Hybrid Graph Representations on Code Clone Detection | Research Papers Track | Zixian Zhang, School of Computer Science, University of Galway; Takfarinas Saber, School of Computer Science, University of Galway
13:45 10m		Client–Library Compatibility Testing with API Interaction Snapshots | NIER Track | Gustave Monce, Univ. Bordeaux, Bordeaux INP, CNRS, LaBRI; Thomas Degueule, CNRS; Jean-Rémy Falleri, Univ. Bordeaux, Bordeaux INP, CNRS, LaBRI, Institut Universitaire de France; Romain Robbes, CNRS, LaBRI, University of Bordeaux
13:55 10m		Prompting Matters: Assessing the Effect of Prompting Techniques on LLM-Generated Class Code | NIER Track | Adam Yuen, University of Calgary; John Pangas, University of Calgary; Md Mainul Hasan Polash, University of Calgary; Ahmad Abdellatif, University of Calgary
14:05 10m		From First Use to Final Commit: Studying the Evolution of Multi-CI Service Adoption | NIER Track | Nitika Chopra, Trent University; Taher A. Ghaleb, Trent University
14:15 15m		Automated Recovery of Software Product Lines from Legacy Configurable Codebases | Industry Track | Tewfik Ziadi, University of Doha for Science and Technology (UDST); Karim Ghallab, Sorbonne Université - RedFabriQ/Mobioos; Zaak Chalal, RedFabriQ/Mobioos
14:30 15m		Integrating Rules and Semantics for LLM-Based C-to-Rust Translation | Industry Track | Feng Luo, Harbin Institute of Technology (Shenzhen); Kexing Ji, Harbin Institute of Technology (Shenzhen); Cuiyun Gao, Harbin Institute of Technology, Shenzhen; Shuzheng Gao, Chinese University of Hong Kong; jiafeng, Harbin Institute of Technology (Shenzhen); Kui Liu, Huawei; Xin Xia, Zhejiang University; Michael Lyu, The Chinese University of Hong Kong