Challenges in C++ to Rust Translation with Large Language Models: A Preliminary Empirical Study
The C++ programming language is a mainstream choice for systems development because of its efficiency and broad adoption, particularly in fields with high-performance requirements. However, C++ programs are prone to memory management and security issues, such as dangling pointers and memory leaks, which pose growing challenges in modern software development. Rust, a modern programming language designed to address memory safety, has attracted wide attention for its ownership system and memory safety features, driving research and practice in migrating C++ code to Rust. However, the differences in syntax and features between the two languages, together with C++’s complex object-oriented features, make it extremely difficult to convert C++ code directly into Rust code.
With the development of large language models (LLMs), significant progress has been made in code translation and understanding. This paper investigates the use of LLMs to convert C++ code into Rust code by decomposing the C++ code into independent compilation units (CPP features) and extracting their dependent symbols through program analysis. We selected GPT and Deepseek for our experiments, analyzed their translation results, and manually classified the errors made by Deepseek to identify their root causes. Based on this analysis, we provide findings and suggestions for future research on translating C++ code into Rust code.