RustRepoTrans: Repository-level Context Code Translation Benchmark Targeting Rust
Recent advancements in large language models (LLMs) have demonstrated impressive capabilities in code translation, typically evaluated using benchmarks like CodeTransOcean and RepoTransBench. However, dependency-free benchmarks fail to capture real-world complexity: they focus primarily on simple function-level translations and overlook repository-level context (e.g., dependencies). Conversely, full-repository translation benchmarks significantly exceed the current capabilities of existing models, producing performance bottlenecks that yield no actionable insights for guiding model development. Moreover, LLMs’ effectiveness in translating to newer, low-resource languages like Rust remains largely underexplored.
To address these gaps, we introduce RustRepoTrans, the first repository-level context code translation benchmark, comprising 375 tasks translating into Rust from C, Java, and Python. Using this benchmark, we evaluate four representative LLMs, analyzing their errors to assess their limitations in complex translation scenarios. Among them, Claude-3-5 performs best with 43.5% Pass@1, excelling in both basic functionality and additional translation abilities such as noise robustness and identification of syntactical differences. However, even Claude-3-5 suffers a 30.8 percentage-point drop (Pass@1 from 74.3% to 43.5%) when handling repository-level context compared to previous benchmarks without such context. In addition, we propose a set of more fine-grained evaluation metrics and an enhanced evaluation framework, enabling a more comprehensive analysis of LLM performance on repository-level context code translation and providing fine-grained insights that can effectively inform the development of code translation techniques.
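For context on the Pass@1 figures above, the following is a minimal sketch of the standard unbiased pass@k estimator commonly used in code-generation evaluation; the function name and the sample counts in the usage line are illustrative, not taken from the benchmark itself.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples drawn without replacement from n generations is correct,
    given that c of the n generations pass the tests."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # must include a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k = 1 this reduces to c / n, the fraction of passing samples,
# e.g. 4 correct out of 10 generations:
print(round(pass_at_k(10, 4, 1), 3))  # 0.4
```

Averaging this quantity over all 375 tasks gives a benchmark-level Pass@1 such as the 43.5% reported above.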