Translation of Low-Resource COBOL to Logically Correct and Readable Java leveraging High-Resource Java Refinement
Automated translation of legacy code to modern programming languages is the need of the hour for modernizing enterprise systems. This work specifically addresses automated COBOL to Java translation. Traditional rule-based tools for this perform statement-wise translation, overlooking possible modularization and refactoring of the source COBOL code to translate to human-readable target Java code. Our investigation reveals that state-of-the-art Large Language Models (LLMs) in the domain of code encounter difficulties with regard to logical correctness and readability when directly translating low-resource COBOL code to Java. To address these challenges, we propose an LLM-based workflow, leveraging temperature sampling and refinement-based strategies, to not only ensure logical correctness of the translation but also maximize the readability of the target Java code. We exploit the fact that, due to their extensive exposure to human-written Java codes during pre-training, the LLMs are more equipped with profound comprehension and capability for refining translated Java codes than COBOL to Java translation. With a dataset sourced from CodeNet, we demonstrate that sequential refinement of the translated high-resource Java code with execution-guided logic feedback followed by LLM-based readability feedback, yields better performance in terms of logical correctness (81.99% execution accuracy) and readability (0.610 score), than LLM based translation with test cases and readability guidance (60.25% and 0.539) or refinement of the translation task itself (77.95% and 0.572).