Syntax and Domain Aware Model for Unsupervised Program Translation
Software is often developed in one language and later migrated to another to target a different platform, so there is an increasing need to translate source code from one programming language to another. Manually migrating projects between languages is time-consuming and error-prone. In recent years, researchers have begun to explore automatic program translation using supervised deep learning techniques that learn from large-scale parallel code corpora. However, parallel resources are scarce in the programming language domain, and collecting bilingual data manually is costly. To address this issue, several unsupervised program translation systems have been proposed. These systems still rely on huge monolingual source code corpora for training, which is very expensive. Moreover, they perform poorly when translating languages that were not seen during pre-training. To this end, we propose SDA-Trans, a syntax and domain aware model for program translation, which leverages syntax structure and domain knowledge to enhance cross-lingual transfer ability. SDA-Trans is trained in an unsupervised manner on a smaller-scale corpus of monolingual Python and Java programs. Experimental results on function translation tasks between Python, Java, and C++ show that SDA-Trans outperforms many large-scale pre-trained models, especially when translating an unseen language.