Evaluating Terminology Translation in Machine Translation Systems via Metamorphic Testing
Machine translation has become an integral part of daily life, and terminology translation plays a crucial role in ensuring the accuracy of translation results. However, existing translation systems such as Google Translate have been shown to occasionally mistranslate terminology. Current metrics for assessing terminology translation rely on reference translations and bilingual dictionaries, which limits their usefulness for large-scale automated testing of MT systems.
To address this challenge, we propose a novel method, Metamorphic Testing for Terminology Translation (TermMT), which tests terminology translation in MT systems effectively and efficiently without relying on reference translations or bilingual terminology dictionaries. Our approach constructs metamorphic relations based on the characteristics of terms: (a) adding an appropriate reference for the term in the given context should \textit{not change} the translation of the term; (b) modifying part of a multi-word term should \textit{change} the translation of the revised word combination. To evaluate the effectiveness of TermMT, we tested the terminology translation capabilities of three machine translation systems (Google Translate, Bing Microsoft Translator, and mBART) on the English portion of the bilingual UM-Corpus dataset. The results show that TermMT detected a total of 3,765 translation errors in Google Translate, 2,351 in Bing Microsoft Translator, and 6,011 in mBART, with precisions of 82.33%, 83.00%, and 86.33%, respectively.
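The sketch below illustrates how the two metamorphic relations could be checked in practice; it is a minimal Python sketch under stated assumptions, not the published TermMT implementation. The names \textit{translate} (a call to the MT system under test) and \textit{align} (a helper that locates the target-language span corresponding to a source term) are hypothetical placeholders supplied by the caller.
\begin{verbatim}
from typing import Callable

Translator = Callable[[str], str]    # source sentence -> target sentence
Aligner = Callable[[str, str], str]  # (target sentence, source term) -> term translation


def check_mr1(translate: Translator, align: Aligner,
              sentence: str, term: str, reference_phrase: str) -> bool:
    """MR1: appending an appropriate reference for the term to the given
    context should NOT change the translation of the term itself."""
    original = align(translate(sentence), term)
    follow_up = align(translate(f"{sentence} {reference_phrase}"), term)
    return original == follow_up   # False => suspected terminology error


def check_mr2(translate: Translator, align: Aligner,
              sentence: str, term: str, modified_term: str) -> bool:
    """MR2: modifying part of a multi-word term SHOULD change the
    translation of the revised word combination."""
    original = align(translate(sentence), term)
    follow_up = align(translate(sentence.replace(term, modified_term)),
                      modified_term)
    return original != follow_up   # False => suspected terminology error
\end{verbatim}
In this formulation, a violated relation (either check returning False) flags the corresponding source sentence as a suspected terminology translation error, without consulting any reference translation or bilingual dictionary.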