
Comparative Evaluation of DeepL and ChatGPT Machine Translation for Game Localization

This article examines the translation quality of DeepL and ChatGPT on text from the game 'Naraka: Bladepoint'. It compares their outputs with professional human translations in the Chinese→English, Chinese→Spanish, and English→Spanish directions, using BLEU scores and manual assessment, and identifies the strengths and limitations of each system.

NetEase LeiHuo Testing Center

Mar 31, 2023


Recent advances in artificial intelligence have broadened the reach of AIGC (AI‑generated content). Tools like ChatGPT have shown capabilities in natural language processing, code generation, and content creation. In the gaming industry, AIGC can be applied to art, voice‑overs, copywriting, and even programming.

This study asks whether ChatGPT's translation capability is usable for real‑world game localization. It draws four representative text groups from the Chinese game Naraka: Bladepoint: skill descriptions, story background, action descriptions, and literary passages. Professional human translations serve as the reference standard. The machine translation outputs of DeepL and ChatGPT (version 3.5) are compared against them using BLEU scores and manual evaluation.

Test preparation

ChatGPT version: 3.5

Machine translation tool: DeepL (chosen because it is generally regarded as accurate on technical and academic texts)

Test directions: Chinese → English, Chinese → Spanish, English → Spanish

Evaluation metric: BLEU score (the most widely used automatic metric for machine translation quality)
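For reference, the standard BLEU definition is given below. It combines the modified n‑gram precisions p_n with a brevity penalty BP. Each p_n counts candidate n‑grams, clipped by how often they occur in the reference. BP penalizes candidates that are shorter than the reference. Here c is the candidate length and r is the reference length. Typically N = 4, with uniform weights w_n = 1/N.

The article does not say which BLEU implementation, tokenization, or smoothing was used, so this is only the textbook form. Scores are often reported multiplied by 100.

```latex
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\left( \sum_{n=1}^{N} w_n \log p_n \right),
\qquad
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r, \\
e^{\,1 - r/c} & \text{if } c \le r.
\end{cases}
```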

Test method

Each of the four text groups was translated in three directions (Chinese→English, Chinese→Spanish, English→Spanish) by both DeepL and ChatGPT, giving 12 outputs per system. A BLEU score was calculated for each output against the human reference translation. A manual review then examined grammar, terminology, idioms, cultural references, and literary quality.
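The BLEU step can be illustrated with a minimal sentence‑level sketch in Python. This is not the study's tooling. The following choices are assumptions made here:

- whitespace tokenization and lowercasing
- a single reference
- uniform weights up to 4‑grams
- add‑one smoothing for n ≥ 2

The function names are hypothetical. The score is returned on a 0 to 100 scale.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Hypothetical sentence-level BLEU (0-100) against one reference.

    Assumptions, not taken from the study: whitespace tokenization,
    lowercasing, uniform weights 1/max_n, and add-one smoothing for n >= 2
    so that a single missing higher-order n-gram does not zero the score.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Modified precision: clip each candidate n-gram count by its
        # count in the reference.
        overlap = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
        total = sum(cand_counts.values())
        if n > 1:
            overlap, total = overlap + 1, total + 1  # add-one smoothing
        if overlap == 0 or total == 0:
            return 0.0
        log_precision_sum += math.log(overlap / total) / max_n

    # Brevity penalty: only candidates shorter than or equal to the
    # reference length are penalized.
    c, r = len(cand), len(ref)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return 100 * bp * math.exp(log_precision_sum)
```

Published results are usually computed at corpus level with an established tool such as sacreBLEU. Tokenization and smoothing choices can noticeably change scores, so BLEU numbers from different setups should not be compared directly.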

Results and analysis

Overall, both systems exceeded a BLEU score of 40 (on a 0 to 100 scale) only once. This indicates that current machine translation quality is still well short of professional standards.

DeepL scored higher than ChatGPT in 7 of the 12 BLEU comparisons, indicating that its output was generally closer to the human translations.

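For both tools, English→Spanish translations scored higher than Chinese→Spanish translations. Two factors likely explain this: much more English‑Spanish parallel data is available, and English and Spanish are more closely related (both are Indo‑European languages).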

Grammar was generally acceptable for both systems. However, decisions that require judgment were handled better by human translators, such as choosing the correct grammatical subject in skill descriptions.

Terminology, idioms, cultural references, and mythological allusions were often mistranslated or translated too literally. For example, "单双排" (the game's solo and duo modes) should be "Solo and Duos". DeepL rendered it as "single and double rows" and ChatGPT as "single and double formations".
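One common mitigation for terminology errors like this is an automated glossary check during post‑editing. This is an assumption and not part of the study. GLOSSARY_EN and missing_terms are hypothetical names. Plain substring matching will miss inflected forms, which matters for Spanish, so the check can only flag candidates for a human reviewer.

```python
# Hypothetical project glossary: source term -> approved English term.
GLOSSARY_EN = {
    "单双排": "Solo and Duos",
}


def missing_terms(source, translation, glossary):
    """Return (source_term, approved_term) pairs where the source term occurs
    in `source` but the approved translation is absent from `translation`.

    Matching is a case-insensitive substring test, so inflected or reworded
    forms are not recognized; results are hints for a reviewer, not verdicts.
    """
    missing = []
    lowered = translation.lower()
    for src_term, approved in glossary.items():
        if src_term in source and approved.lower() not in lowered:
            missing.append((src_term, approved))
    return missing
```

For example, missing_terms("单双排", "single and double rows", GLOSSARY_EN) flags "Solo and Duos" as absent, while the correct rendering returns an empty list.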

Literary passages lost their poetic nuance. Both DeepL and ChatGPT produced plain, literal renderings that lacked the aesthetic depth of the original.

The analysis confirms that while machine translation can handle basic grammatical structures, it struggles with domain‑specific terminology, cultural nuances, and literary style. Consequently, human post‑editing remains essential for high‑quality game localization.

Conclusion

At present, DeepL and ChatGPT perform well on grammar. They fall short in handling game‑specific terms, idioms, cultural background, and literary expression. In practice, human translators should still lead the workflow, with machine translation as an auxiliary tool. ChatGPT may improve as the underlying models evolve. Even so, reliable, nuanced localization is likely to remain a collaboration between humans and AI.



Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

ChatGPT AIGC Machine Translation Localization BLEU deepl game industry

Written by

NetEase LeiHuo Testing Center

LeiHuo Testing Center provides high-quality, efficient QA services, striving to become a leading testing team in China.
