How Bilibili’s LLM-Powered System Cuts Game Localization Costs by 80%

Bilibili’s game algorithm team built a four‑layer, LLM‑based translation platform that automates terminology extraction, retrieval‑augmented generation, and quality assessment. The platform shortens localization cycles by more than 85% and cuts costs by 70‑80% while supporting ten languages and keeping game text consistent and culturally accurate.

Bilibili Tech

Oct 27, 2025

1. Introduction

As the global game market expands, Bilibili is actively entering overseas markets with a rich portfolio of self‑developed and licensed games. Successful overseas releases require comprehensive localization, especially language adaptation, which must preserve cultural nuance and player immersion across system UI, skill descriptions, story dialogue, and event announcements.

The complexity of game translation creates three core challenges: diverse content types, high quality‑control cost, and the need to balance cost with efficiency. Frequent content updates further inflate annual maintenance costs.

To lower localization costs while maintaining quality, Bilibili’s game algorithm team built a large‑language‑model (LLM) based translation system that has shown significant results across multiple projects and languages.

2. Traditional Translation Methods and Pain Points

2.1 Traditional Translation Process

Traditional game localization relies on human translators assisted by CAT tools (e.g., MemoQ, Trados, Smartcat). The typical workflow includes:

Step 1 – Familiarize with game content: Translators play the game or review materials to understand mechanics, characters, and world‑view.

Step 2 – Create style guide: Define terminology, formatting, length limits, etc., to ensure consistent style.

Step 3 – CAT‑assisted draft translation: Use translation memories and term bases to keep terminology and phrasing consistent.

Step 4 – Draft review and revision: Multiple rounds of editing to fix style and accuracy issues.

Step 5 – Professional LQA quality check: Localization Quality Assurance experts perform final inspection before release.

Key pain points of the traditional approach:

High cost: Professional game translators are scarce and expensive; large‑scale, multi‑language projects can reach millions of yuan annually.

Long cycle: Multi‑round reviews often take over two months per version.

Unstable quality: Dependence on external vendors leads to inconsistent terminology and occasional delays.

Under‑utilized historical assets: Memory‑based fuzzy matching struggles with variable‑rich game text, and term management remains manual.
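The fuzzy‑matching pain point can be illustrated with a minimal sketch. It uses Python's standard `difflib` as a stand‑in for a CAT tool's matcher (real tools use their own algorithms), and the example strings and the `mask_variables` helper are hypothetical.

```python
import re
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], a rough proxy for a fuzzy-match score."""
    return SequenceMatcher(None, a, b).ratio()


def mask_variables(text: str) -> str:
    """Replace numbers and {placeholders} with one token so only the wording is compared."""
    return re.sub(r"\{[^}]*\}|\d+", "<VAR>", text)


# Two skill descriptions that differ only in their variable parts (illustrative data).
old = "Deals 150 damage to 3 enemies for {duration} seconds."
new = "Deals 2400 damage to 12 enemies for {buff_time} seconds."

raw_score = similarity(old, new)
masked_score = similarity(mask_variables(old), mask_variables(new))

print(f"raw: {raw_score:.2f}, masked: {masked_score:.2f}")
```

The raw score drops even though the sentence template is identical, which is why a plain memory lookup can miss reusable translations in variable‑rich text.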

3. Bilibili Game LLM Translation System

3.1 Overall Architecture

The system adopts a four‑layer architecture:

Data layer: Preparation of historical translations, term tables, style guides, and source texts.

Algorithm layer: Core translation capability consisting of three modules: automatic term mining, Retrieval‑Augmented Generation (RAG + LLM), and a three‑component quality‑assessment model.

Evaluation layer: Quality assurance split into production‑stage checks (rule‑based + LQA) and testing‑stage metrics (BLEU, TQE scores).

Application layer: Supports various text types such as UI strings, announcements, skill descriptions, SNS messages, and story dialogue.

3.2 Core Workflow

Step 1 – Data preparation: The system automatically extracts relevant terms and translation memories from the repository.

Step 2 – Intelligent translation: Combines mined terms with RAG context to generate high‑quality translations.

Step 3 – Quality assurance: Multi‑level checks ensure output quality; any failure triggers feedback‑driven optimization.
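The three steps above can be sketched as a small loop in which failed checks are fed back into the next translation attempt. This is only an illustration under assumptions: the function names, the `max_rounds` limit, and the stub translator are hypothetical, and a real system would call an LLM where the stub is.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Segment:
    source: str
    translation: str = ""
    issues: list[str] = field(default_factory=list)


def run_pipeline(
    source: str,
    retrieve: Callable[[str], dict[str, str]],
    translate: Callable[[str, dict[str, str], list[str]], str],
    check: Callable[[str, str, dict[str, str]], list[str]],
    max_rounds: int = 3,  # assumed retry budget
) -> Segment:
    seg = Segment(source=source)
    terms = retrieve(source)  # Step 1: data preparation
    for _ in range(max_rounds):
        # Step 2: translate, passing earlier issues back as feedback
        seg.translation = translate(source, terms, seg.issues)
        # Step 3: quality assurance
        seg.issues = check(source, seg.translation, terms)
        if not seg.issues:
            break
    return seg


# --- Stubs standing in for the real retrieval, LLM, and QA components ---
def retrieve(source: str) -> dict[str, str]:
    return {"暗影法师": "Shadow Mage"}


def translate(source: str, terms: dict[str, str], feedback: list[str]) -> str:
    # The stub gets the term wrong on the first try and corrects it after feedback.
    return "Shadow Mage joins the battle." if feedback else "Light Mage joins the battle."


def check(source: str, translation: str, terms: dict[str, str]) -> list[str]:
    return [
        f"term violation: expected {tgt!r}"
        for src, tgt in terms.items()
        if src in source and tgt not in translation
    ]


result = run_pipeline("暗影法师加入战斗。", retrieve, translate, check)
print(result.translation, result.issues)
```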

3.3 System Value

Efficiency boost: Translation cycle shortened by more than 85%, overall cost reduced by 70‑80%.

Scale: Simultaneous translation of 10 languages (Simplified Chinese, Traditional Chinese, Japanese, Korean, Thai, English, German, French, Spanish, Portuguese) enables global synchronized releases.

Stable quality: Standardized process reduces reliance on external vendors; AI + human hybrid model keeps online complaints below 0.01%.

4. Core Technologies of the Translation System

4.1 Retrieval‑Augmented Generation (RAG)

RAG addresses term inconsistency, style drift, and narrative breaks by providing LLMs with relevant context from term and memory databases.
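As a rough illustration of how retrieved context might be handed to the model, the sketch below assembles a prompt from term pairs and translation‑memory examples. The prompt wording, the `build_prompt` function, and the sample data are assumptions made for illustration, not the team's actual template.

```python
def build_prompt(
    source: str,
    target_lang: str,
    terms: dict[str, str],
    memories: list[tuple[str, str]],
) -> str:
    """Combine retrieved terms and past translations into one translation prompt."""
    term_lines = "\n".join(f"- {s} => {t}" for s, t in terms.items()) or "- (none)"
    memory_lines = "\n".join(f"- {s} => {t}" for s, t in memories) or "- (none)"
    return (
        f"Translate the game text into {target_lang}.\n"
        "Use these term translations exactly:\n"
        f"{term_lines}\n"
        "Match the style of these past translations:\n"
        f"{memory_lines}\n"
        "Source text:\n"
        f"{source}"
    )


prompt = build_prompt(
    source="暗影法师加入战斗。",
    target_lang="English",
    terms={"暗影法师": "Shadow Mage"},
    memories=[("法师加入队伍。", "The mage joins the party.")],
)
print(prompt)
```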

4.1.1 Core Benefits

Terminology consistency: Guarantees uniform translation of IP‑specific names, skills, and character titles.

Contextual coherence: Maintains narrative flow across dialogue by retrieving similar past translations.

4.1.2 RAG Architecture

The RAG system consists of:

Term retrieval module (Hybrid Search): Queries a general term base, prioritizes scene‑specific terms, and ensures exact matches for IP‑critical vocabulary.

Memory retrieval module: Splits source text into semantic chunks, performs top‑k retrieval, and re‑ranks results using a weighted score:

Score = α·SemanticSim + β·RoleSim + γ·StyleSim + δ·MoodSim
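The weighted score can be sketched as follows. The article does not give values for α, β, γ, and δ, so the weights here are placeholders, and the candidate similarities are made‑up numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    semantic_sim: float
    role_sim: float
    style_sim: float
    mood_sim: float


# Placeholder weights (alpha, beta, gamma, delta); the real values are not published.
WEIGHTS = (0.5, 0.2, 0.2, 0.1)


def score(c: Candidate, weights: tuple[float, float, float, float] = WEIGHTS) -> float:
    """Score = α·SemanticSim + β·RoleSim + γ·StyleSim + δ·MoodSim."""
    alpha, beta, gamma, delta = weights
    return (
        alpha * c.semantic_sim
        + beta * c.role_sim
        + gamma * c.style_sim
        + delta * c.mood_sim
    )


def rerank(candidates: list[Candidate], top_k: int = 2) -> list[Candidate]:
    """Return the top_k candidates by weighted score, highest first."""
    return sorted(candidates, key=score, reverse=True)[:top_k]


candidates = [
    Candidate("A", semantic_sim=0.90, role_sim=0.2, style_sim=0.3, mood_sim=0.2),
    Candidate("B", semantic_sim=0.80, role_sim=0.9, style_sim=0.8, mood_sim=0.7),
    Candidate("C", semantic_sim=0.60, role_sim=0.5, style_sim=0.5, mood_sim=0.5),
]

for c in rerank(candidates):
    print(c.text, round(score(c), 2))
```

Candidate B outranks A here despite a lower semantic similarity, because it matches the speaker's role, style, and mood more closely.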

4.2 Automatic Term Mining

Two complementary processes automatically expand term libraries:

Historical term extraction: Identifies term pairs in legacy translation data that are not yet recorded in the term base, using few‑shot prompting, lexical analysis, and expert review.

Candidate term discovery in new texts: Analyzes source‑only content to propose target‑language term candidates, followed by expert validation.
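A minimal sketch of the historical‑extraction side is shown below: a few‑shot prompt asks the model for term pairs, and anything not already in the term base is queued for expert review. The prompt text, the `source => target` output format, and all sample data are assumptions; the LLM call itself is omitted.

```python
# One worked example shown to the model (illustrative data).
FEW_SHOT = [
    ("暗影法师释放了技能。", "The Shadow Mage cast a skill.", [("暗影法师", "Shadow Mage")]),
]


def build_extraction_prompt(source: str, target: str) -> str:
    """Build a few-shot prompt asking for term pairs, one 'source => target' per line."""
    parts = ["Extract game term pairs, one per line, as 'source => target'."]
    for ex_src, ex_tgt, ex_pairs in FEW_SHOT:
        pairs = "\n".join(f"{s} => {t}" for s, t in ex_pairs)
        parts.append(f"Source: {ex_src}\nTarget: {ex_tgt}\nTerms:\n{pairs}")
    parts.append(f"Source: {source}\nTarget: {target}\nTerms:")
    return "\n\n".join(parts)


def parse_pairs(model_output: str) -> list[tuple[str, str]]:
    """Parse 'source => target' lines, ignoring anything else."""
    pairs = []
    for line in model_output.splitlines():
        if "=>" in line:
            src, tgt = line.split("=>", 1)
            pairs.append((src.strip(), tgt.strip()))
    return pairs


def pending_review(
    pairs: list[tuple[str, str]], term_base: dict[str, str]
) -> list[tuple[str, str]]:
    """Keep only pairs that are new or that conflict with the term base."""
    return [(s, t) for s, t in pairs if term_base.get(s) != t]


# A hypothetical model reply, used here in place of a real LLM call.
reply = "暗影法师 => Shadow Mage\n冰霜巨龙 => Frost Dragon"
queue = pending_review(parse_pairs(reply), term_base={"暗影法师": "Shadow Mage"})
print(queue)
```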

4.3 Automated Translation Quality Assessment

Quality issues are categorized into three dimensions:

Accuracy: Over‑translation, omission, and term violations.

Language quality: Language mixing, formatting errors, and readability problems.

Localization suitability: Cultural mismatches, incorrect numeric/date formats, and inappropriate register.
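Some of these issue types lend themselves to simple rule‑based checks. The sketch below covers three of them (term violations, placeholder formatting, and language mixing); the regular expressions and the function name are assumptions chosen for illustration, and a production rule set would be broader.

```python
import re

# Matches {name}-style placeholders and simple printf tokens such as %s or %d.
PLACEHOLDER = re.compile(r"\{[^{}]*\}|%[sd]")
# Matches common CJK ideographs, used to spot source text left in a non-CJK target.
CJK = re.compile(r"[\u4e00-\u9fff]")


def check_translation(
    source: str,
    translation: str,
    terms: dict[str, str],
    target_is_cjk: bool = False,
) -> list[str]:
    """Return a list of rule violations; an empty list means all rules passed."""
    issues = []
    # Accuracy: every known term in the source must appear in its approved form.
    for src, tgt in terms.items():
        if src in source and tgt not in translation:
            issues.append(f"accuracy: term violation for {src}")
    # Language quality: placeholders must survive translation unchanged.
    if sorted(PLACEHOLDER.findall(source)) != sorted(PLACEHOLDER.findall(translation)):
        issues.append("language quality: placeholder mismatch")
    # Language quality: no leftover source-language characters.
    if not target_is_cjk and CJK.search(translation):
        issues.append("language quality: language mixing")
    return issues


terms = {"暗影法师": "Shadow Mage"}
bad = check_translation("暗影法师造成{dmg}点伤害", "Light Mage deals {dmg} damage", terms)
good = check_translation("暗影法师造成{dmg}点伤害", "Shadow Mage deals {dmg} damage", terms)
print(bad, good)
```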

Evaluation combines rule‑based checks, BLEU/COMET pre‑screening, and human TQE feedback. A three‑layer governance strategy includes post‑check mechanisms, multi‑round expert evaluation, and an LLM‑as‑Judge agent trained on TQE data using chain‑of‑thought reasoning.

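A sample assessment output:

<quality_analysis>
1. Accuracy: mistranslation found; "暗影法师" (Shadow Mage) was rendered as "Light Mage".
2. Language quality: punctuation is used correctly; no readability errors.
3. Language‑specific issues: phrasing follows target‑language conventions.
</quality_analysis>
Score: [error=1, correct=0, correct=0] → Overall: needs correction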
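The scoring line in the sample output suggests that any single error flag triggers correction. A minimal sketch of that aggregation, assuming one boolean flag per dimension and an "any error fails" rule (the class and field names are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgeResult:
    accuracy_error: bool
    language_error: bool
    locale_error: bool

    def flags(self) -> list[int]:
        """1 marks an error in that dimension, 0 marks a pass."""
        return [int(self.accuracy_error), int(self.language_error), int(self.locale_error)]

    def verdict(self) -> str:
        # Assumed rule: one error in any dimension is enough to require correction.
        return "needs correction" if any(self.flags()) else "pass"


sample = JudgeResult(accuracy_error=True, language_error=False, locale_error=False)
print(sample.flags(), sample.verdict())
```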

5. Benefits of the Translation System

5.1 Business Impact

Translation cost savings of 70‑80%.

Translation efficiency increased by more than 7×.

Online complaint rate kept below 0.01%.

5.2 Technical Metrics

Supports 10 languages and processes over 100,000 characters per version.

Post‑LLM LQA modification rate significantly lower than traditional vendor pipelines.
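The article does not define how the LQA modification rate is computed. One plausible reading, assumed here, is the share of machine‑translated segments that LQA reviewers changed before release; the function and sample data below are illustrative only.

```python
def modification_rate(machine: list[str], final: list[str]) -> float:
    """Fraction of segments whose text differs after LQA review."""
    if len(machine) != len(final):
        raise ValueError("segment lists must be aligned one-to-one")
    if not machine:
        return 0.0
    changed = sum(m.strip() != f.strip() for m, f in zip(machine, final))
    return changed / len(machine)


machine_output = ["Start", "Shadow Mage", "Deal {dmg} damage", "Exit game"]
after_lqa = ["Start", "Shadow Mage", "Deals {dmg} damage", "Exit game"]

rate = modification_rate(machine_output, after_lqa)
print(f"{rate:.0%}")
```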

6. Conclusion and Outlook

Through systematic technical innovation and engineering practice, the team has built an effective game translation system that resolves the traditional pain points of cost, efficiency, quality, and stability, providing strong support for Bilibili’s global game releases.

Future directions include multimodal integration; contrastive learning; an end‑to‑end framework covering translation, quality inspection, and term discovery; and an AI‑as‑Refiner for automated text polishing and style consistency.

7. Acknowledgements

Special thanks to the infrastructure, operations, and LQA teams for providing stable LLM resources, API support, and critical business feedback that enabled the system’s successful deployment.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
