Tencent Game Big Data Analysis Engine: Architecture, Practices, and Future Directions
This article presents the design, implementation, and operational experience of Tencent's game big‑data analysis platform, covering its background, the offline, online, and real‑time multi‑dimensional analysis engines, practical use cases, performance optimizations, and future roadmap.
Introduction – The article introduces iData, Tencent's game big‑data analysis system, which combines iDataCharts for visualization and iDataEngine for analysis to overcome limitations of traditional BI tools and databases.
1. Tencent Game Big‑Data Analysis Background
Tencent operates more than 110 PC titles (e.g., League of Legends, DNF) and 390 mobile games (e.g., Honor of Kings), and this rapid growth creates a complex data environment.
Daily data volume exceeds 300 TB and 20 billion records, spread across roughly 1,300 tables with 430+ dimensions per business.
Each game has its own intricate data model, requiring fine‑grained operations and rapid analytics.
These challenges motivate the use of big‑data techniques for precise, efficient product operation.
2. Architecture Overview
The platform is layered from bottom to top:
Data lake: cloud storage, relational databases (MySQL, PostgreSQL), Hadoop, and object storage.
Service engine: visualization engine, multi‑dimensional & real‑time analysis engine, and AI research engine.
Capability model: functional abstractions built on the engines.
Analysis methods & decision support: user‑facing applications.
The focus of this article is the data analysis engine.
3. Big‑Data Analysis Engine Components
3.1 Offline Multi‑Dimensional Analysis – TGMars
Pre-processing plus per-shard binding of storage and compute avoids shuffles.
Bitmap indexes accelerate hot‑spot calculations.
Materialized views (monthly/annual) reduce full‑scan time.
Deeply customized Spark‑SQL via DataSourceV2 for push‑down filtering.
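The bitmap-index idea above can be sketched in a few lines: each distinct dimension value maps to a bitmap of row IDs, and a multi-dimension filter reduces to AND-ing bitmaps and counting set bits. This is a minimal illustration only; `BitmapIndex`, `and_count`, and the sample data are hypothetical, not TGMars's real API.

```python
# Minimal sketch of bitmap indexing for hot-spot dimension filters.
# A Python int serves as the bit set (bit i set == row i present).

class BitmapIndex:
    """Maps each distinct dimension value to a bitmap of row IDs."""

    def __init__(self):
        self.bitmaps = {}  # value -> int used as a bitmap

    def add(self, row_id, value):
        self.bitmaps[value] = self.bitmaps.get(value, 0) | (1 << row_id)

    def rows(self, value):
        return self.bitmaps.get(value, 0)

def and_count(first, *rest):
    """Rows satisfying every predicate: AND the bitmaps, count set bits."""
    result = first
    for b in rest:
        result &= b
    return bin(result).count("1")

# Index two dimensions over three rows.
region, channel = BitmapIndex(), BitmapIndex()
for i, (r, c) in enumerate([("CN", "ios"), ("CN", "android"), ("US", "ios")]):
    region.add(i, r)
    channel.add(i, c)

# Count rows matching region=CN AND channel=ios.
print(and_count(region.rows("CN"), channel.rows("ios")))  # 1
```

Production engines use compressed bitmap libraries (e.g., roaring bitmaps) rather than raw integers, but the query shape is the same.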
3.2 Online Profile Analysis – TGFace
Handles massive user‑profile queries (e.g., 50 M new users) using columnar storage and dynamic bitmap indexes.
Workflow: TGMars extracts raw user packs → scheduler → Datanode columnar storage → SQL parser → optimizer → JIT‑compiled DAG execution.
Performance: on 10⁸ records, a 6-dimension drill-down completes in 1.25 s; a 10-dimension pivot takes ~3.4 s.
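The drill-down/pivot operation in that workflow amounts to a group-by over dimension columns in a columnar layout. A toy sketch, assuming made-up column names and data (TGFace's actual storage and JIT execution are far more involved):

```python
# Columnar-storage sketch: columns are parallel lists; a pivot is a
# group-by over chosen dimension columns, aggregating a measure column.
from collections import defaultdict

columns = {
    "channel": ["ios", "android", "ios", "ios"],
    "region":  ["CN", "CN", "US", "CN"],
    "payment": [6, 0, 30, 12],
}

def pivot(cols, dims, measure):
    """Group rows by the dimension columns, summing the measure column."""
    out = defaultdict(int)
    for i in range(len(cols[measure])):
        key = tuple(cols[d][i] for d in dims)
        out[key] += cols[measure][i]
    return dict(out)

print(pivot(columns, ["region"], "payment"))
# {('CN',): 18, ('US',): 30}
```

Scanning only the columns a query touches is what makes the columnar layout fast for wide tables with hundreds of dimensions.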
3.3 Real‑Time Multi‑Dimensional Analysis – TGDruid
Real‑time logs from game servers are ingested via Kafka/Pulsar, processed by Storm/Flink ETL, and fed into Druid.
Druid serves queries from memory: only recent segments (≤2 days) are kept in memory, while older aggregates are persisted to MySQL for reporting.
Configuration‑driven ETL enables task launch within ~5 minutes without code changes.
Optimizations include time‑based partitioning, dimension validation, automatic failure detection, and Prophet‑based real‑time forecasting.
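The configuration-driven ETL above can be pictured as a small dict that is validated and compiled into an ingestion spec, so launching a new task means editing config rather than code. This is a hedged sketch: the field names below are illustrative and do not match Druid's actual ingestion-spec schema.

```python
# Compile a task config (topic, dimensions, metrics) into a
# Druid-style ingestion spec; field names are illustrative only.

def build_ingestion_spec(cfg):
    """Validate a task config dict and produce an ingestion spec dict."""
    required = {"topic", "datasource", "dimensions", "metrics"}
    missing = required - cfg.keys()
    if missing:
        raise ValueError(f"config missing fields: {sorted(missing)}")
    return {
        "source": {"type": "kafka", "topic": cfg["topic"]},
        "datasource": cfg["datasource"],
        "dimensionsSpec": {"dimensions": cfg["dimensions"]},
        "metricsSpec": [{"type": "longSum", "fieldName": m} for m in cfg["metrics"]],
        "granularity": cfg.get("granularity", "HOUR"),
    }

spec = build_ingestion_spec({
    "topic": "game_login_log",
    "datasource": "login_rt",
    "dimensions": ["region", "channel"],
    "metrics": ["login_cnt"],
})
```

Validation up front (the `missing` check, plus the dimension validation the article mentions) is what keeps a five-minute, no-code launch from producing a silently broken task.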
4. Application Scenarios
Typical use cases include:
User segmentation based on activity, payment, or in‑game metrics.
Tracking and profiling churned users to understand behavior before loss.
Real‑time monitoring of key indicators (DAU, revenue, match counts) for newly launched games or events.
Targeted marketing campaigns linked with custom metrics.
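The segmentation scenario above is, at its simplest, a rule over activity and payment metrics. A toy sketch with made-up thresholds and field names (real segmentation would draw on the profile engine's metrics):

```python
# Rule-based user segmentation on 30-day payment and activity;
# thresholds and field names are invented examples.

def segment(user):
    """Assign a user dict to a segment by payment, then activity."""
    if user["pay_30d"] >= 500:
        return "whale"
    if user["pay_30d"] > 0:
        return "payer"
    if user["active_days_30d"] >= 15:
        return "active_free"
    return "casual"

print(segment({"pay_30d": 600, "active_days_30d": 2}))   # whale
print(segment({"pay_30d": 0, "active_days_30d": 20}))    # active_free
```

In practice such rules are combined with the profile engine's bitmap filters so a segment can be materialized as a user pack for targeted campaigns.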
5. Summary and Future Plans
The roadmap includes building the three engines into a broader ecosystem, deepening open-source collaboration, moving toward scientific, prediction-driven analysis and decision-making, and expanding the data-science lab with Jupyter-based experimentation.
Thank you for reading.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.