How Tencent Built Its Massive Big Data Platform Over a Decade
Over more than ten years, Tencent evolved its big data infrastructure through three generations: early Hadoop-based offline processing, then a hybrid real-time Spark/Storm system, and finally a self-developed, open-source machine-learning platform. The progression marks a shift from “borrowed” open-source solutions to self-developed, AI-ready architectures.
Introduction: Tencent, as one of China's largest internet companies, handles massive daily data volumes. Efficient storage, management, and utilization of this data are essential to avoid turning data assets into waste.
The big data platform is core infrastructure: each day it runs tens of millions of offline tasks and tens of trillions of real-time computations, and serves billions of analytical queries.
01 Tencent Big Data Construction Philosophy
When the project began, the team debated whether to build everything in-house or adopt open-source solutions. Urgent business needs in 2009 (e.g., “Happy Farm” on Qzone, Tencent's QQ Space) demanded a data warehouse quickly, so the team chose open source for its rich community resources and faster deployment.
The choice had costs. The team's background was in C/C++ billing systems, while the big-data ecosystem was built mainly in Java, so engineers had to learn a new stack and the team had to recruit experienced developers. The ecosystem's breadth also meant that each component needed extensive optimization to reach enterprise-grade reliability.
02 Tencent Big Data Overall Architecture
Over more than a decade, Tencent’s big‑data platform has undergone three major architectural generations.
First generation (2009‑2011): Offline‑focused, built on Hadoop (TDW). Optimizations included expanding cluster scale, improving scheduling, enhancing disaster recovery, and integrating surrounding ecosystems such as an Oracle‑compatible SQL layer and PostgreSQL for small‑data analytics.
Second generation (2012-2014): Added real-time capabilities by integrating Spark for faster batch processing and Storm for millisecond-level stream processing. The team built a real-time data collection system (TDBank) that cut ingestion latency from a day to seconds, and introduced resource and task scheduling that accounts for CPU, memory, network, and I/O together.
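To make the idea of scheduling across several resource dimensions concrete, here is a minimal sketch in Python. It is not Tencent's scheduler: the Node class, the place function, the dimension names, and the "most headroom on the tightest dimension" rule are all assumptions chosen for illustration. The only point it demonstrates is that a task is placed on a node only when every dimension fits, not just CPU and memory.

```python
from dataclasses import dataclass, field

# Hypothetical dimension names, chosen only for this illustration.
DIMENSIONS = ("cpu", "memory_gb", "network_mbps", "io_mbps")


@dataclass
class Node:
    name: str
    capacity: dict                      # total amount per dimension (all > 0)
    used: dict = field(default_factory=dict)

    def free(self, dim):
        return self.capacity[dim] - self.used.get(dim, 0)

    def fits(self, demand):
        # A task fits only if EVERY dimension has enough free capacity.
        return all(self.free(d) >= demand.get(d, 0) for d in DIMENSIONS)

    def allocate(self, demand):
        for d in DIMENSIONS:
            self.used[d] = self.used.get(d, 0) + demand.get(d, 0)


def place(demand, nodes):
    """Place a task and return the chosen node's name, or None if nothing fits."""
    candidates = [n for n in nodes if n.fits(demand)]
    if not candidates:
        return None

    def headroom(node):
        # Fraction of capacity left on the tightest dimension after placement.
        return min((node.free(d) - demand.get(d, 0)) / node.capacity[d]
                   for d in DIMENSIONS)

    best = max(candidates, key=headroom)
    best.allocate(demand)
    return best.name


if __name__ == "__main__":
    cluster = [
        Node("a", {"cpu": 8, "memory_gb": 32, "network_mbps": 1000, "io_mbps": 500}),
        Node("b", {"cpu": 16, "memory_gb": 64, "network_mbps": 1000, "io_mbps": 100}),
    ]
    # Node "b" has more CPU and memory, but too little I/O bandwidth for this task.
    task = {"cpu": 4, "memory_gb": 8, "network_mbps": 100, "io_mbps": 200}
    print(place(task, cluster))
```

A scheduler that looked only at CPU and memory would send the sample task to the larger node and saturate its disks. Checking all four dimensions avoids that, at the cost of tracking more state per node.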
Third generation (2015-2019): Focused on AI and machine-learning workloads. The team developed Angel, a high-performance distributed machine-learning platform, in collaboration with Peking University. Angel supports billion-scale models, data and model parallelism, and online training. It was open-sourced in 2017 and later donated to the Linux Foundation.
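Data parallelism of the kind mentioned here is often organized around a parameter server: workers each hold a shard of the data, pull the current weights, compute a gradient on their shard, and push it back. The sketch below illustrates that pattern in plain Python on a toy linear model. It does not use Angel's API; ParameterServer, shard_gradient, and train are hypothetical names, and the workers run one after another in a loop rather than on separate machines. Model parallelism, where the weights themselves are split across servers, is not shown.

```python
class ParameterServer:
    """Holds the shared model weights (a single in-process stand-in)."""

    def __init__(self, dim, lr):
        self.weights = [0.0] * dim
        self.lr = lr

    def pull(self):
        # Workers receive a copy of the current weights.
        return list(self.weights)

    def push(self, grad):
        # Apply one gradient-descent step with the worker's gradient.
        for i, g in enumerate(grad):
            self.weights[i] -= self.lr * g


def shard_gradient(weights, shard):
    """Gradient of mean squared error for a linear model on one data shard."""
    grad = [0.0] * len(weights)
    for x, y in shard:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2 * err * xi / len(shard)
    return grad


def train(shards, dim, lr=0.05, rounds=300):
    server = ParameterServer(dim, lr)
    for _ in range(rounds):
        for shard in shards:  # each shard stands in for one worker
            server.push(shard_gradient(server.pull(), shard))
    return server.weights


if __name__ == "__main__":
    # Toy data generated from y = 3 + 2x; the first feature is a constant bias term.
    data = [((1.0, x), 3.0 + 2.0 * x) for x in (0.0, 1.0, 2.0, 3.0)]
    shards = [data[:2], data[2:]]        # two "workers", two samples each
    print(train(shards, dim=2))          # weights should approach [3, 2]
```

The pull/push interface is the part worth noticing: because workers only exchange weights and gradients with the server, adding data means adding shards and workers without changing the training logic.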
The platform also incorporated heterogeneous resource management (GPU, FPGA) and tightly integrated a PostgreSQL‑based distributed database (TBase) to enable HTAP capabilities, bridging big data and transactional processing.
Today, Tencent continues to explore next‑generation directions such as batch‑stream fusion, cloud‑native big data, AI‑cloud convergence, data lakes, and privacy‑preserving computation, underscoring the foundational role of big data, AI, and cloud computing in its business ecosystem.
Python Crawling & Data Mining
