Tencent's Big Data Construction: Philosophy, Architecture Evolution, and Open‑Source Strategy
The article introduces Tencent's big‑data platform philosophy and overall architecture, detailing three generations of evolution from offline Hadoop‑based processing to real‑time Spark/Storm integration and finally AI‑driven machine‑learning platforms, while also highlighting the team, book publication, and a related giveaway event.
Introduction – Tencent, as one of China’s largest internet companies, handles massive business data and needs professional data storage, management, and usage to avoid turning data assets into liabilities.
Core Team & Authors – The Tencent Data Platform team provides reliable big‑data and machine‑learning services, focusing on cloud‑native, AI, graph computing, and more. Core author Jiang Jie, PhD from Peking University, is Vice President of Tencent and a member of several AI and big‑data committees.
Book Information – The content is excerpted from the book "Tencent Big Data Construction" (authors: Jiang Jie, Liu Yuhong, Chen Peng, Zheng Lixiong, etc.).
Construction Philosophy – When the project started in 2009, the team debated whether to build from scratch or adopt open source. Rapid business growth demanded a data warehouse that could go into production quickly, so the team chose open‑source solutions for their speed of delivery and community resources.
Three‑Generation Architecture
1. First Generation (2009‑2011) – Built the Tencent distributed Data Warehouse (TDW) on Hadoop, focusing on offline batch processing. Optimizations included expanding cluster scale, improving scheduling, enhancing fault tolerance, adding Oracle‑compatible syntax, and integrating PostgreSQL for small‑data analytics.
2. Second Generation (2012‑2014) – Added real‑time capabilities by integrating Spark for faster batch jobs and Storm for millisecond‑level streaming. Developed a real‑time ingestion system (TDBank) and introduced resource‑aware scheduling, supporting CPU, memory, network, and I/O management.
3. Third Generation (2015‑2019) – Shifted toward AI and machine‑learning workloads. Co‑developed the high‑performance distributed ML platform Angel with Peking University, supporting billions of model dimensions, data and model parallelism, and GPU/FPGA resources. Integrated a PostgreSQL‑based distributed database (TBase) for HTAP capabilities.
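The combination of data parallelism and model parallelism mentioned for the third generation can be illustrated with a toy parameter‑server loop. This is a minimal sketch under stated assumptions, not Angel's API: the names `ParameterShard`, `make_shards`, `pull_model`, `worker_gradient`, and `train` are hypothetical, everything runs in one process, and the model is a plain linear regressor. The model vector is split across shards (model parallelism), while each worker computes a gradient on its own data partition (data parallelism).

```python
from typing import List, Tuple

Sample = Tuple[List[float], float]  # (features, label)


class ParameterShard:
    """Hypothetical server holding one contiguous slice of the model."""

    def __init__(self, start: int, size: int) -> None:
        self.start = start
        self.weights = [0.0] * size

    def pull(self) -> List[float]:
        return list(self.weights)

    def push(self, grads: List[float], lr: float) -> None:
        # Apply a gradient-descent step to this slice only.
        for i, g in enumerate(grads):
            self.weights[i] -= lr * g


def make_shards(dim: int, num_shards: int) -> List[ParameterShard]:
    """Split a dim-sized model into nearly equal contiguous shards."""
    base, extra = divmod(dim, num_shards)
    shards, start = [], 0
    for s in range(num_shards):
        size = base + (1 if s < extra else 0)
        shards.append(ParameterShard(start, size))
        start += size
    return shards


def pull_model(shards: List[ParameterShard]) -> List[float]:
    """Assemble the full model by pulling every shard in order."""
    model: List[float] = []
    for shard in shards:
        model.extend(shard.pull())
    return model


def worker_gradient(model: List[float], partition: List[Sample]) -> List[float]:
    """Mean-squared-error gradient of a linear model on one data partition."""
    grad = [0.0] * len(model)
    for x, y in partition:
        err = sum(w * xi for w, xi in zip(model, x)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * err * xj / len(partition)
    return grad


def train(shards: List[ParameterShard], partitions: List[List[Sample]],
          lr: float, steps: int) -> List[float]:
    for _ in range(steps):
        model = pull_model(shards)
        # Each worker sees only its own partition (data parallelism).
        grads = [worker_gradient(model, p) for p in partitions]
        avg = [sum(g[j] for g in grads) / len(grads) for j in range(len(model))]
        # Each shard receives only its own slice (model parallelism).
        for shard in shards:
            end = shard.start + len(shard.weights)
            shard.push(avg[shard.start:end], lr)
    return pull_model(shards)


if __name__ == "__main__":
    # Two workers, two shards; the data is generated by y = 2*x0 - 1*x1.
    partitions = [
        [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)],
        [([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)],
    ]
    print(train(make_shards(2, 2), partitions, lr=0.1, steps=500))
```

In a real deployment the pull and push calls would be network requests and workers would typically fetch only the parameters they need; the sketch keeps both synchronous and complete for readability.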
Open‑Source Contributions – Tencent open‑sourced its Hive variant, Angel (graduated from LF AI), TubeMQ, TKEStack, TBase, and Kona JDK, contributing dozens of projects and thousands of commits to the community.
Future Directions – The team is exploring next‑generation platforms: batch‑stream fusion, cloud‑native big data, AI‑big‑data‑cloud integration, data lakes, and privacy‑preserving computing.
Book Giveaway Event – A promotional activity offers four copies of the book to followers of the Aikesheng Open Source Community. The event runs until 16:30 on September 2, 2022. Participants must follow the community’s public account and may invite friends to improve their chances of winning. Winners must provide shipping information within three days of the automated draw.
Disclaimer – The activity is organized by the Aikesheng Open Source Community, which holds final interpretation rights.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade open‑source tools and services for MySQL, releases a premium open‑source component every year on October 24 ("1024"), and continues to operate and maintain these components.
