How Tencent Scaled Its TDW to 8,800 Nodes and Mastered Cross-City Data Migration
Tencent’s senior engineer explains how TDW (Tencent Distributed Data Warehouse) grew from a few hundred nodes to thousands, why cross‑city migration is hard, and the relationship‑chain migration model, Hive dual‑write tables, and platform tooling the team built to move data and tasks seamlessly with minimal business impact.
Introduction
The article describes the operation and migration experience of Tencent's large‑scale data warehouse, TDW, which is the biggest offline processing platform inside Tencent and one of the largest Hadoop clusters in China.
1. Tencent Large‑Scale TDW Cluster
TDW (Tencent Distributed Data Warehouse) is a massive data storage and computation platform. The cluster grew from about 400 nodes a few years ago to 8,800 nodes, supporting a daily scan volume of 20 PB and providing 200 PB of storage capacity. By the end of 2017 the cluster was expected to reach 20,000 nodes.
Single cluster: 8,800 nodes
Daily scan: 20 PB
Storage capacity: 200 PB
When the cluster reached 8,800 nodes, simple capacity expansion could no longer solve the problem because the existing data center and network architecture could not support further growth, prompting a cross‑city migration.
1.1 Overall Architecture of Tencent Big Data Platform
The TDW platform sits within a five‑layer big data architecture:
Data storage layer (HDFS, HBase, Ceph, PGXZ)
Resource scheduling layer
Compute engine layer (MapReduce, Spark, GPU)
Compute framework layer (Hermes, Hive, Pig, SparkSQL, GraphX)
Service layer providing analytics and machine‑learning capabilities
The migration covers HDFS, GAIA, MapReduce, Spark, Hive, Pig and SparkSQL.
2. Migration Model
2.1 Why Cross‑City Data Migration Is Hard
First, the operational workload is huge: hundreds of petabytes of data, hundreds of thousands of tasks, and tens of thousands of machines must be moved. Second, the migration must be invisible to the business: the system must stay stable and no data may be lost. Third, computation results must remain accurate and job runtimes must not fluctuate noticeably.
The most critical issue is network congestion caused by data crossing between two cities, which can affect all systems sharing the dedicated line.
2.2 Dual‑Cluster Solution
Two completely independent clusters are deployed in two cities. Data and computation are duplicated, so no cross‑city traffic occurs during migration.
The advantage is zero impact on business; the drawback is the need for a large amount of redundant hardware.
2.3 Single‑Cluster Solution
Only one active cluster exists; data is stored in a single location (either city A or city B). Migration moves a portion of machines and workloads gradually.
The risk is that tasks in one city may read data that has already been moved to the other city, causing heavy cross‑city traffic.
2.4 Relationship‑Chain Based Migration Model
A relationship chain captures the data‑to‑task dependencies. By analyzing these chains, the team can decide where data should reside so that computation follows the data, minimizing cross‑city traffic.
2.5 Generating Relationship Chains
The platform uses a tool called hadoopdoctor to collect task metadata every five minutes, recording data paths, task IDs, and read/write flags. The collected fine‑grained paths are normalized to table‑level paths, producing atomic data‑access relationships that are aggregated into relationship chains.
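The aggregation step above can be sketched as a connected-components problem: every task is linked to every table it reads or writes, and each connected component is one relationship chain. The sketch below is illustrative only; the function names, the `(task_id, path, mode)` record shape, and the `/warehouse/<db>/<table>/<partition>` path layout are assumptions, not the actual hadoopdoctor format.

```python
from collections import defaultdict

def normalize(path: str) -> str:
    """Collapse a partition-level HDFS path to its table-level prefix.
    Assumes a /warehouse/<db>/<table>/<partition...> layout (hypothetical)."""
    parts = path.strip("/").split("/")
    return "/" + "/".join(parts[:3])

def build_chains(records):
    """Group tasks and tables into relationship chains (connected components).
    `records` are (task_id, path, mode) tuples with mode 'r' or 'w';
    for chain membership the read/write direction does not matter."""
    parent = {}

    def find(x):                      # union-find with path halving
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # each atomic access relation links one task to one table
    for task_id, path, _mode in records:
        union(("task", task_id), ("table", normalize(path)))

    chains = defaultdict(set)
    for node in list(parent):
        chains[find(node)].add(node)
    return list(chains.values())
```

With a handful of sample records, tasks that touch a common table fall into one chain while unrelated tasks form their own.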
2.6 Splitting Large Relationship Chains
Large chains (over 100 k nodes) are split by identifying key nodes that act as cut points. Because finding a single optimal cut is hard, multiple key nodes are used to break the chain into manageable pieces.
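One simple heuristic matching this description is to repeatedly remove the highest-degree node from any oversized component until every piece is small enough; the removed nodes are the key-node candidates. This is a minimal sketch of that idea, not the team's actual algorithm, and all names in it are hypothetical.

```python
from collections import defaultdict

def split_chain(edges, max_size):
    """Greedily remove the highest-degree node from the largest oversized
    component until every component has at most `max_size` nodes.
    Returns (removed_key_nodes, components). Illustrative heuristic only."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    removed = []
    while True:
        comps = _components(adj, removed)
        if all(len(c) <= max_size for c in comps):
            return removed, comps
        big = max(comps, key=len)                    # largest oversized piece
        key = max(big, key=lambda n: len(adj[n]))    # its busiest node
        removed.append(key)

def _components(adj, removed):
    """Connected components of the graph with `removed` nodes cut out."""
    seen, comps = set(removed), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            comp.add(n)
            stack.extend(adj[n] - seen)
        comps.append(comp)
    return comps
```

On a star-shaped chain, for example, the hub is the natural cut point: removing it leaves each leaf as its own small component.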
2.7 Introducing Hive Dual‑Write Tables
When a key node is identified, the corresponding Hive table is turned into a dual‑write table with two locations (city A and city B). Writes go to both locations, and reads are directed to the nearest location, reducing cross‑city traffic.
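The routing rule described above can be modeled in a few lines: a dual-write table carries one storage location per city, all writes fan out to every location, and each read is served from the caller's own city. The class and field names below are hypothetical, chosen only to illustrate the rule.

```python
from dataclasses import dataclass

@dataclass
class DualWriteTable:
    """Illustrative model of a Hive dual-write table: every write lands in
    both cities, every read is served from the caller's local city."""
    locations: dict  # city -> storage URI, e.g. {"A": "hdfs://...", "B": "hdfs://..."}

    def write_targets(self):
        # writes are duplicated to every location
        return list(self.locations.values())

    def read_target(self, caller_city):
        # reads go to the same-city copy, avoiding cross-city traffic
        return self.locations[caller_city]
```

A task running in city B thus never reaches across the dedicated line for this table, even while writers in city A keep both copies populated.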
2.8 Ensuring Data Consistency
A synchronization task copies data from city A to city B. Downstream tasks are made dependent on the sync task, guaranteeing that they only read data after it has been fully synchronized.
3. Cross‑City Migration Platform
3.1 Relationship‑Chain Migration Module
The module first processes dual‑write tables, then migrates the remaining data by expanding partitions and using distcp. Before migration, users are notified; during migration, write tasks are frozen to guarantee consistency, and data differences between the two cities are compared repeatedly until they match.
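The copy-and-compare loop reduces to: find partitions whose target copy is missing or differs, recopy them, and repeat until both cities agree. The sketch below simulates this with per-partition checksum maps standing in for HDFS metadata, and a `copy` callback standing in for a distcp run; all names are hypothetical.

```python
def pending_partitions(src, dst):
    """Partitions whose checksum is missing or differs on the target.
    `src`/`dst` map partition -> checksum (stand-ins for HDFS metadata)."""
    return sorted(p for p, c in src.items() if dst.get(p) != c)

def migrate_table(src, dst, copy):
    """Repeat copy-and-compare until both cities match, as the platform does
    after freezing writes. `copy(partition)` stands in for one distcp run
    and returns the checksum of the freshly copied partition."""
    while True:
        todo = pending_partitions(src, dst)
        if not todo:
            return dst          # both cities agree; migration is complete
        for p in todo:
            dst[p] = copy(p)
```

Freezing writes before this loop is what guarantees termination: with the source stable, each pass strictly shrinks the set of differing partitions.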
3.2 Platform Assurance Module
Two aspects are covered:
Basic assurance: data validation after migration, plus sampling‑replay of a vertical path in the relationship chain to verify task correctness.
Monitoring assurance: watching data‑volume changes, task health in the new city, and abnormal traffic spikes caused by unintended cross‑city data access.
4. Migration Strategies
4.1 Independent Deployment of Migration Cluster
A dedicated migration cluster runs distcp jobs, consuming mainly network bandwidth while keeping CPU usage low. With high‑speed network cards, a small number of machines (e.g., 40) can support up to 1 PB of migration traffic.
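A quick back-of-envelope check makes the sizing plausible. The source does not state a time window for the 1 PB figure, so the sketch below assumes 1 PB per day; under that assumption each of the 40 nodes needs only a few Gbit/s of sustained throughput, well within a 10 GbE card.

```python
def per_node_gbps(total_pb, machines, hours=24.0):
    """Sustained Gbit/s each migration node must push to move `total_pb`
    of data in `hours`. Assumption: 1 PB = 10**15 bytes; the one-day
    window is an assumption, not stated in the source."""
    total_bits = total_pb * 10**15 * 8
    return total_bits / (machines * hours * 3600) / 10**9
```

For 1 PB over 24 hours on 40 machines this works out to roughly 2.3 Gbit/s per node, which is why CPU, not network hardware, stays far from being the bottleneck.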
4.2 Traffic Control During Migration
The migration cluster’s bandwidth is limited to avoid saturating the source or target clusters. Because Hadoop writes multiple replicas, target‑cluster traffic can be twice the migration traffic, so a resource‑pool limit is applied.
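The "twice the migration traffic" figure follows directly from HDFS write pipelining: each block distcp writes is forwarded to the remaining replicas inside the target cluster. A one-line sketch of that relationship, with hypothetical names and the default replication factor of 3 assumed:

```python
def target_replication_traffic(migration_tb, replication=3):
    """Intra-cluster traffic generated on the target cluster: each block
    written by distcp is pipelined to (replication - 1) further datanodes.
    With the HDFS default replication of 3, this is 2x the migration traffic."""
    return migration_tb * (replication - 1)
```

This is why the resource-pool limit must be set against total target-side traffic, not just the cross-city line rate.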
4.3 Synchronization Tasks
Sync tasks run in the opposite direction (city B → city A) and have minimal impact on traffic. They are placed in a separate resource pool to finish quickly without affecting other workloads.
4.4 HDFS Cluster Scaling Strategies
During scaling‑down, the whole cluster is taken offline after cleaning data and merging small files. During migration, machines are moved in batches (e.g., 200 nodes per round). New nodes are initially excluded from computation to avoid network saturation caused by Hadoop’s balance mechanism.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles. We focus on operations transformation and hope to accompany you throughout your operations career.