Ctrip's Cross‑Datacenter Hadoop Architecture: Design, Implementation, and Lessons Learned
This article details Ctrip's cross‑datacenter Hadoop architecture, covering the evolution of its Hadoop platform, the challenges of multi‑site bandwidth and latency, design choices between multi‑cluster and single‑cluster solutions, and the concrete HDFS, YARN, balancer, migration, monitoring, and throttling implementations that enable transparent, consistent, and efficient multi‑datacenter operations.
Ctrip introduced Hadoop in 2014 and has since expanded its clusters to hundreds of petabytes across thousands of nodes, employing HDFS federation with four namespaces and a custom namenode proxy, as well as an Erasure Code cluster for cold‑storage separation.
The compute layer consists of two offline YARN clusters and one online YARN cluster totaling over 150,000 cores, with jobs distributed across four data centers; 90% of the 300,000+ daily Hadoop jobs are Spark‑based.
Rapid growth forced Ctrip to build a new data center (Riy) because the existing Fuzhou site reached physical capacity, and projected scaling to ten‑thousand nodes by the end of 2024 required a multi‑datacenter deployment.
Network constraints (200 Gbps bandwidth per site, latency up to 10 ms under load, 10% packet loss) made it essential to minimize cross‑site traffic.
Native Hadoop suffers from high cross‑site I/O due to shuffle reads/writes and HDFS replica placement, which can cause significant bandwidth consumption when map/reduce tasks or replicas span different data centers.
Two architectural options were evaluated: (1) multi‑datacenter multi‑cluster, which is deployment‑simple but opaque to users, incurs higher operational cost, and struggles with data consistency; (2) multi‑datacenter single‑cluster, which requires source‑code changes but offers user transparency, simpler ops, and consistent replica management. The single‑cluster approach was chosen.
Early experiments mixed online and offline workloads by deploying a YARN cluster on Kubernetes in two data centers and routing low‑IO jobs to the online cluster, reducing offline compute pressure by about 8%.
The final single‑cluster design includes: a room‑aware HDFS where the NetworkTopology now tracks the data‑center (room) layer in addition to rack and node, enabling preferential local reads; per‑directory multi‑datacenter replica settings stored in Zookeeper and an extended EditLog; and a modified ReplicationMonitor that prefers same‑site sources during replication.
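The effect of adding a data‑center layer to the topology can be sketched as follows. This is a minimal illustration, not Hadoop's actual NetworkTopology API: node paths take the form `/dc/rack/host`, distance grows with the number of differing path components, and a reader picks the topologically closest replica, so cross‑site reads only happen when no local replica exists.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Minimal sketch of room-aware replica selection. Topology paths gain a
// data-center layer, e.g. "/dc1/rack2/host7"; reads prefer the replica with
// the smallest topological distance from the reader.
public class RoomAwareTopology {
    // Distance = number of non-shared path components on both sides:
    // same host -> 0, same rack -> 2, same DC different rack -> 4,
    // different DC -> 6 (the largest cost, so cross-site reads lose).
    static int distance(String a, String b) {
        String[] pa = a.substring(1).split("/");
        String[] pb = b.substring(1).split("/");
        int shared = 0;
        while (shared < pa.length && shared < pb.length
                && pa[shared].equals(pb[shared])) shared++;
        return (pa.length - shared) + (pb.length - shared);
    }

    // Pick the replica closest to the reader in the topology.
    static String pickClosest(String reader, List<String> replicas) {
        return replicas.stream()
                .min(Comparator.comparingInt(r -> distance(reader, r)))
                .orElseThrow();
    }

    public static void main(String[] args) {
        String reader = "/dc1/rack2/host7";
        List<String> replicas = Arrays.asList(
                "/dc2/rack1/host3",   // remote data center
                "/dc1/rack5/host9",   // same DC, different rack
                "/dc1/rack2/host8");  // same rack
        System.out.println(pickClosest(reader, replicas)); // same-rack replica
    }
}
```

With this ordering, the modified ReplicationMonitor's preference for same‑site sources falls out of the same distance function: candidate sources in the remote data center always rank last.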
Balancer and Mover were extended to support multi‑instance deployment per data center, and the Erasure Code cluster was integrated for cold data without cross‑site code changes.
A Cross‑IDC Fsck tool was built to detect and correct misplaced replicas across namespaces, using standby namenode reads and RPC throttling to limit impact.
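The core of such a tool can be sketched in two parts: a placement check that compares the data centers actually holding a file's replicas against the directory's required placement, and a throttle that spaces out the (standby‑namenode) RPCs. All class and method names below are illustrative assumptions, not Ctrip's real tool or Hadoop classes.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a cross-IDC fsck pass: flag files whose replicas do
// not satisfy the per-directory multi-datacenter policy, while rate-limiting
// scan RPCs so the (standby) namenode is not overwhelmed.
public class CrossIdcFsck {
    // Required placement per directory, e.g. /warehouse/bu1 -> {dc1, dc2}.
    private final Map<String, Set<String>> policy;
    private final long minRpcIntervalMillis;   // crude RPC throttle
    private long lastRpcTime = 0;

    CrossIdcFsck(Map<String, Set<String>> policy, long minRpcIntervalMillis) {
        this.policy = policy;
        this.minRpcIntervalMillis = minRpcIntervalMillis;
    }

    // Sleep if needed so consecutive namenode RPCs are spaced out.
    void throttle() throws InterruptedException {
        long now = System.currentTimeMillis();
        long wait = lastRpcTime + minRpcIntervalMillis - now;
        if (wait > 0) Thread.sleep(wait);
        lastRpcTime = System.currentTimeMillis();
    }

    // True if the file's replicas miss a data center its directory requires;
    // such files would be handed to the mover/replication path for correction.
    boolean isMisplaced(String dir, Set<String> replicaDcs) {
        Set<String> required = policy.getOrDefault(dir, Set.of());
        return !replicaDcs.containsAll(required);
    }

    public static void main(String[] args) {
        CrossIdcFsck fsck = new CrossIdcFsck(
                Map.of("/warehouse/bu1", Set.of("dc1", "dc2")), 10);
        System.out.println(fsck.isMisplaced("/warehouse/bu1", Set.of("dc1")));
    }
}
```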
YARN was refactored with a ResourceManager proxy that maps users to their designated data center, ensuring that all tasks of an application run within a single site, and client‑side fallback caching provides resilience.
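The proxy-plus-cache idea can be sketched as below. This is an assumed shape, not Ctrip's actual implementation: the proxy owns the authoritative user‑to‑datacenter mapping and returns that site's ResourceManager address, while the client caches the last answer so submissions keep working if the proxy is briefly unreachable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of RM-proxy routing: every application a user submits
// resolves to one data center's ResourceManager, so all of its tasks stay
// within a single site. Addresses and mappings here are made up.
public class RmProxyRouter {
    private final Map<String, String> userToDc;        // proxy-side mapping
    private final Map<String, String> dcToRmAddress;   // per-DC RM endpoint
    private final Map<String, String> clientCache = new ConcurrentHashMap<>();

    RmProxyRouter(Map<String, String> userToDc, Map<String, String> dcToRmAddress) {
        this.userToDc = userToDc;
        this.dcToRmAddress = dcToRmAddress;
    }

    // Resolve the RM for a user; if the proxy is down, fall back to the
    // client-side cache so clients keep submitting to their last-known site.
    String resolveRm(String user, boolean proxyUp) {
        if (proxyUp) {
            String dc = userToDc.getOrDefault(user, "dc1"); // assumed default site
            String rm = dcToRmAddress.get(dc);
            clientCache.put(user, rm);
            return rm;
        }
        return clientCache.get(user); // null if this user never resolved before
    }

    public static void main(String[] args) {
        RmProxyRouter router = new RmProxyRouter(
                Map.of("alice", "dc2"),
                Map.of("dc1", "rm1.example:8032", "dc2", "rm2.example:8032"));
        System.out.println(router.resolveRm("alice", true));
        System.out.println(router.resolveRm("alice", false)); // served from cache
    }
}
```

Because routing happens per user rather than per job, moving a user to another site is a single mapping change on the proxy, which is what makes the BU‑level migrations described next practical.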
Automation tools were created to migrate accounts and data at BU level, orchestrating Hive account re‑balancing, DB and home directory moves, queue migration, and eventual resource reclamation, while respecting bandwidth windows and quota adjustments.
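A condensed sketch of that pipeline is shown below; the step names and the window check are assumptions for illustration, not Ctrip's actual tooling. The point is that steps run in a fixed order and data‑heavy moves are gated on an off‑peak bandwidth window that may wrap past midnight.

```java
import java.time.LocalTime;
import java.util.List;

// Hypothetical sketch of a BU-level migration plan: ordered steps, with
// data-moving steps gated on an off-peak bandwidth window (e.g. 22:00-06:00).
public class BuMigrationPlan {
    // True if `now` falls inside [start, end); handles windows that wrap
    // past midnight, e.g. start=22:00, end=06:00.
    static boolean inBandwidthWindow(LocalTime now, LocalTime start, LocalTime end) {
        return start.isBefore(end)
                ? !now.isBefore(start) && now.isBefore(end)
                : !now.isBefore(start) || now.isBefore(end);
    }

    public static void main(String[] args) {
        // Assumed ordering of the migration steps described in the article.
        List<String> steps = List.of(
                "rebalance Hive accounts to the target data center",
                "move Hive DBs and user home directories",
                "migrate YARN queues",
                "adjust quotas",
                "reclaim source-DC resources");
        steps.forEach(s -> System.out.println("step: " + s));
    }
}
```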
Bandwidth monitoring and throttling services were added: instrumentation in dfsclient and datanode reports cross‑site read/write metrics to a throttling service, which grants permits based on job priority and current capacity, reducing cross‑site block reads to 20% of the original and overall bandwidth usage to roughly 10%.
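The permit‑granting logic can be sketched as a priority‑aware admission check. The exact semantics here are assumed, not Ctrip's service: high‑priority jobs may use cross‑site capacity up to the hard limit, while low‑priority jobs are only admitted below a softer ceiling, so background traffic yields first when the link fills up.

```java
// Minimal sketch of priority-aware cross-site bandwidth permits. A
// dfsclient/datanode would ask for a permit before issuing cross-site reads
// and release it when done; numbers and thresholds are illustrative.
public class CrossSiteThrottler {
    private final long capacityMbps;          // hard cross-site limit
    private final double lowPriorityCeiling;  // fraction usable by low-prio jobs
    private long inUseMbps = 0;

    CrossSiteThrottler(long capacityMbps, double lowPriorityCeiling) {
        this.capacityMbps = capacityMbps;
        this.lowPriorityCeiling = lowPriorityCeiling;
    }

    // Grant the request only if the caller's priority class still has headroom.
    synchronized boolean tryAcquire(long mbps, boolean highPriority) {
        long limit = highPriority
                ? capacityMbps
                : (long) (capacityMbps * lowPriorityCeiling);
        if (inUseMbps + mbps > limit) return false;
        inUseMbps += mbps;
        return true;
    }

    // Return bandwidth to the pool when the cross-site transfer finishes.
    synchronized void release(long mbps) {
        inUseMbps = Math.max(0, inUseMbps - mbps);
    }

    public static void main(String[] args) {
        CrossSiteThrottler t = new CrossSiteThrottler(100, 0.5);
        System.out.println(t.tryAcquire(40, false)); // low-prio, under ceiling
        System.out.println(t.tryAcquire(20, false)); // low-prio, would exceed 50
        System.out.println(t.tryAcquire(50, true));  // high-prio, still fits
    }
}
```

Jobs denied a permit simply wait and retry, which is how the service shifts deferrable cross‑site reads out of peak hours rather than dropping them.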
In summary, Ctrip achieved room‑aware HDFS, RM‑proxy‑driven YARN federation, automated storage/computation migration, and cross‑site traffic control, resulting in half of the storage and 40% of compute workloads moved to the new data center while keeping users unaware of the underlying complexity.
Future work includes intelligent account‑level migration decisions, finer‑grained replica policies at the partition level, and extending cross‑site capabilities to Hadoop 3 Erasure Code clusters.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.