
Building a 16,000‑Node Cloud‑Native MPP Data Warehouse: Lessons from CCB

The article details how China Construction Bank's fintech arm designed, deployed, and operated a cloud‑native, three‑layer MPP data warehouse spanning 16,000 servers, covering architectural choices, performance gains, operational automation, and high‑availability strategies for ultra‑large‑scale workloads.

dbaplus Community

R&D Background

China Construction Bank (CCB) ran into scalability and performance limits with its traditional MPP databases: insufficient concurrency, no separation of storage and compute, complex upgrade and recovery processes, and an architecture that was not cloud native.

To address these issues, CCB’s fintech division created the "Longfu MPP DB" – a next‑generation cloud‑native data warehouse that separates metadata, compute, and shared storage layers while retaining MPP performance.

Application Solution

The system is split into a management module and a user module. The management module handles resource provisioning, cluster lifecycle, and monitoring. The user module is organized into three tiers (metadata, compute, and shared storage), and each tier can be scaled independently.

Metadata clusters use etcd for service discovery and FDB for persistent storage, while stateless metadata services handle requests from the compute layer. Compute clusters are also stateless, so they can be created and deleted on demand and concurrency can be scaled linearly. The shared storage tier is built on object storage, which gives all compute nodes massive capacity, high concurrency, and durability.
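The source describes these tiers only at a high level. As a rough illustration of why stateless compute makes on‑demand scaling cheap, the Python sketch below models the metadata tier and shared storage as in‑memory dictionaries. `MetadataService`, `ObjectStore`, and `ComputeNode` are hypothetical names, not Longfu MPP DB APIs; a real deployment would back them with etcd/FDB and an object‑storage client.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataService:
    """Hypothetical metadata tier: maps table names to object keys."""
    tables: dict = field(default_factory=dict)

    def add_file(self, table: str, object_key: str) -> None:
        self.tables.setdefault(table, []).append(object_key)

    def files_for(self, table: str) -> list:
        return list(self.tables.get(table, []))


@dataclass
class ObjectStore:
    """Hypothetical shared storage tier: object key -> rows."""
    objects: dict = field(default_factory=dict)

    def put(self, key: str, rows: list) -> None:
        self.objects[key] = rows

    def get(self, key: str) -> list:
        return self.objects[key]


class ComputeNode:
    """Stateless worker: holds references to the shared tiers, never table data."""

    def __init__(self, meta: MetadataService, store: ObjectStore) -> None:
        self.meta = meta
        self.store = store

    def sum_table(self, table: str) -> int:
        # Resolve the table's files via metadata, then read them from shared storage.
        return sum(sum(self.store.get(k)) for k in self.meta.files_for(table))


meta, store = MetadataService(), ObjectStore()
store.put("t1/part-0", [1, 2, 3])
store.put("t1/part-1", [4, 5])
meta.add_file("t1", "t1/part-0")
meta.add_file("t1", "t1/part-1")

# Compute nodes can be created or discarded freely; every node sees the same data.
nodes = [ComputeNode(meta, store) for _ in range(3)]
print([n.sum_table("t1") for n in nodes])  # [15, 15, 15]
```

Because a `ComputeNode` keeps no data of its own, adding or removing nodes requires no data redistribution, which is what lets the compute tier scale independently of storage.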

Performance Comparison

In a real‑world scenario (source‑integration application), Longfu MPP DB handled five times the data volume of the legacy MPP system (1,000 TB vs. 200 TB) while maintaining higher job completion rates across all time intervals, demonstrating superior performance under data growth.

Operational Solution

Because compute clusters are stateless, they can be provisioned or decommissioned rapidly through the IaaS layer, which enables dynamic scaling, upgrades, and fault isolation. The system also supports automatic fault self‑healing, which greatly reduces mean time to repair (MTTR).
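A common way to implement this kind of self‑healing is a reconciliation loop that compares the desired cluster size with the set of healthy nodes and provisions or removes nodes to close the gap. The sketch below assumes that pattern; the source does not say how Longfu MPP DB implements it. `ComputeCluster`, `reconcile`, and `_provision` are hypothetical, and `_provision` stands in for a real IaaS "create instance" call.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = True


class ComputeCluster:
    """Hypothetical controller that keeps a stateless cluster at its desired size."""

    def __init__(self, desired: int) -> None:
        self.desired = desired
        self._ids = itertools.count()
        self.nodes = []

    def _provision(self) -> Node:
        # Placeholder for an IaaS API call that creates a new compute instance.
        return Node(name=f"cn-{next(self._ids)}")

    def reconcile(self) -> list:
        """Drop unhealthy nodes, then grow or shrink to the desired size.

        Returns the names of newly provisioned nodes.
        """
        # Nodes hold no state, so a failed node is simply discarded, not repaired.
        self.nodes = [n for n in self.nodes if n.healthy]
        created = []
        while len(self.nodes) < self.desired:
            node = self._provision()
            self.nodes.append(node)
            created.append(node.name)
        # Scale-in: surplus nodes are removed without any data migration.
        del self.nodes[self.desired:]
        return created


cluster = ComputeCluster(desired=3)
print(cluster.reconcile())        # ['cn-0', 'cn-1', 'cn-2']
cluster.nodes[1].healthy = False  # simulate a node failure
print(cluster.reconcile())        # ['cn-3']
cluster.desired = 5               # scale out
print(cluster.reconcile())        # ['cn-4', 'cn-5']
```

Running `reconcile` periodically (or on health-check events) turns both failure recovery and elastic scaling into the same operation, which is only this simple because no data lives on the compute nodes.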

As the cluster grew 50‑fold in server count and 45‑fold in data volume (to more than 9 PB, over 1 million daily jobs, and tens of millions of SQL statements), several challenges emerged:

Keeping the metadata service stable under billions of RPC requests per day.

Efficiently serving massive object‑storage I/O.

Maintaining operational efficiency for ultra‑large clusters.

Meeting bank‑grade high‑availability requirements.

To address these challenges, CCB implemented:

Metadata service sharding and distributed redesign, boosting capacity from billions to hundreds of billions of RPCs per day.

File merging, prefetching, and a unified cache to reduce pressure on object storage.

Bucket‑based tablespaces for object storage, isolating I/O and preventing hotspotting.

Real‑time monitoring and analytics of jobs, SQL, storage, and server metrics to detect performance anomalies and guide dynamic resource scheduling.

Cross‑AZ deployment, continuous metadata backup, and active‑active configurations for enhanced high‑availability.
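The source gives no implementation details for these measures. The Python sketch below is a hypothetical illustration of two of them: routing metadata keys to independent shards with a stable hash, and a read‑through LRU cache with prefetch in front of object storage. All class and function names are invented, and SHA‑256 modulo‑N routing is an assumed scheme, not CCB's documented design.

```python
import hashlib
from collections import OrderedDict
from typing import Callable, Optional


def shard_for(key: str, num_shards: int) -> int:
    """Map a metadata key to a shard index with a stable hash.

    hashlib is used instead of the built-in hash(), which is salted per
    process and would route the same key differently on different servers.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedMetadata:
    """Hypothetical metadata service split into independent shards."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key: str, value: str) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> Optional[str]:
        return self.shards[shard_for(key, len(self.shards))].get(key)


class ReadThroughCache:
    """Hypothetical LRU cache in front of object storage."""

    def __init__(self, fetch: Callable[[str], bytes], capacity: int) -> None:
        self.fetch = fetch          # function that reads one object from storage
        self.capacity = capacity
        self.entries = OrderedDict()
        self.misses = 0             # number of reads that went to storage

    def get(self, key: str) -> bytes:
        if key in self.entries:
            self.entries.move_to_end(key)     # mark as most recently used
            return self.entries[key]
        self.misses += 1
        data = self.fetch(key)
        self.entries[key] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return data

    def prefetch(self, keys: list) -> None:
        # Warm the cache before a scan needs these objects.
        for key in keys:
            self.get(key)


meta = ShardedMetadata(num_shards=4)
for i in range(100):
    meta.put(f"table/{i}", f"obj://tables/{i}")
print(meta.get("table/42"))           # obj://tables/42
print([len(s) for s in meta.shards])  # per-shard key counts

backing = {"f1": b"a", "f2": b"b", "f3": b"c"}
cache = ReadThroughCache(backing.__getitem__, capacity=2)
cache.prefetch(["f1", "f2"])
cache.get("f1")
cache.get("f2")
print(cache.misses)                   # 2: the later reads were served from cache
```

Plain modulo‑N routing remaps most keys whenever the shard count changes; consistent hashing or range partitioning are common alternatives when shards must be added while the service stays online.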

Scale and Impact

Since its 2020 launch, Longfu MPP DB has grown to 16,000 servers, processing over 9 PB of data, supporting dozens of critical banking applications, and achieving linear performance scaling despite a five‑fold increase in data volume.

Key outcomes include faster provisioning, reduced operational costs, higher concurrency, and robust fault tolerance suitable for enterprise‑level data warehousing.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
