Practice of Distributed Database Architecture in WeBank's Core Systems

WeBank avoided traditional IOE systems from the outset and built a distributed platform on Tencent's TDSQL database and a unitized DCN architecture. Each Data Center Node (DCN) serves a bounded user base, which enables linear horizontal scaling, fault isolation, rapid provisioning, and simple single-instance databases while handling billions of daily transactions.

Tencent Cloud Developer

Against the backdrop of IT localization in the financial industry, WeBank rejected the traditional IOE architecture (IBM servers, Oracle databases, EMC storage) from its inception and instead built a distributed foundational platform on TDSQL, Tencent's financial-grade distributed database, combined with a DCN unitized architecture. This platform now supports hundreds of millions of users, hundreds of core banking systems, and billions of financial transactions per day.

The speaker outlines three major trends shaping database development in finance: (1) localization, driven by national policy and the rise of domestic database products; (2) distributed architectures, adopted to cope with the explosive data growth brought by mobile and online banking; (3) open-source adoption, with traditional banks increasingly running MySQL, Redis, and similar software for non-core workloads.

WeBank's solution centers on the DCN (Data Center Node) unitized architecture. A DCN is the smallest independently deployable unit, analogous to a provincial branch in a traditional bank, and is sized to serve a bounded user base (e.g., 5–8 million users). When a DCN reaches capacity, new users are placed in a newly added DCN, so capacity grows linearly with the number of DCNs.
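To make the scaling rule concrete, here is a minimal Python sketch of DCN capacity planning. It assumes a fixed cap of 5 million users per DCN (the low end of the range above) and a naive "fill the first DCN with room" placement policy; `Dcn`, `DcnPool`, `assign_new_user`, and `dcns_needed` are hypothetical names, not WeBank APIs.

```python
from dataclasses import dataclass, field

# Assumed per-DCN cap (hypothetical); the talk cites roughly 5-8 million users per DCN.
DCN_CAPACITY = 5_000_000


@dataclass
class Dcn:
    dcn_id: str
    capacity: int = DCN_CAPACITY
    users: int = 0


@dataclass
class DcnPool:
    per_dcn: int = DCN_CAPACITY
    dcns: list = field(default_factory=list)

    def assign_new_user(self) -> str:
        """Home a new user in the first DCN with room; add a DCN when all are full."""
        for dcn in self.dcns:
            if dcn.users < dcn.capacity:
                dcn.users += 1
                return dcn.dcn_id
        new_dcn = Dcn(f"DCN{len(self.dcns) + 1:03d}", capacity=self.per_dcn, users=1)
        self.dcns.append(new_dcn)
        return new_dcn.dcn_id


def dcns_needed(total_users: int, per_dcn: int = DCN_CAPACITY) -> int:
    """Linear scaling: the DCN count grows in proportion to the user base (integer ceiling)."""
    return (total_users + per_dcn - 1) // per_dcn
```

Under the assumed cap, `dcns_needed(100_000_000)` returns 20. A production placement policy would also weigh load, region, and reserved headroom rather than simply filling DCNs in order.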

Two key components support the DCN model: GNS (Global Name Service) stores and queries each user’s DCN routing information, directing requests to the correct DCN; RMB (Reliable Message Bus) provides message exchange between isolated DCNs, essential for cross‑DCN operations such as inter‑user transfers.
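The interplay between GNS and RMB can be sketched as follows. This is an in-memory illustration under stated assumptions: `GlobalNameService`, `ReliableMessageBus`, `Message`, and `route_transfer` are hypothetical stand-ins, and the real RMB is a persistent, reliable message bus rather than a Python deque.

```python
from collections import deque
from dataclasses import dataclass


class GlobalNameService:
    """Hypothetical GNS stand-in: maps each user ID to the DCN that owns the user's data."""

    def __init__(self):
        self._routes = {}

    def register(self, user_id: str, dcn_id: str) -> None:
        self._routes[user_id] = dcn_id

    def lookup(self, user_id: str) -> str:
        return self._routes[user_id]  # raises KeyError for unknown users


@dataclass
class Message:
    target_dcn: str
    payload: dict


class ReliableMessageBus:
    """Hypothetical RMB stand-in: one in-memory FIFO queue per destination DCN."""

    def __init__(self):
        self._queues = {}

    def publish(self, msg: Message) -> None:
        self._queues.setdefault(msg.target_dcn, deque()).append(msg)

    def consume(self, dcn_id: str):
        queue = self._queues.get(dcn_id, deque())
        while queue:
            yield queue.popleft()


def route_transfer(gns, rmb, payer, payee, amount):
    """Stay local if both users share a DCN; otherwise hand the credit off via the bus."""
    src, dst = gns.lookup(payer), gns.lookup(payee)
    if src == dst:
        return f"local transfer in {src}"
    rmb.publish(Message(dst, {"credit": payee, "amount": amount, "from_dcn": src}))
    return f"debit in {src}, credit message sent to {dst}"
```

The point of the design is that a DCN never reaches into another DCN's database: it only learns where a user lives from GNS and communicates through messages.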

The DCN approach yields several advantages: fault isolation limits the impact of hardware failures; rapid scaling allows a new DCN to be provisioned within an hour; gray‑scale releases can be tested on a dedicated DCN before full rollout; and because each DCN’s workload is bounded, the internal database can use a simple single‑instance TDSQL setup, avoiding complex distributed transactions.
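Two of these advantages reduce to simple rules, sketched below under assumed names: gray releases route users by their home DCN, and fault isolation bounds the affected share of users to those homed in the failed DCNs. `CANARY_DCNS`, `release_version_for`, and `blast_radius` are hypothetical.

```python
# Hypothetical: one DCN reserved for gray (canary) releases.
CANARY_DCNS = frozenset({"DCN001"})


def release_version_for(dcn_id: str, stable: str, candidate: str,
                        canary: frozenset = CANARY_DCNS) -> str:
    """Users homed in a canary DCN get the candidate build; everyone else stays on stable."""
    return candidate if dcn_id in canary else stable


def blast_radius(users_per_dcn: dict, failed_dcns: set) -> float:
    """Fraction of all users affected when only the failed DCNs are down."""
    total = sum(users_per_dcn.values())
    affected = sum(n for dcn, n in users_per_dcn.items() if dcn in failed_dcns)
    return affected / total
```

With ten equally sized DCNs, for instance, losing one affects 10% of users, and a faulty build released only to the canary DCN reaches no one outside it.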

The drawbacks are lower overall resource utilization, because each DCN reserves spare capacity; heavy dependence on automated operations, since more than 100 DCNs must be managed; and the need for an application-level distributed transaction framework to keep data consistent across DCNs.
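One common way to keep cross-DCN operations consistent without a distributed database transaction is a saga: each step commits locally, and a compensating action undoes earlier steps if a later one fails. The sketch below applies that pattern to an inter-user transfer. It is a generic illustration, not WeBank's framework; `DcnLedger` and `cross_dcn_transfer` are hypothetical, and the credit step is called synchronously here, whereas in the architecture described above it would travel over RMB.

```python
class InsufficientFunds(Exception):
    pass


class DcnLedger:
    """Hypothetical per-DCN account store; each DCN commits only its own local changes."""

    def __init__(self, balances):
        self.balances = dict(balances)

    def debit(self, account, amount):
        if self.balances.get(account, 0) < amount:
            raise InsufficientFunds(account)
        self.balances[account] -= amount

    def credit(self, account, amount):
        if account not in self.balances:
            raise KeyError(account)
        self.balances[account] += amount


def cross_dcn_transfer(src, dst, payer, payee, amount):
    """Saga-style transfer: local debit, remote credit, compensating refund on failure."""
    src.debit(payer, amount)          # step 1: local transaction in the payer's DCN
    try:
        dst.credit(payee, amount)     # step 2: local transaction in the payee's DCN
    except Exception:                 # any failure in step 2 triggers compensation
        src.credit(payer, amount)     # compensating action: refund step 1
        return False
    return True
```

The trade-off is that the system is briefly inconsistent between steps (the money has left the payer but not reached the payee), which the application must tolerate and reconcile.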

For data storage, WeBank deploys TDSQL across two regions and seven data centers (IDCs): five production centers in Shenzhen and two disaster-recovery centers in Shanghai. The Shenzhen centers are kept within 50 km of one another so that intra-city latency stays around 2 ms. Within a region, each TDSQL SET consists of one master and two slaves using strong synchronous replication; cross-city disaster recovery uses asynchronous replication.

The TDSQL proxy handles SQL parsing, read/write splitting, and traffic control, while ZooKeeper stores metadata and the scheduler orchestrates failover. WeBank runs TDSQL in NO-Shard mode to avoid sharding complexity: because each DCN's size is bounded, horizontal database partitioning is unnecessary.
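The strong-synchronization guarantee within a SET can be modeled in a few lines. This sketch assumes that a commit is acknowledged to the client once at least one of the two slaves has received the log entry, and that failover promotes the live slave holding the most entries; `Replica`, `StrongSyncSet`, and `min_acks` are hypothetical names, and the sketch omits the proxy, ZooKeeper, and the asynchronous Shanghai replicas.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    idc: str  # data center hosting this replica
    log: list = field(default_factory=list)
    alive: bool = True

    def receive(self, entry) -> bool:
        """Append a replicated log entry; a dead replica cannot acknowledge."""
        if not self.alive:
            return False
        self.log.append(entry)
        return True


@dataclass
class StrongSyncSet:
    """One master plus two slaves, each assumed to sit in a different IDC."""
    master: Replica
    slaves: list
    min_acks: int = 1  # assumption: at least one slave must confirm each commit

    def commit(self, entry) -> bool:
        # The master writes locally, then ships the entry to every slave.
        self.master.log.append(entry)
        acks = sum(1 for s in self.slaves if s.receive(entry))
        # Success is reported to the client only once enough slaves hold the entry;
        # an unacknowledged entry may be lost on failover, which the client never saw succeed.
        return acks >= self.min_acks

    def fail_over(self) -> Replica:
        """Promote the live slave with the longest log (the most entries received)."""
        self.master.alive = False
        candidates = [s for s in self.slaves if s.alive]
        if not candidates:
            raise RuntimeError("no live slave to promote")
        new_master = max(candidates, key=lambda s: len(s.log))
        self.slaves = [s for s in self.slaves if s is not new_master]
        self.master = new_master
        return new_master
```

Because every acknowledged commit waits for a slave in another IDC, the intra-city round trip is paid on each write, which is why the distance between the Shenzhen centers is capped.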

Operational tooling includes the Red Rabbit platform, which provides comprehensive monitoring, alerting, and automated maintenance (failover, migration, scaling), and Cloud DBA, which provides intelligent fault detection, SQL performance analysis, index recommendations, and automated health reports. In terms of scale, WeBank runs more than 400 SETs and 2,000+ instances holding petabytes of data, handles roughly 600 billion transactions, and sustains peak throughput above 100k TPS.
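Cloud DBA's internals are not described here, but a standard building block of SQL performance analysis is query fingerprinting: replacing literals so that statements differing only in their values are grouped and ranked together. The sketch below shows that generic technique; `fingerprint` and `slowest_patterns` are hypothetical helpers, and the regular expressions cover only simple numeric and quoted-string literals.

```python
import re
from collections import defaultdict

_STRING = re.compile(r"'(?:[^'\\]|\\.)*'")      # single-quoted string literals
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")       # standalone integers and decimals
_IN_LIST = re.compile(r"\bIN\s*\(\s*\?(?:\s*,\s*\?)*\s*\)", re.IGNORECASE)
_SPACES = re.compile(r"\s+")


def fingerprint(sql: str) -> str:
    """Replace literals with '?' so queries that differ only in values group together."""
    s = _STRING.sub("?", sql)
    s = _NUMBER.sub("?", s)
    s = _IN_LIST.sub("IN (?)", s)                # collapse IN (?, ?, ?) to IN (?)
    return _SPACES.sub(" ", s).strip().lower()


def slowest_patterns(slow_log, top_n=3):
    """Aggregate (sql, duration_ms) pairs by fingerprint, ranked by total time spent."""
    totals = defaultdict(lambda: [0, 0.0])       # fingerprint -> [count, total_ms]
    for sql, ms in slow_log:
        entry = totals[fingerprint(sql)]
        entry[0] += 1
        entry[1] += ms
    ranked = sorted(totals.items(), key=lambda kv: kv[1][1], reverse=True)
    return [(fp, count, total) for fp, (count, total) in ranked[:top_n]]
```

Ranking by total time rather than by the single slowest execution surfaces cheap queries that run very often, which are usually the best candidates for index recommendations.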

Future evolution focuses on three directions: (1) hardware localization, migrating from x86 to domestic ARM servers (e.g., Huawei Kunpeng); (2) cloud-native adoption, moving TDSQL onto Kubernetes and Docker containers to improve resource utilization and delivery efficiency; (3) intelligent O&M, using deep-learning-based failure prediction and ELK-style log analysis to identify and mitigate risks before they affect service.
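The talk mentions deep-learning-based failure prediction. A far simpler statistical baseline that serves a similar purpose, flagging metrics that drift away from recent behavior, is a rolling z-score. The sketch below uses that substitute technique, not a deep-learning model; `zscore_alerts`, the window size, and the threshold are illustrative assumptions.

```python
from statistics import mean, pstdev


def zscore_alerts(series, window=5, threshold=3.0):
    """Flag indices whose value deviates from the trailing window by more than threshold sigmas."""
    alerts = []
    for i in range(window, len(series)):
        past = series[i - window:i]
        mu, sigma = mean(past), pstdev(past)
        if sigma == 0:
            # A perfectly flat history: any change at all is treated as anomalous.
            if series[i] != mu:
                alerts.append(i)
            continue
        if abs(series[i] - mu) / sigma > threshold:
            alerts.append(i)
    return alerts
```

Applied to a metric such as replication lag or slow-query count per minute, a spike well outside the recent range is flagged, while ordinary jitter is not; learned models aim to catch subtler precursors that a fixed threshold misses.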
