
Bank Core System Transformation and GaiaDB-X Distributed Database Solutions for Financial Scenarios

To cope with surging transaction volumes, faster innovation cycles, and strict regulatory demands, large banks are replacing mainframe core systems with distributed, horizontally scalable architectures. Baidu's GaiaDB-X database, which offers strong ACID consistency, zero-RPO disaster recovery, and automated operations, has powered core banking migrations at institutions such as CITIC Baixin Bank and large state-owned banks.

Baidu Geek Talk

Throughout the history of IT development at banks, especially large and medium-sized ones, core systems were mostly built on mainframes and minicomputers. As banking business has grown rapidly, these systems increasingly struggle to support it, in four main ways:

Difficulty supporting rapid business growth: With the rapid growth of domestic e-commerce, internet payments, and mobile payments, bank transaction volumes have grown exponentially. For example, one bank client currently peaks at over 10,000 transactions per second and expects to reach 60,000 per second within the next few years.

Difficulty matching the required iteration speed: The traditional fat-core architecture built on mainframes has iteration cycles of several months or even half a year. To innovate, banks need to launch new business quickly and want weekly iteration cycles like internet companies.

System risks: Banks urgently need independent control over their software and hardware (self-controllability).

Closed ecosystem: Mainframe technology evolves slowly, and skilled engineers are hard to recruit.

Guided by national policy, major banks are migrating their core systems from mainframes to general-purpose server architectures and building a new, self-controllable technology stack. This effort is referred to as core system migration.

A state-owned bank serves 500-700 million customers with 1-2 billion accounts and 20,000-40,000 branches nationwide, and its peak business volume is 50,000-80,000 TPS. At the database layer, the largest tables hold hundreds of billions of records, and database TPS reaches the millions. The unified query service must cover nearly ten years of transaction details, which amounts to trillion-scale query records.

Architecture Overview:

At the IaaS layer, banks use x86 and ARM general-purpose servers and hybrid cloud technology, with extensive use of virtual machines and container services. At the PaaS layer, they use large-scale distributed systems, including open-source microservice frameworks (such as Spring Cloud), open-source or commercial databases (distributed, single-machine, relational, and cache databases), log stores such as Elasticsearch (ES), and time-series databases. For middleware, they use many open-source or in-house modified components such as message queues, object storage, and distributed locks.

At the SaaS layer, banks achieve distributed scale-out mainly through unitization combined with a microservice architecture. Business applications are divided into three types of units:

Top-level global units: Mainly handle global routing and traffic distribution.

Business units: Core business logic lives here. Like internet companies, banks split their business into units (unitization) to scale horizontally and serve hundreds of millions of customers. For example, one bank splits its business into 16 business units, each serving 50 million customers and deployed across two machine rooms in a city-level dual-active architecture.

Bottom-level public units: Applications that are difficult or unnecessary to unitize are placed in public units to provide public services.
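As an illustration of how a top-level global unit might route customers to business units, here is a minimal Python sketch. It assumes the 16-unit example above and routes by a stable hash of the customer ID; the names `GlobalRouter`, `unit_for_customer`, and the room labels are hypothetical, and the article does not say which routing scheme the bank actually uses.

```python
import hashlib
from dataclasses import dataclass

NUM_UNITS = 16  # from the article's example: 16 business units


@dataclass(frozen=True)
class Unit:
    unit_id: int
    machine_rooms: tuple[str, str]  # hypothetical labels for the dual-active rooms


def unit_for_customer(customer_id: str, num_units: int = NUM_UNITS) -> int:
    """Map a customer ID to a business unit using a stable hash.

    hashlib is used instead of the built-in hash(), whose result for str
    changes between processes unless PYTHONHASHSEED is fixed.
    """
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_units


class GlobalRouter:
    """Hypothetical top-level global unit that forwards requests to business units."""

    def __init__(self, num_units: int = NUM_UNITS) -> None:
        self.units = [Unit(i, ("room-A", "room-B")) for i in range(num_units)]

    def route(self, customer_id: str) -> Unit:
        return self.units[unit_for_customer(customer_id, len(self.units))]


router = GlobalRouter()
unit = router.route("C-000123")
print(unit.unit_id, unit.machine_rooms)  # same unit every time for this customer
```

Plain hash-modulo routing makes it hard to change the number of units later, so a real global unit may prefer an explicit routing table (for example, customer-number ranges mapped to units); the hash only keeps the sketch short.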

Database Architecture:

Two main database architectures are used in new core system migrations at banks:

1. Single-machine database architecture: The database itself is simple and has a small failure domain, but the business systems become more complex. Some modules, such as global routing, cannot be unitized, so a single database instance cannot meet their performance and capacity requirements, and data must be split at the business layer.

2. Distributed database architecture: More complex internally, but it delivers better performance. For the business layer, each unit can use one distributed database cluster, which keeps business logic simpler.

Database Requirements for Core System Migration:

Distributed scalability: A single general-purpose server is far less powerful than a mainframe, so the database must scale out across many servers.

Strong consistency: Financial scenarios have extremely high requirements for data correctness and consistency, so ACID transaction properties must be strictly guaranteed.

Disaster recovery capabilities: General-purpose servers fail more often than mainframes. Regulators require Level 5 or higher disaster recovery, with city-level dual-active deployment and RPO=0.

Operations capabilities: After migrating to general-purpose servers and de-IOE (removing IBM servers, Oracle databases, and EMC storage), the number of database nodes grows roughly 50-fold. Operations efficiency must improve qualitatively through intelligent automation.

GaiaDB-X Architecture:

GaiaDB-X is Baidu Intelligent Cloud's shared-nothing distributed database. It runs on general-purpose servers and scales horizontally to meet high-performance, large-capacity requirements. It has a three-layer architecture:

Compute layer: Stateless, horizontally scalable, and compatible with the MySQL protocol. On receiving a SQL request, it performs SQL parsing, permission checks, and logical and physical optimization, then generates DistSQL that is dispatched to the storage-layer shards.

Storage layer: Scales out by adding shards. Data is distributed across shards according to partitioning rules (Hash, Range, or List), and each shard keeps multiple replicas for reliability.

GMS nodes: The global metadata management module, which manages global data such as table schemas, permissions, routing information, and the global logical sequence numbers used by distributed transactions. GMS is also replicated, using the Raft protocol to synchronize data between replicas.
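To make the three partitioning rules concrete, the sketch below shows how a compute node could map a shard-key value to a shard under Hash, Range, and List partitioning, and fall back to querying every shard when a query carries no shard-key predicate. It is a hypothetical simplification, not GaiaDB-X's router: the function names, the CRC32 hash, and the half-open range convention are all assumptions.

```python
import bisect
import zlib
from typing import Optional, Sequence


def hash_shard(key: str, num_shards: int) -> int:
    """Hash partitioning: CRC32 is stable across processes, unlike built-in hash()."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def range_shard(key: int, upper_bounds: Sequence[int]) -> int:
    """Range partitioning: shard i holds keys in [upper_bounds[i-1], upper_bounds[i]).

    upper_bounds must be sorted ascending; shard 0 holds everything below upper_bounds[0].
    """
    idx = bisect.bisect_right(upper_bounds, key)
    if idx >= len(upper_bounds):
        raise ValueError(f"key {key} is not below the last range bound")
    return idx


def list_shard(value: str, mapping: dict[str, int]) -> int:
    """List partitioning: each enumerated value is pinned to an explicit shard."""
    try:
        return mapping[value]
    except KeyError:
        raise ValueError(f"no partition defined for value {value!r}") from None


def shards_for_query(shard_key_value: Optional[str], num_shards: int) -> list[int]:
    """Pick target shards for a hash-partitioned table.

    With an equality predicate on the shard key, one shard is enough; without it,
    the compute layer has to send the sub-plan to every shard and merge the results.
    """
    if shard_key_value is None:
        return list(range(num_shards))
    return [hash_shard(shard_key_value, num_shards)]


print(range_shard(1500, [1000, 2000, 3000]))  # 1
print(list_shard("Beijing", {"Beijing": 0, "Shanghai": 1}))  # 0
print(shards_for_query(None, 4))  # [0, 1, 2, 3]
```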

Development History:

GaiaDB-X has an 18-year history that closely tracks Baidu's business growth:

Phase 1 (2005): To handle heavy read traffic in search and community businesses, Baidu used one-master, multi-slave clusters with read-write separation.

Phase 2: To support the Fengchao advertising system and Baidu Cloud at trillion-record scale, Baidu began developing a distributed system. By 2014, it had replaced the Oracle-based storage in Fengchao, saving tens of millions annually.

Phase 3: With the rise of Baidu Wallet and other internet businesses, stricter data consistency requirements led to the implementation of strong distributed transactions.

Phase 4 (current): As Baidu Intelligent Cloud offers its technology externally, the database has been deployed for 150 customers across more than 10 industries. In finance, GaiaDB-X supports core businesses at CITIC Baixin Bank, China UnionPay, a certain exchange, and state-owned banks.

Horizontal Scaling:

Two components are designed for horizontal scaling. The first is GMS, the global metadata module: to keep it from becoming a bottleneck, TSO numbers are pre-allocated in batches to raise throughput, reaching 12 million TSO numbers per second, far above the million-level TPS requirement. The second is the global transaction table, which is distributed across the cluster so that it scales out with the cluster.
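Batch pre-allocation is what lets a single timestamp service reach this kind of throughput. The sketch below shows one common way to do it, offered as an assumption about the general technique rather than GaiaDB-X's code: the allocator durably records only a high-water mark once per batch (in GMS that write would presumably go through Raft; here it is a caller-supplied `persist` callback) and serves timestamps below the mark from memory. `BatchedTSOAllocator` and its parameters are hypothetical.

```python
import threading
from typing import Callable


class BatchedTSOAllocator:
    """Hypothetical TSO allocator with batch pre-allocation.

    Persisting every timestamp would cost one durable write each; instead only
    a high-water mark is persisted once per batch, and timestamps up to that
    mark are handed out from memory.
    """

    def __init__(self, persist: Callable[[int], None], start: int = 0,
                 batch_size: int = 10_000) -> None:
        self._persist = persist      # stand-in for a durable, replicated write
        self._lock = threading.Lock()
        self._next = start + 1       # next timestamp to hand out
        self._limit = start          # highest timestamp already reserved durably
        self._batch = batch_size

    def next_ts(self) -> int:
        with self._lock:
            if self._next > self._limit:
                # Reserve the next batch durably BEFORE handing any of it out,
                # so a restart from the persisted mark never reuses a timestamp.
                new_limit = self._limit + self._batch
                self._persist(new_limit)
                self._limit = new_limit
            ts = self._next
            self._next += 1
            return ts


persisted_marks: list[int] = []
tso = BatchedTSOAllocator(persisted_marks.append, start=0, batch_size=3)
print([tso.next_ts() for _ in range(7)])  # [1, 2, 3, 4, 5, 6, 7]
print(persisted_marks)  # [3, 6, 9]: one durable write per batch
```

Persisting the new mark before serving from the batch is what keeps timestamps monotonic across restarts: a recovered allocator starts from the last persisted mark, at the cost of skipping up to one batch of unused numbers.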

In practice, during the 2019 Spring Festival red packet event, it supported 300 million users at a peak of 120,000 TPS. For a bank with 80 million accounts, it smoothly sustained 60,000 TPS.

Consistency Solutions:

For financial scenarios with strict ACID requirements, there are three main approaches:

TrueTime: Used by Google Spanner, but it requires GPS receivers and atomic clocks, which most deployments do not have.

HLC (hybrid logical clock): Used by CockroachDB. It has no special hardware dependency, but it offers weaker consistency and requires server clock error to stay within 250 ms.

TSO (timestamp oracle): Used by TiDB. TSO is a globally unique, monotonically increasing logical sequence number. A transaction obtains a sequence number from GMS twice, once at start and once at commit, and the storage layer uses the TSO as the version number. This approach needs no special clock hardware but depends on a central clock allocator, GMS. GaiaDB-X ultimately adopted it.
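The TSO flow described above (one timestamp at start, one at commit, and the commit timestamp used as the version number) can be sketched as a single-process toy in Python. It assumes snapshot-isolation reads and a first-committer-wins write-conflict check; a real distributed database also needs locking and two-phase commit across shards, which this sketch omits. `TSO`, `MVCCStore`, and `Transaction` are hypothetical names.

```python
import itertools
from typing import Optional


class TSO:
    """Stand-in for GMS: hands out globally unique, increasing logical timestamps."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def next_ts(self) -> int:
        return next(self._counter)


class MVCCStore:
    """Toy storage layer: every key keeps (commit_ts, value) versions in commit order."""

    def __init__(self) -> None:
        self._versions: dict[str, list[tuple[int, int]]] = {}

    def read(self, key: str, snapshot_ts: int) -> Optional[int]:
        # Newest version whose commit timestamp is not after the snapshot.
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None

    def latest_commit_ts(self, key: str) -> int:
        versions = self._versions.get(key)
        return versions[-1][0] if versions else 0

    def install(self, key: str, commit_ts: int, value: int) -> None:
        self._versions.setdefault(key, []).append((commit_ts, value))


class Transaction:
    def __init__(self, tso: TSO, store: MVCCStore) -> None:
        self.tso, self.store = tso, store
        self.start_ts = tso.next_ts()  # first TSO request: defines the read snapshot
        self.writes: dict[str, int] = {}

    def get(self, key: str) -> Optional[int]:
        if key in self.writes:  # read-your-own-writes
            return self.writes[key]
        return self.store.read(key, self.start_ts)

    def put(self, key: str, value: int) -> None:
        self.writes[key] = value  # buffered until commit

    def commit(self) -> bool:
        # First-committer-wins: abort if another transaction committed a newer
        # version of any key we wrote after our snapshot was taken.
        if any(self.store.latest_commit_ts(k) > self.start_ts for k in self.writes):
            return False
        commit_ts = self.tso.next_ts()  # second TSO request: version of the writes
        for key, value in self.writes.items():
            self.store.install(key, commit_ts, value)
        return True


tso, store = TSO(), MVCCStore()
init = Transaction(tso, store)
init.put("acct", 100)
init.commit()
reader = Transaction(tso, store)  # snapshot taken before the update below
writer = Transaction(tso, store)
writer.put("acct", 80)
writer.commit()
print(reader.get("acct"))  # 100: still sees its snapshot
print(Transaction(tso, store).get("acct"))  # 80: a new snapshot sees the update
```

A transaction that started before another one committed keeps reading the older version, which is the consistent snapshot its start timestamp defines.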

Disaster Recovery:

According to regulatory requirements from the People's Bank of China (PBOC), bank core systems generally need Level 5 or higher disaster recovery, which requires a two-city, three-center deployment. A typical deployment has two machine rooms in Beijing (50-100 km apart, with 1 ms network latency) plus one remote machine room in Hefei (1,000 km away, with 10 ms latency). The database uses a 3+2 replica deployment across these machine rooms with strong synchronous replication, ensuring RPO=0 after a machine room-level failure.
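One way to reason about why a cross-room replica layout can give RPO=0 is to assume majority-quorum commit: a write commits once a majority of replicas have persisted it. Under that assumption, a committed write survives the loss of any one machine room as long as no room holds a majority of the replicas, and commit latency is set by the slowest follower in the fastest majority. The article does not specify the exact 3+2 placement or per-replica latencies, so the room names, replica counts, and round-trip times below are hypothetical.

```python
def majority(n_replicas: int) -> int:
    return n_replicas // 2 + 1


def rpo_zero_on_room_loss(placement: dict[str, int]) -> bool:
    """True if losing any single room cannot lose a majority-committed write.

    A committed write is on at least majority(n) replicas. If every room holds
    fewer than majority(n) replicas, every possible majority spans two or more
    rooms, so at least one acknowledged copy survives a single-room failure.
    """
    n = sum(placement.values())
    return max(placement.values()) < majority(n)


def commit_latency_ms(follower_rtts_ms: list[float], n_replicas: int) -> float:
    """Leader-side commit latency under majority quorum.

    The leader's own copy counts toward the majority, so it waits for the
    (majority - 1) fastest followers; the slowest of those sets the latency.
    """
    if len(follower_rtts_ms) != n_replicas - 1:
        raise ValueError("expected one round-trip time per follower")
    needed = majority(n_replicas) - 1
    if needed == 0:
        return 0.0  # a single replica commits locally
    return sorted(follower_rtts_ms)[needed - 1]


# Hypothetical 5-replica layout: leader plus one follower in Beijing room A,
# two followers in Beijing room B, one follower in Hefei.
placement = {"beijing-a": 2, "beijing-b": 2, "hefei": 1}
# Assumed RTTs from the leader: same room, other Beijing room (x2), Hefei.
rtts = [0.2, 1.0, 1.0, 10.0]

print(rpo_zero_on_room_loss(placement))  # True
print(commit_latency_ms(rtts, 5))  # 1.0
print(rpo_zero_on_room_loss({"beijing-a": 3, "hefei": 2}))  # False
```

With this hypothetical layout, the 10 ms Hefei replica is not on the commit path while both Beijing rooms are healthy. A layout that put three of the five replicas in one room would fail the check unless the replication protocol also required an acknowledgement from another room.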

Case Studies:

1. CITIC Baixin Bank (BaiXin Bank): Fully Oracle-free. All 200+ business systems are built on GaiaDB-X and have run stably for five years, with a database localization rate of 99.93%. The bank completed city-level dual-active construction in 2019 and migrated to a Kunpeng-based domestic cloud in 2022. Hardware cost per account dropped by more than 70%.

2. A PBOC-affiliated exchange: Co-developed database capabilities with Baidu and localized its databases step by step, from peripheral systems to the core trading system. The Collocate mechanism cut trading latency from 80 ms to 15 ms.

3. A state-owned bank: Migrated its core system from minicomputers to general-purpose servers and replaced Oracle with an open-source single-machine database. The number of database nodes grew about 50-fold, to roughly 1,000 servers. Using Baidu's unified database management platform, the bank fully deployed its new core system and passed PBOC acceptance.
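The article does not explain the Collocate mechanism. In distributed databases, co-location commonly means partitioning related tables by the same key with the same function, so that rows for one entity land on the same shard; a transaction touching only those rows can then run on a single shard and skip a cross-shard two-phase commit. Assuming that is roughly what is meant here, the sketch below shows the effect; the table names, partition keys, and shard count are hypothetical.

```python
import zlib

NUM_SHARDS = 8  # hypothetical shard count

# Hypothetical schema: both tables are partitioned by account_id, i.e. co-located.
PARTITION_KEY = {"account": "account_id", "txn_log": "account_id"}


def shard_of(key_value: str, num_shards: int = NUM_SHARDS) -> int:
    """Same hash function for every co-located table, so equal keys map to equal shards."""
    return zlib.crc32(key_value.encode("utf-8")) % num_shards


def shards_touched(rows: list[tuple[str, dict]]) -> set[int]:
    """Return the shards a transaction writing these (table, row) pairs must touch."""
    return {shard_of(row[PARTITION_KEY[table]]) for table, row in rows}


def needs_two_phase_commit(rows: list[tuple[str, dict]]) -> bool:
    # A single-shard transaction can commit locally; more than one shard needs 2PC.
    return len(shards_touched(rows)) > 1


# Debiting an account and appending its log entry touches one shard only.
debit = [
    ("account", {"account_id": "A-1001", "balance": 480}),
    ("txn_log", {"account_id": "A-1001", "txn_id": "T-9", "amount": -20}),
]
print(needs_two_phase_commit(debit))  # False
```

A transfer between two different accounts may still hash to two shards and need 2PC; co-location only helps transactions whose rows share the partition key.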
