What Drives Financial Industry Distributed Database Choices? Insights from ICBC’s Live Q&A
In a June 10 live broadcast, ICBC software development manager Wei Yadong answered 31 audience questions covering the motivations, architecture, storage, high‑availability, migration challenges, and operational practices behind the bank’s distributed database selection and design decisions.
Middleware Selection
The team evaluated several distributed access layers (NetEase DDB, Taobao TDDL, MySQL Proxy, MySQL Router) and selected DBLE because it was the only component that could be made operational with reasonable effort. DBLE offered significant optimizations over MyCat but required extensive configuration. To simplify deployment, the team performed deep custom development and built a unified configuration center similar to a Diamond configuration service.
Infrastructure and Storage
The underlying platform is an IaaS environment that provides SSD‑based persistent storage. Distribution is realized in two dimensions:
Lightweight distributed databases that can elastically scale.
Data sharding combined with a distributed data‑access layer.
Sharding Model
Data is partitioned into 128 logical shards. Each shard is typically deployed as a group of one primary and four replicas (three semi‑synchronous, one asynchronous remote replica). Four shards are co‑located for operational convenience, but the deployment can be adjusted based on load and latency requirements.
Sharding keys are defined per application according to business rules, e.g.:
Credit‑card BIN for card‑related tables.
Last digits of an ID card for personal data.
Region code for geographic partitioning.
All tables within a shard share the same logical name; there is no separate “original” table after sharding.
Distributed Transaction Handling
DBLE does not provide native distributed‑transaction support. The bank implements its own distributed‑transaction framework with three main patterns:
XA (2PC/3PC) – used for low‑concurrency accounting scenarios that require strong consistency.
TCC – originally considered but deemed heavyweight; still used in limited cases.
SAGA – preferred for high‑concurrency business flows, providing eventual consistency with compensation logic.
For transactions that must be strongly consistent, the application can split the operation into a primary transaction (committed on the primary MySQL instance) and a secondary transaction processed via a distributed message queue.
MySQL Containerization on Kubernetes
Key challenges when running MySQL in containers are IP address drift and I/O bottlenecks. The bank mitigates IP drift by extending Kubernetes with custom controllers that assign fixed IPs; the industry‑standard approach is to use a Kubernetes Operator. I/O performance is ensured by deploying SSDs at the IaaS layer.
Persistence is achieved by mounting SSD‑backed volumes provided by the IaaS. Network and storage are provisioned via FC‑SAN and SSD arrays, delivering sub‑10 ms latency for most transactional workloads.
High‑Availability Architecture
The MySQL HA topology consists of a single primary instance with four backups:
Three semi‑synchronous replicas for fast failover within the data center.
One asynchronous replica located in a remote site for disaster recovery.
Early disaster‑recovery relied on disk‑level replication, which incurred RTO > 30 minutes and high cost. The current semi‑synchronous setup, combined with transaction‑size limits (≈10 k rows per commit) and strict monitoring of master‑slave consistency, enables near‑zero RPO.
Applications connect to the MySQL group via a unique domain name. DNS resolves to the current primary’s virtual IP (VIP), allowing seamless primary promotion without client reconfiguration.
Cross‑Node Queries and Data Modeling
Cross‑node joins are discouraged; queries should target a specific shard. When analytical queries span multiple shards, the bank recommends using external processing engines such as Spark or Flink.
Complex joins involving more than three tables are a sign of sub‑optimal schema design. The recommended approach is to verify 3NF/BCNF compliance and, if necessary, denormalize data to trade storage for query performance.
Batch vs. Online Workloads
Batch jobs and online transactions are strictly isolated by policy to prevent batch‑induced connection exhaustion from impacting real‑time services.
Multi‑Center Active‑Active Deployment
Active‑active deployments do not expose WAN latency to transaction paths because each application connects to a local MySQL group. The DNS‑based VIP mechanism ensures that only the local primary is used, avoiding cross‑region round‑trips.
Migration from Oracle to MySQL
Key migration challenges include:
Mandatory sharding and data‑partitioning redesign.
Conversion of Oracle stored procedures to Java services.
Handling of Oracle SEQUENCE objects – the bank implements its own sequence service or adopts Snowflake‑style ID generators.
SQL tuning to accommodate MySQL’s execution engine.
At the time of migration (late 2016), MySQL 5.7 was the most stable release; MySQL 8.0 was still immature.
Redis as Distributed Cache
The bank replaced SSDB with Redis for distributed caching, integrating it with the transaction compensation logic to ensure data consistency during failover.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
