How NetBank Scaled Its Database: From Two‑Site Three‑Center to Three‑Site Five‑Center Architecture

This article details NetBank's evolution of database deployment—from early distributed setups to a unitized, cloud‑native architecture—covering disaster‑recovery upgrades, distributed database design, multi‑tenant strategies, containerized migration, and the performance and operational impacts of moving to a three‑site five‑center model.

ITPUB

Evolution of Database Deployment Architecture

NetBank’s database deployment has progressed through multiple iterations, balancing capacity, availability, performance, and cost. It started with a distributed architecture for rapid business response, moved to a unit‑based design, and finally adopted a cloud‑native approach.

Disaster‑Recovery Upgrade

Initially, NetBank used a “two‑site three‑center” layout (three data centers spread across two cities), which provided only data‑center‑level protection. The architecture was later expanded to a “three‑site five‑center” model, illustrated in Figure 3‑1‑1, employing a 3‑2‑1 deployment so that the failure of any one city can be handled by electing a new primary database.

Distributed Database Fundamentals

A distributed database appears as a single logical entity but is physically spread across multiple network nodes. A centralized proxy (Figure 3‑1‑2) routes read/write requests, handles read‑write separation, and enforces permission control without the application needing to know about the underlying nodes.
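
The read‑write separation performed by the proxy can be sketched roughly as follows. This is a toy illustration, not NetBank's proxy: the `Proxy` class, the node names, and the rule "statements starting with SELECT are reads" are all assumptions made for the example.

```python
import itertools


class Proxy:
    """Toy proxy: writes go to the primary, reads rotate across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no read replicas are configured.
        self._readers = itertools.cycle(replicas or [primary])

    def route(self, sql):
        """Return the name of the node that should execute this statement."""
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb == "SELECT":
            return next(self._readers)
        # Anything that is not a plain SELECT is treated as a write.
        return self.primary


proxy = Proxy("primary-1", ["replica-1", "replica-2"])
targets = [
    proxy.route(statement)
    for statement in (
        "SELECT balance FROM account WHERE id = 1",
        "UPDATE account SET balance = 0 WHERE id = 1",
        "select 1",
    )
]
print(targets)
```

A real proxy would also need to keep locking reads and statements inside an open transaction on the primary; the sketch ignores both cases.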

Key Benefits of Distributed Databases

Continuous availability – multiple replicas keep the service running when a replica fails; consistency is ensured with Paxos or Raft protocols.

Scalability – adding nodes increases read/write capacity roughly linearly, which helps absorb traffic spikes.

Cost efficiency – commodity servers replace expensive hardware, and automatic fault‑tolerance reduces operational expenses.

Distributed Transactions and Consistency

Transactions guarantee ACID properties. Distributed transactions use a two‑phase commit (2PC) protocol with a coordinator and participants. The coordinator first sends a prepare command and collects the participants' votes; it then issues commit if every participant voted yes, or abort otherwise. The added latency stems from two log writes (prepare and commit) and two network round‑trips.
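
The two phases can be illustrated with a small in‑memory sketch. The `Participant` class and `two_phase_commit` function are hypothetical names for this example; each participant's `log` list merely stands in for a durable log, and the sketch leaves out networking, timeouts, and coordinator recovery.

```python
class Participant:
    """One resource manager taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.log = []  # stands in for a durable log
        self.state = "init"

    def prepare(self):
        """Phase 1: vote yes (and log 'prepare') or vote no."""
        if not self.can_commit:
            self.log.append("abort")
            self.state = "aborted"
            return False
        self.log.append("prepare")  # first log write
        self.state = "prepared"
        return True

    def commit(self):
        """Phase 2, success path."""
        self.log.append("commit")  # second log write
        self.state = "committed"

    def abort(self):
        """Phase 2, failure path; a participant that voted no is already aborted."""
        if self.state != "aborted":
            self.log.append("abort")
            self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator: commit only if every participant votes yes."""
    votes = [p.prepare() for p in participants]  # round-trip 1
    if all(votes):
        for p in participants:  # round-trip 2
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


ok = [Participant("accounts"), Participant("ledger")]
print(two_phase_commit(ok), [p.log for p in ok])

bad = [Participant("accounts"), Participant("ledger", can_commit=False)]
print(two_phase_commit(bad), [p.log for p in bad])
```

In the failing run, the participant that had already prepared records an abort after its prepare entry, which is why the prepare record has to be written before the vote is returned.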

Multi‑Tenant Strategy

Each tenant receives isolated resources (CPU, memory, storage, bandwidth, connections). Isolation extends to security (tenant‑specific users), fault tolerance (a failure in one tenant does not affect others), and operations (resource scaling and backup are performed per tenant).
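
As a rough illustration of per‑tenant isolation, the sketch below enforces a connection quota for each tenant separately. The `Tenant` class, the tenant names, and the quota values are assumptions for the example; a real database would enforce CPU, memory, and storage limits in a similar per‑tenant fashion.

```python
from dataclasses import dataclass


@dataclass
class Tenant:
    """Tracks one tenant's connection quota independently of other tenants."""

    name: str
    max_connections: int
    open_connections: int = 0

    def acquire(self):
        """Admit a connection only while this tenant is under its own quota."""
        if self.open_connections >= self.max_connections:
            return False
        self.open_connections += 1
        return True

    def release(self):
        """Return a connection to this tenant's pool."""
        if self.open_connections > 0:
            self.open_connections -= 1


reporting = Tenant("reporting", max_connections=1)
payments = Tenant("payments", max_connections=2)

# The reporting tenant exhausts its quota...
reporting_results = [reporting.acquire(), reporting.acquire()]
# ...but the payments tenant is unaffected.
payments_result = payments.acquire()
print(reporting_results, payments_result)
```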

From “Two‑Site Three‑Center” to “Three‑Site Five‑Center”

The upgrade adds two extra replicas, enabling city‑level disaster recovery. It increases capacity and read‑only replica count but introduces cross‑city latency (e.g., 6‑8 ms between Hangzhou and Shanghai) and requires network and hardware expansion.
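
Why the two extra replicas enable city‑level recovery can be checked with simple majority arithmetic. The sketch assumes a majority‑quorum protocol such as Paxos or Raft; the per‑city replica counts and city labels are illustrative assumptions, not NetBank's actual placement.

```python
def survives_any_city_loss(replicas_per_city):
    """True if a majority of replicas remains after losing any single city."""
    total = sum(replicas_per_city.values())
    quorum = total // 2 + 1  # majority of all replicas
    return all(total - lost >= quorum for lost in replicas_per_city.values())


# Assumed layouts: three replicas in two cities vs. five replicas in three cities.
two_site_three_center = {"city_a": 2, "city_b": 1}
three_site_five_center = {"city_a": 2, "city_b": 2, "city_c": 1}

print(survives_any_city_loss(two_site_three_center))
print(survives_any_city_loss(three_site_five_center))
```

With three replicas in two cities, losing the city that holds two of them leaves one replica, which is below the quorum of two. With five replicas spread so that no city holds more than two, any single city can fail and at least three replicas remain.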

Application Latency Analysis and Optimization

Trace middleware parses application and database logs to identify hot SQL, execution order, and call frequencies. Optimization directions include caching, deployment adjustments, SQL tuning, asynchronous processing, and batch‑job redesign to mitigate increased latency and lock contention.
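
A minimal sketch of this kind of analysis is shown below. The record format (a `sql` template plus a `cross_city` flag), the function names, and the 7 ms round‑trip figure are assumptions chosen for illustration; the trace middleware's real output format is not described in the article.

```python
from collections import Counter


def hot_sql(trace, top=3):
    """Rank SQL templates by how often they are called within a trace."""
    return Counter(record["sql"] for record in trace).most_common(top)


def added_cross_city_ms(trace, rtt_ms):
    """Extra latency if every cross-city statement runs one after another."""
    return sum(rtt_ms for record in trace if record["cross_city"])


SELECT_BALANCE = "SELECT balance FROM account WHERE id = ?"
UPDATE_BALANCE = "UPDATE account SET balance = ? WHERE id = ?"
INSERT_JOURNAL = "INSERT INTO journal VALUES (?, ?)"

trace = [
    {"sql": SELECT_BALANCE, "cross_city": True},
    {"sql": SELECT_BALANCE, "cross_city": True},
    {"sql": SELECT_BALANCE, "cross_city": False},
    {"sql": UPDATE_BALANCE, "cross_city": True},
    {"sql": UPDATE_BALANCE, "cross_city": False},
    {"sql": INSERT_JOURNAL, "cross_city": False},
]

print(hot_sql(trace, top=2))
print(added_cross_city_ms(trace, rtt_ms=7))
```

Because sequential statements each pay the cross‑city round‑trip, the statement count per transaction matters as much as the per‑hop latency, which is why caching and batching appear among the optimization directions.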

Data Access Routing Policy

Access priority follows: same‑rack → same‑city → same‑city under high load → cross‑city → cross‑city under high load. This policy reduces latency and balances database load across replicas.
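
One way to express this priority order in code is to assign each replica a tier and pick the lowest one. The replica fields, the 0.8 load threshold, and the choice to treat a busy same‑rack replica like any other busy same‑city replica are assumptions of this sketch, since the article does not define them.

```python
def tier(replica, client, load_threshold=0.8):
    """Map a replica to a priority tier; a lower tier is preferred."""
    same_city = replica["city"] == client["city"]
    same_rack = same_city and replica["rack"] == client["rack"]
    busy = replica["load"] >= load_threshold
    if same_rack and not busy:
        return 0  # same rack
    if same_city and not busy:
        return 1  # same city
    if same_city:
        return 2  # same city under high load
    if not busy:
        return 3  # cross-city
    return 4      # cross-city under high load


def pick_replica(replicas, client):
    """Choose the best tier first, then the least-loaded replica within it."""
    return min(replicas, key=lambda r: (tier(r, client), r["load"]))


client = {"city": "hangzhou", "rack": "r1"}
replicas = [
    {"name": "hz-1", "city": "hangzhou", "rack": "r1", "load": 0.9},
    {"name": "hz-2", "city": "hangzhou", "rack": "r2", "load": 0.3},
    {"name": "sh-1", "city": "shanghai", "rack": "r7", "load": 0.1},
]

print(pick_replica(replicas, client)["name"])
```

Here the same‑rack replica is overloaded, so the request goes to the lightly loaded replica elsewhere in the same city rather than across cities.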

Multi‑Cluster Deployment

NetBank evolved from a single‑node setup to multiple clusters segmented by business domain (user, product, accounting, exchange, etc.). Vertical sharding separates business lines, while horizontal sharding distributes tables across clusters, limiting the impact of a single‑cluster failure to roughly 10 % of traffic.
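
The combination of vertical and horizontal sharding can be sketched as a two‑level lookup: the business domain selects a group of clusters, and a hash of the key selects one cluster within it. The figure of ten clusters per domain, the cluster naming scheme, and the use of SHA‑256 are assumptions for illustration; with ten evenly loaded clusters, one failing cluster affects roughly a tenth of the keys.

```python
import hashlib
from collections import Counter

DOMAINS = ("user", "product", "accounting", "exchange")  # vertical split
CLUSTERS_PER_DOMAIN = 10  # assumed horizontal split


def cluster_for(domain, key):
    """Return the cluster that owns `key` within the given business domain."""
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain}")
    # A stable hash keeps the mapping identical across processes and restarts.
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    shard = int.from_bytes(digest[:8], "big") % CLUSTERS_PER_DOMAIN
    return f"{domain}-{shard:02d}"


# Share of user keys that would be affected by the busiest cluster failing.
sample_size = 100_000
counts = Counter(cluster_for("user", uid) for uid in range(sample_size))
worst_share = max(counts.values()) / sample_size
print(len(counts), round(worst_share, 3))
```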

Containerized Deployment

Containerization allows more database clusters to run on a limited number of physical machines, which helps eliminate single points of failure. The migration from pure ECS to ECS + containers proceeds in three steps:

Add container nodes and move a standby replica to a container (Figure 3‑1‑15).

Replace both container nodes and promote a container to primary, setting priority for failover (Figure 3‑1‑16).

If a container fails, roll back to ECS, verify stability, then replace the ECS node with a container (Figure 3‑1‑17).

Partition vs. Containerization

When the system is small, containers create additional clusters; partitioning is unnecessary. As the system grows, partitioning expands capacity without adding clusters, while containers provide finer‑grained management. Partitioning is a native distributed‑database feature; containerization is an external deployment technique.
