How to Deploy a Two‑Location Three‑Center Disaster‑Recovery Architecture for High Availability
This guide explains the two‑location three‑center disaster‑recovery pattern, describing its purpose, typical deployment across two cities and three data centers, and step‑by‑step recommendations for same‑city dual‑active or primary‑backup setups, remote backup strategies, traffic routing, and essential monitoring.
Two‑Location Three‑Center Architecture
The two-location three-center architecture is a disaster-recovery pattern that distributes three data centers across two geographic locations: typically two centers in the same city (City A) and one remote center (City B). It is widely adopted in industries with stringent availability requirements, such as finance and telecommunications.
Overall Design
Global traffic scheduling (GSLB/DNS) to direct user requests.
Load balancing (SLB) as the entry point for City A, with a separate disaster-recovery entry point in City B.
Two same‑city data centers (City A) operating in either dual‑active or primary‑backup mode.
A remote data center (City B) dedicated to disaster backup (cold, warm, or hot).
Within each data center: application clusters, middleware clusters (e.g., MQ, Redis), and primary databases.
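The traffic-scheduling role of the GSLB/DNS layer can be illustrated with a minimal Python sketch. It is not a real GSLB API: the `DataCenter` class, the role names (`active`, `standby`, `dr`), and the `route` function are hypothetical names chosen for illustration, and the preference order (same-city active centers first, then a same-city standby, then the remote center) is an assumption about how such a scheduler might be configured.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str
    city: str
    role: str          # assumed roles: "active", "standby", or "dr"
    healthy: bool = True


def route(centers):
    """Return the names of the data centers that should receive traffic.

    Assumed preference order: healthy active centers, then a healthy
    same-city standby, then the remote disaster-recovery center.
    """
    for role in ("active", "standby", "dr"):
        candidates = [dc.name for dc in centers if dc.role == role and dc.healthy]
        if candidates:
            return candidates
    return []  # nothing is healthy; the caller must raise an alert


# Same-city dual-active in City A, disaster recovery in City B.
centers = [
    DataCenter("A1", "City A", "active"),
    DataCenter("A2", "City A", "active"),
    DataCenter("B1", "City B", "dr"),
]
```

With both City A centers healthy, `route(centers)` returns both of them; if one fails, the other carries the traffic alone; only when both are down does traffic fall through to City B.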
Same‑City Layer
Deploy either a dual‑active or a primary‑backup configuration:
Dual‑active enables parallel processing and load balancing, reducing failover latency.
Primary‑backup provides a simpler, controllable failover path when consistency constraints or cost considerations limit dual‑active deployment.
The same‑city centers handle daily production and high‑availability guarantees. They require synchronous or near‑synchronous data replication, unified configuration management, and automated monitoring/alerting to detect data‑center failures and trigger automatic or semi‑automatic switchover.
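The difference between automatic and semi-automatic switchover can be sketched as a small failure detector. This is a hedged illustration, not a production monitor: the `FailoverMonitor` class, the threshold of three consecutive failed probes, and the returned state names are all assumptions made for the example.

```python
class FailoverMonitor:
    """Counts consecutive failed health probes for one data center."""

    def __init__(self, threshold=3, automatic=False):
        self.threshold = threshold    # assumed: failures needed before acting
        self.automatic = automatic    # False = an operator must approve
        self.failures = 0

    def record(self, probe_ok):
        """Record one probe result and return the resulting state."""
        if probe_ok:
            self.failures = 0         # any success resets the counter
            return "ok"
        self.failures += 1
        if self.failures < self.threshold:
            return "degraded"         # alert, but do not switch yet
        return "switchover" if self.automatic else "await_approval"
```

Requiring several consecutive failures before switching avoids reacting to a single dropped probe, and the `automatic` flag models the choice between unattended failover and an operator-confirmed one.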
Remote Layer
Build a disaster‑recovery system in the remote location using one of three standby strategies:
Cold standby – minimal resources, longer recovery time.
Warm standby – moderate resources, faster recovery.
Hot standby – full‑capacity replica, near‑zero recovery time.
Key practices include periodic and incremental data replication, cross‑region consistency checks, and remote snapshot storage. The chosen strategy should align with the business’s Recovery Time Objective (RTO) and Recovery Point Objective (RPO), as well as available network bandwidth and storage cost. The remote center must have independent network and operations channels to avoid a single‑region failure affecting recovery capabilities.
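A cross-region consistency check can be as simple as comparing per-table digests between the primary and the remote replica. The sketch below assumes each table fits in memory as a list of row tuples and that rows within a table are mutually comparable; `table_digest` and `find_mismatches` are hypothetical helper names, and a real system would more likely checksum in chunks at the database level.

```python
import hashlib


def table_digest(rows):
    """Order-independent SHA-256 digest of a table's rows."""
    h = hashlib.sha256()
    for row in sorted(rows):              # sort so row order does not matter
        h.update(repr(row).encode("utf-8"))
    return h.hexdigest()


def find_mismatches(primary, replica):
    """Return the names of tables that differ between the two sites.

    Both arguments map a table name to a list of row tuples. A table
    missing on either side counts as a mismatch.
    """
    mismatched = []
    for name in sorted(set(primary) | set(replica)):
        if name not in primary or name not in replica:
            mismatched.append(name)
        elif table_digest(primary[name]) != table_digest(replica[name]):
            mismatched.append(name)
    return mismatched
```

Running such a check on a schedule, after each replication cycle, gives an early signal that the remote copy has drifted before it is needed for recovery.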
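<ol>
<li>Global traffic scheduling (GSLB/DNS)</li>
<li>|</li>
<li>-----------------------------------------</li>
<li>|                                       |</li>
<li>City A                                  City B</li>
<li>(same-city dual-active)                 (remote disaster recovery)</li>
<li>|                                       |</li>
<li>Load balancing (SLB)                    Disaster-recovery entry point</li>
<li>|</li>
<li>-------------------------</li>
<li>|                       |</li>
<li>Data center A1          Data center A2</li>
<li>|                       |</li>
<li>Application cluster     Application cluster</li>
<li>|                       |</li>
<li>Middleware cluster (MQ/Redis)   Middleware cluster</li>
<li>|                       |</li>
<li>Primary database        Primary database</li>
</ol>
Key Practices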
Use GSLB/DNS for global traffic distribution.
Employ SLB as the entry point for City A, and provide a separate disaster-recovery entry point in City B.
Implement synchronous or near‑synchronous replication between same‑city data centers.
Select the remote standby strategy based on RTO/RPO and cost considerations.
Ensure automated monitoring and independent network paths for rapid failover.
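The RTO/RPO-driven choice of remote standby strategy listed above can be sketched as a simple decision function. The numeric thresholds here are purely illustrative assumptions, not values from this guide or from any standard; each organization would substitute its own limits and factor in bandwidth and storage cost.

```python
def choose_standby(rto_minutes, rpo_minutes):
    """Pick a remote standby strategy from recovery objectives.

    Thresholds are hypothetical examples:
    - very tight RTO or RPO   -> hot standby (full-capacity replica)
    - RTO within a few hours  -> warm standby
    - anything looser         -> cold standby
    """
    if rto_minutes <= 5 or rpo_minutes <= 1:
        return "hot"
    if rto_minutes <= 240:
        return "warm"
    return "cold"
```

For example, a payment system that must recover within minutes maps to hot standby under these assumed thresholds, while an internal reporting system that can tolerate a day of downtime maps to cold standby.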
Mike Chen's Internet Architecture
Over ten years of BAT architecture experience, shared generously!
