Common Disaster Recovery Models and How to Choose Them
The article outlines the main disaster‑recovery architectures—city‑level, remote, two‑site three‑center, and active‑active data centers—explains their characteristics, compares costs and performance, and presents key selection metrics such as RPO, RTO, disaster radius and ROI, illustrated with Huawei and ZTE case studies.
Disaster Recovery Modes
Common market offerings include same‑city (city‑level) DR, remote (different‑city) DR, local dual‑center DR, two‑site three‑center architectures, and active‑active (dual‑active) data centers.
City‑Level DR
Two data centers are built within the same city or a nearby area (≤200 km). One serves as the production center, the other as a backup. The short distance enables high‑quality links and synchronous replication, providing near‑zero data loss. This model mitigates fire, building damage, power outages, system failures, or human error.
Remote DR
When the distance exceeds 200 km, asynchronous mirroring is typically used, which may incur a small amount of data loss. Remote DR protects against larger‑scale risks such as war, earthquakes, and floods. Organizations often combine a city‑level backup with a remote backup for optimal protection.
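The 200 km boundary between city‑level and remote DR can be sketched in a few lines of Python. This is only an illustration: the function names and sample coordinates are hypothetical, and great‑circle distance is used as a stand‑in for the distance between two sites.

```python
import math

CITY_LEVEL_MAX_KM = 200.0  # threshold quoted in the text above
EARTH_RADIUS_KM = 6371.0   # mean Earth radius, an approximation


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def classify_dr(distance_km):
    """Return (model, typical replication mode) using the 200 km rule of thumb."""
    if distance_km <= CITY_LEVEL_MAX_KM:
        return ("city-level", "synchronous")
    return ("remote", "asynchronous")


if __name__ == "__main__":
    # Hypothetical sites: one degree and five degrees of latitude apart.
    near = great_circle_km(30.0, 120.0, 31.0, 120.0)
    far = great_circle_km(30.0, 120.0, 35.0, 120.0)
    print(round(near), classify_dr(near))
    print(round(far), classify_dr(far))
```

In practice the choice also depends on link latency and quality, not on distance alone, so the threshold should be read as a starting point rather than a hard rule.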
Local Dual‑Center DR
Implemented within the same data hall, the two centers share workloads during normal operation and can switch over with minimal data loss when one of them fails. Compared with remote DR, local dual‑center solutions have lower investment cost, faster deployment, simpler operations, and higher reliability, but they offer no protection against a disaster that affects the whole facility.
Two‑Site Three‑Center Architecture
Prompted by recent large‑scale natural disasters, this pattern adds a remote backup center to a city‑level dual‑center setup, yielding three centers across two locations: a production center, a city‑level backup, and a remote backup. It combines high availability with robust disaster‑backup capability.
Active‑Active (Dual‑Active) Data Centers
In an active‑active configuration, two or more data centers run the same applications and hold identical data, providing load balancing and continuous availability. Benefits include full resource utilization and seamless failover invisible to users. The extensive changes required in underlying systems make true active‑active deployments rare in practice, especially in China.
Metrics for Choosing a DR Solution
Designing a DR system starts with selecting an appropriate data‑replication technique based on:
Disaster tolerance level : types of incidents the system must survive.
Business impact : maximum acceptable downtime, defining the tolerance window.
Data protection degree : whether full transaction recovery and real‑time synchronization are required.
Key quantitative indicators are:
RPO (Recovery Point Objective) : maximum data loss allowed, usually expressed as the span of time before the disaster whose data may be lost. It is directly tied to the replication method.
RTO (Recovery Time Objective) : maximum acceptable time to restore services after a disaster, covering data restore, application switchover, and network reconnection. For example, a 12‑hour RTO means services must be back within half a day after a disaster.
Disaster radius : straight‑line distance between production and backup sites, reflecting the geographic scope of protection.
ROI (Return on Investment) : cost‑benefit ratio of the DR solution.
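As a worked example of two of these indicators, the sketch below adds up the recovery phases named under RTO and compares the total with a target, then computes ROI with the usual (benefit − cost) / cost formula. The class, the function names, and all figures are hypothetical. Treating "benefit" as the expected loss avoided by having DR is an assumption, not something the text specifies.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    """Estimated duration of each recovery phase, in hours (illustrative only)."""
    data_restore_h: float
    app_switch_h: float
    network_reconnect_h: float

    def estimated_rto_h(self):
        # The phases are assumed to run one after another, so the durations add up.
        return self.data_restore_h + self.app_switch_h + self.network_reconnect_h


def meets_rto(plan, rto_target_h):
    """True if the estimated recovery time fits within the RTO target."""
    return plan.estimated_rto_h() <= rto_target_h


def roi(avoided_loss, cost):
    """Return on investment as (benefit - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (avoided_loss - cost) / cost


if __name__ == "__main__":
    plan = RecoveryPlan(data_restore_h=6.0, app_switch_h=3.0, network_reconnect_h=1.0)
    print(plan.estimated_rto_h(), meets_rto(plan, 12.0))
    print(roi(avoided_loss=300.0, cost=100.0))
```

If phases can overlap (for example, network reconnection during data restore), the simple sum overestimates recovery time, which errs on the safe side.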
Ideal solutions would have zero RPO, zero RTO, and a large disaster radius, but performance, technology, and cost constraints make this unattainable. Decision‑makers must balance disaster probability, data criticality, technical options, and budget.
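One way to make this balancing act concrete is to filter candidate solutions by the hard requirements and then pick the cheapest one that remains. The sketch below does exactly that. The candidate names, their figures, and the "cheapest feasible" rule are all assumptions chosen for illustration, not vendor data.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    rpo_min: float      # worst-case data loss, in minutes
    rto_h: float        # expected recovery time, in hours
    radius_km: float    # distance between production and backup sites
    annual_cost: float  # arbitrary cost units


def choose(candidates: Iterable[Candidate], max_rpo_min: float, max_rto_h: float,
           min_radius_km: float, budget: float) -> Optional[Candidate]:
    """Return the cheapest candidate meeting every requirement, or None if none does."""
    feasible = [
        c for c in candidates
        if c.rpo_min <= max_rpo_min
        and c.rto_h <= max_rto_h
        and c.radius_km >= min_radius_km
        and c.annual_cost <= budget
    ]
    return min(feasible, key=lambda c: c.annual_cost, default=None)


if __name__ == "__main__":
    # Hypothetical options; the numbers are made up.
    options = [
        Candidate("remote async backup", rpo_min=60, rto_h=24, radius_km=800, annual_cost=100),
        Candidate("city-level sync", rpo_min=0, rto_h=2, radius_km=50, annual_cost=300),
        Candidate("two-site three-center", rpo_min=0, rto_h=2, radius_km=800, annual_cost=700),
    ]
    best = choose(options, max_rpo_min=5, max_rto_h=4, min_radius_km=30, budget=500)
    print(best.name if best else "no feasible option")
```

When no candidate survives the filter, the requirements or the budget have to be relaxed, which is the trade‑off the paragraph above describes.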
DR Levels
Data‑level DR : only production data is replicated to the backup site. Recovery may exceed 24 hours, but cost is low.
Application‑level DR : builds a full backup environment (hosts, network, applications, IP) that can take over within about 12 hours. Complexity and cost are higher.
Business‑level DR : both production and backup sites process business requests simultaneously, achieving RTO under 30 minutes. This offers the highest availability but requires extensive application redesign and incurs the highest operational cost.
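The three levels can be read as a lookup from RTO target to the least expensive level that fits. The sketch below treats the figures quoted above (over 24 hours, about 12 hours, under 30 minutes) as rough thresholds. That reading is an assumption, since real recovery times vary by environment, and the function name is hypothetical.

```python
from typing import Optional


def recommend_dr_level(target_rto_h: float) -> Optional[str]:
    """Map an RTO target (hours) to the cheapest DR level whose quoted RTO fits."""
    if target_rto_h > 24:
        return "data-level"         # recovery may exceed 24 hours, lowest cost
    if target_rto_h >= 12:
        return "application-level"  # takeover within about 12 hours
    if target_rto_h >= 0.5:
        return "business-level"     # RTO under 30 minutes
    return None                     # tighter than any figure quoted in the text


if __name__ == "__main__":
    for target in (48, 12, 1, 0.1):
        print(target, recommend_dr_level(target))
```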
Architectural Practices
For IT enterprises, a single data center no longer guarantees data safety. Local redundancy cannot protect against regional disasters, while remote DR faces challenges such as high network‑link leasing fees and limited bandwidth.
Huawei’s Two‑Site Three‑Center Solution
Built on Huawei unified storage with cascaded (multi‑hop) replication, the architecture deploys OceanStor storage at the production center, the city‑level backup, and the remote backup. Asynchronous remote replication copies data from production to the city backup and, from there, on to the remote site. In a disaster, the city backup takes over; if both production and city backup fail, the remote backup assumes the workload.
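The takeover order described here (production first, then the city backup, then the remote backup) amounts to a priority list. The sketch below shows only that selection logic. The site names and the health‑map input are hypothetical and do not correspond to any Huawei or OceanStor API.

```python
from typing import Mapping, Optional

# Takeover priority as described in the text: nearest healthy site wins.
FAILOVER_ORDER = ("production", "city_backup", "remote_backup")


def active_site(healthy: Mapping[str, bool]) -> Optional[str]:
    """Return the first healthy site in priority order, or None if all are down."""
    for site in FAILOVER_ORDER:
        if healthy.get(site, False):
            return site
    return None


if __name__ == "__main__":
    print(active_site({"production": True, "city_backup": True, "remote_backup": True}))
    print(active_site({"production": False, "city_backup": True, "remote_backup": True}))
    print(active_site({"production": False, "city_backup": False, "remote_backup": True}))
```

A site missing from the health map is treated as down, which keeps the function conservative when monitoring data is incomplete.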
ZTE’s Distributed Active‑Active Data Center
Leveraging cloud‑computing IaaS and PaaS layers, ZTE builds two data centers in the same city for active‑active service and a remote disaster‑recovery center for backup. This architecture improves investment utilization and ensures business continuity.
Architect's Guide
Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.