Mastering Backend High Availability: From Cold Backups to Multi‑Active Deployments

This article examines stateful backend services and compares various high‑availability strategies—including cold backup, dual‑machine hot standby, same‑city and cross‑city active‑active, and multi‑active architectures—highlighting their benefits, drawbacks, and practical implementation considerations.

21CTO

Backend services can be divided into two categories: stateless and stateful. Making stateless applications highly available is straightforward: run several instances behind a load balancer, and any instance can serve any request. This article focuses on stateful services.
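A minimal sketch of why the stateless case is easy, assuming hypothetical instance names and a simple health flag rather than any particular load balancer's API: because no instance holds request state, the balancer can simply rotate through whichever instances are currently healthy.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Rotate requests across stateless instances, skipping unhealthy ones.

    Instance names and the health flags are hypothetical.
    """

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._ring = cycle(self.instances)

    def mark_down(self, instance):
        self.healthy.discard(instance)

    def mark_up(self, instance):
        self.healthy.add(instance)

    def pick(self):
        # Any healthy instance can take the request because none of them holds
        # per-user state; try at most one full rotation before giving up.
        for _ in range(len(self.instances)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances")


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
lb.mark_down("app-2")
picks = [lb.pick() for _ in range(4)]
print(picks)  # ['app-1', 'app-3', 'app-1', 'app-3']
```

Stateful services break this model: if `app-2` held the only copy of some user's data, skipping it would not be enough, which is why the rest of the article deals with replication and failover.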

State is typically persisted on disk or held in memory, for example in MySQL or Redis. State can also live in JVM heap memory, but it lasts only as long as the process, so it is lost whenever the service restarts.

High‑Availability Solutions

Historically, high availability has evolved through several stages:

Cold backup

Dual‑machine hot standby

Same‑city active‑active

Cross‑city active‑active

Cross‑city multi‑active

Cold Backup

Cold backup copies the data files (for example, with the cp command) while the service is stopped, either manually or from a script. Its advantages are simplicity, fast backup and restore, and the ability to restore to the exact point in time a backup was taken. Its drawbacks are required downtime, loss of any writes made between the last backup and the restore, the overhead of copying everything every time, and no way to back up selectively.
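The procedure can be sketched as a small Python script. This is an illustration under stated assumptions: the directory layout and the `cold_backup` and `restore` helpers are hypothetical, and the caller is assumed to have stopped the service first so the files on disk are consistent. The usage lines also demonstrate the data-loss drawback: a write made after the snapshot disappears on restore.

```python
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def cold_backup(data_dir: Path, backup_root: Path) -> Path:
    """Copy the whole data directory into a new timestamped snapshot.

    Assumes the owning service is already stopped (the "cold" part).
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    target = backup_root / f"snapshot-{stamp}"
    # Full copy every time: no incremental or selective backup.
    shutil.copytree(data_dir, target)
    return target


def restore(snapshot: Path, data_dir: Path) -> None:
    """Replace the data directory with a snapshot (service must be stopped)."""
    if data_dir.exists():
        shutil.rmtree(data_dir)
    shutil.copytree(snapshot, data_dir)


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data"
    data.mkdir()
    (data / "table.db").write_text("v1")
    snap = cold_backup(data, Path(tmp) / "backups")
    (data / "table.db").write_text("v2")  # a write made after the backup...
    restore(snap, data)
    restored = (data / "table.db").read_text()
    print(restored)  # ...is lost on restore: prints v1
```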

Dual‑Machine Hot Standby

Hot standby keeps the service online while its data is continuously replicated to a second machine, but restoring service after a failure can still involve some downtime. There are two main modes:

Active/Standby: One primary node serves traffic while a secondary node replicates its data, either at the software level (for example, MySQL master/slave replication or SQL Server transactional replication) or through hardware-level mirroring.

Dual-Machine Active-Active (mutual standby): Each node is the primary for a different service and the standby for the other, which enables read-write separation and better resource utilization.
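The two modes differ only in where each service's primary lives. The toy model below uses a hypothetical class and node names, in-memory dictionaries standing in for databases, and synchronous replication for simplicity; it promotes the standby when a node fails. With every primary on node A it behaves as active/standby, while spreading primaries across A and B, as in the usage lines, gives mutual standby.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    primary: str  # node that serves traffic for the service
    standby: str  # node that receives replicated writes


class HotStandbyPair:
    """Toy model of a two-node hot-standby pair (hypothetical names)."""

    def __init__(self, layout):
        # layout: service name -> (primary node, standby node)
        self.assignments = {svc: Assignment(p, s) for svc, (p, s) in layout.items()}
        nodes = {node for pair in layout.values() for node in pair}
        self.data = {node: {} for node in nodes}
        self.down = set()

    def write(self, service, key, value):
        a = self.assignments[service]
        if a.primary in self.down:
            raise RuntimeError(f"{service}: primary {a.primary} is down")
        self.data[a.primary][(service, key)] = value
        # Synchronous copy for simplicity; with asynchronous replication the
        # last few writes could be lost on failover.
        if a.standby not in self.down:
            self.data[a.standby][(service, key)] = value

    def read(self, service, key):
        return self.data[self.assignments[service].primary].get((service, key))

    def fail(self, node):
        """Mark a node down and promote the standby for every service it led."""
        self.down.add(node)
        for a in self.assignments.values():
            if a.primary == node and a.standby not in self.down:
                a.primary, a.standby = a.standby, a.primary


# Mutual standby: A leads "orders", B leads "users"; each backs up the other.
pair = HotStandbyPair({"orders": ("A", "B"), "users": ("B", "A")})
pair.write("orders", "o1", "paid")
pair.fail("A")
print(pair.assignments["orders"].primary, pair.read("orders", "o1"))  # B paid
```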

Other HA patterns are database-specific deployments: MySQL master-slave, master-master, and MHA, or Redis master-slave, Sentinel, and Cluster.
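As one example of how these patterns automate failover, Redis Sentinel treats a master as objectively down only when a configured quorum of Sentinels agree it is unreachable, and then promotes a replica. The sketch below models just those two decisions in plain Python, with hypothetical data structures and no Redis calls; the replica ranking follows the order described in the Sentinel documentation (priority, then replication offset, then run ID). The real protocol additionally requires a majority of Sentinels to elect a leader before failing over, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    priority: int     # lower is preferred; 0 means "never promote"
    repl_offset: int  # how much of the master's replication stream it has applied
    run_id: str


def objectively_down(sdown_votes: dict, quorum: int) -> bool:
    """sdown_votes maps monitor id -> whether that monitor sees the master as down."""
    return sum(sdown_votes.values()) >= quorum


def pick_replica(replicas: list) -> Replica:
    """Prefer lower priority, then the most up-to-date replica, then the smaller run ID."""
    eligible = [r for r in replicas if r.priority != 0]
    return min(eligible, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


votes = {"s1": True, "s2": True, "s3": False}
replicas = [
    Replica("r1", 100, 5000, "bbb"),
    Replica("r2", 100, 5200, "aaa"),
    Replica("r3", 0, 9999, "ccc"),  # excluded: priority 0
]
print(objectively_down(votes, quorum=2), pick_replica(replicas).name)  # True r2
```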

Same‑City Active‑Active

This approach extends hot standby across two or more data centers (IDCs) in the same city, protecting against the failure of an entire IDC, such as a power outage. It resembles dual-machine hot standby but with greater physical separation, so it depends on fast, low-latency links between the data centers.

Traffic is load-balanced across the services in each IDC, data is synchronized over dedicated links, and on failure traffic is redirected to the surviving IDC. If both IDC1 and IDC2 in the city fail, a copy of the data is still preserved in IDC3 in a remote city, although serving users from there adds latency and may degrade their experience.
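The routing behavior can be sketched as a weighted choice over the same-city IDCs that are still healthy, falling back to the remote IDC3 only when both local IDCs are down. The weights, IDC names, and `IdcRouter` class are hypothetical; in practice this logic typically lives in DNS, a global load balancer, or the gateway layer.

```python
import random


class IdcRouter:
    """Split traffic across same-city IDCs; fall back to a remote IDC if none are up."""

    def __init__(self, local_weights, remote, rng=None):
        self.local_weights = dict(local_weights)  # IDC name -> traffic weight
        self.remote = remote
        self.down = set()
        self.rng = rng if rng is not None else random.Random()

    def route(self):
        alive = {idc: w for idc, w in self.local_weights.items() if idc not in self.down}
        if alive:
            idcs, weights = zip(*alive.items())
            return self.rng.choices(idcs, weights=weights, k=1)[0]
        if self.remote not in self.down:
            return self.remote  # higher latency, but the data survived
        raise RuntimeError("no data center available")


router = IdcRouter({"IDC1": 50, "IDC2": 50}, remote="IDC3", rng=random.Random(7))
normal = {router.route() for _ in range(100)}
router.down.add("IDC1")
after_idc1_fails = {router.route() for _ in range(100)}  # {'IDC2'}
router.down.add("IDC2")
after_city_fails = router.route()  # 'IDC3'
```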

Cross‑City Active‑Active

To handle large‑scale disasters, services are deployed in separate cities with independent front‑ends and back‑ends. Upon a city‑wide outage, traffic is redirected to the remote city, sacrificing latency for continuity.
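A minimal sketch of the redirect, assuming hypothetical city names and measured round-trip latencies: each request goes to the lowest-latency healthy city, and the usage lines compute how much latency a user gives up when their nearest city is unavailable.

```python
def pick_city(latency_ms, healthy):
    """Route to the lowest-latency city that is still healthy."""
    candidates = [city for city in latency_ms if city in healthy]
    if not candidates:
        raise RuntimeError("no healthy city")
    return min(candidates, key=lambda city: latency_ms[city])


# Hypothetical round-trip times from one user's location to each deployment.
latency = {"city-A": 8.0, "city-B": 30.0, "city-C": 45.0}
normal = pick_city(latency, {"city-A", "city-B", "city-C"})  # 'city-A'
during_outage = pick_city(latency, {"city-B", "city-C"})     # 'city-B'
extra_ms = latency[during_outage] - latency[normal]          # 22.0 ms per round trip
```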

Cross-city active-active brings higher synchronization latency, possibly lower throughput, and the risk of data conflicts when both cities write the same data. These can be mitigated with distributed locks, sharding, or eventual consistency mechanisms.
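Of the mitigations listed, eventual consistency is the easiest to show in a few lines. The sketch below uses last-writer-wins (LWW), one common and deliberately simple merge rule, chosen here for illustration rather than taken from the source. Each write carries a timestamp and the city that accepted it, and every replica keeps the larger pair, so all replicas converge regardless of the order in which updates arrive. The cost is that one of two concurrent writes is silently discarded, and the outcome depends on clocks in different cities being reasonably synchronized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp_ms: int  # wall-clock time of the write at its origin city
    origin: str        # city that accepted the write; breaks timestamp ties


def lww_merge(local: Versioned, remote: Versioned) -> Versioned:
    """Keep the later write; tie-break on origin so every replica picks the same one."""
    return max(local, remote, key=lambda v: (v.timestamp_ms, v.origin))


# Two cities update the same key at the same millisecond.
a = Versioned("address=Shanghai", 1_000, "city-A")
b = Versioned("address=Beijing", 1_000, "city-B")
print(lww_merge(a, b).value, lww_merge(b, a).value)  # address=Beijing address=Beijing
```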

Cross‑City Multi‑Active

In a multi-active topology, every node replicates with every other node; in a five-node deployment, each node connects to the other four, so the failure of any single node does not interrupt service. However, the higher write latency and risk of conflicting writes demand more sophisticated conflict-resolution strategies, such as distributed transactions or sharding.
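Sharding avoids most write-write conflicts by giving every user a single home unit that accepts their writes. The sketch below assumes a hypothetical five-unit deployment, hashes the user ID to pick the home unit, and fails over to the next unit in the list when the home unit is down. The unit names and helper functions are illustrative only.

```python
import hashlib

UNITS = ("unit-A", "unit-B", "unit-C", "unit-D", "unit-E")  # hypothetical units


def home_unit(user_id: str, units=UNITS) -> str:
    """Map a user to one home unit by hashing the user ID."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return units[int.from_bytes(digest[:8], "big") % len(units)]


def route_write(user_id: str, down: set, units=UNITS) -> str:
    """Send a user's writes to the home unit, or the next live unit if it is down."""
    start = units.index(home_unit(user_id, units))
    for i in range(len(units)):
        unit = units[(start + i) % len(units)]
        if unit not in down:
            return unit
    raise RuntimeError("all units down")


home = home_unit("user-42")
fallback = route_write("user-42", {home})
print(home, fallback)
```

Plain modulo hashing moves many users whenever a unit is added, so real deployments more often keep an explicit shard-to-unit mapping table. The failover branch is also where conflicts can reappear: writes accepted by the fallback unit have to be reconciled once the home unit returns.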

Alibaba's "Global Zone" targets applications that require strong consistency: all writes are directed to a single master data center, while reads can be served from any zone, typically the local one. Because only one site accepts writes, write-write conflicts cannot occur, at the cost of extra write latency for users far from the master.
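The single-writer idea can be sketched as follows. `GlobalZoneStore`, the data center names, and the explicit `replicate()` step are hypothetical. Replication is modeled as asynchronous, so a local read can briefly lag the master; callers that need the latest value can pass `fresh=True` to read from the master instead.

```python
class GlobalZoneStore:
    """All writes go to one master data center; reads default to the local replica."""

    def __init__(self, master, replicas):
        self.master = master
        self.stores = {dc: {} for dc in [master, *replicas]}
        self.pending = []  # replication log entries (key, value) not yet applied

    def write(self, key, value):
        # Every writer, wherever it is located, pays the round trip to the master.
        self.stores[self.master][key] = value
        self.pending.append((key, value))

    def read(self, key, local_dc, fresh=False):
        # fresh=True reads from the master for callers that cannot tolerate lag.
        dc = self.master if fresh else local_dc
        return self.stores[dc].get(key)

    def replicate(self):
        """Apply the log to every replica, in write order."""
        for key, value in self.pending:
            for dc, store in self.stores.items():
                if dc != self.master:
                    store[key] = value
        self.pending.clear()


gz = GlobalZoneStore("dc-master", ["dc-east", "dc-west"])
gz.write("stock:sku1", 10)
before = gz.read("stock:sku1", "dc-east")                 # None: replica not caught up
strong = gz.read("stock:sku1", "dc-east", fresh=True)     # 10, read from the master
gz.replicate()
after = gz.read("stock:sku1", "dc-east")                  # 10
```

Because only the master ever accepts writes, the replication step never has to merge competing values, which is the property the Global Zone relies on.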

Reflection Questions

How would you handle a user located at the intersection of four cities in a sharded, cross‑city multi‑active deployment?

Which of your current business modules can be made multi‑active, and which cannot?

Should all services be multi‑active, or only core services?

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by 21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
