Mastering High Availability: From Cold Backup to Multi‑Active Deployments
This article explains how backend services can be classified as stateless or stateful and explores a range of high‑availability strategies—from simple cold backups and active‑standby setups to same‑city, cross‑city, and multi‑active architectures—highlighting their trade‑offs and implementation considerations.
Backend services fall into two categories: stateless and stateful. High availability is relatively simple for stateless applications, which can be scaled out and failed over behind load balancers or proxies. The rest of this discussion focuses on stateful services, whose state is typically persisted on disk or in memory (e.g., MySQL, Redis); state can also live in JVM memory, but that is short‑lived.
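Because stateless instances are interchangeable, a load balancer can send any request to any healthy replica. A minimal round‑robin sketch (the backend hostnames are placeholders, not from the original article):

```python
from itertools import cycle

# Placeholder backend instances; any replica can serve any request
# because the service keeps no local state.
backends = ["app-1:8080", "app-2:8080", "app-3:8080"]

class RoundRobinBalancer:
    """Cycles through backends in a fixed order."""

    def __init__(self, instances):
        self._pool = cycle(instances)

    def next_backend(self):
        # Each call returns the next instance, wrapping around.
        return next(self._pool)

lb = RoundRobinBalancer(backends)
```

Real load balancers (nginx, HAProxy, cloud LBs) add health checks and weighting on top of this basic idea.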
High‑Availability Solutions
Historically, high availability has evolved through several stages:
Cold backup
Dual‑machine hot standby
Same‑city active‑active
Cross‑city active‑active
Cross‑city multi‑active
Cold Backup
Cold backup stops the database service and copies its data files, typically with the cp command on Linux, either manually or via scheduled scripts. Its benefits:
Simple
Fast backup compared to other methods
Fast recovery by copying files back or adjusting configuration
Allows restoration to a specific point in time
However, cold backup has drawbacks:
Requires service downtime, which is unacceptable for high‑availability targets (e.g., 99.999% uptime)
Potential data loss between the backup point and restoration
Full‑volume backups waste disk space and are time‑consuming; selective table backup is not feasible
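The cold‑backup procedure above can be sketched as a small script. This is an illustrative sketch only; the directory paths are assumptions, and the database service must be stopped before `cold_backup` runs, since copying files mid‑write yields an inconsistent snapshot:

```python
import shutil
import time
from pathlib import Path

def cold_backup(data_dir: str, backup_root: str) -> Path:
    """Copy the whole data directory to a timestamped backup folder.

    Equivalent to `cp -r` on Linux. Each backup is a full-volume copy,
    which is why this approach is disk-hungry and slow for large data.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"backup-{stamp}"
    shutil.copytree(data_dir, dest)
    return dest

def restore(backup_dir: str, data_dir: str) -> None:
    """Replace the data directory with a chosen backup (like `mv`).

    Picking an older backup folder gives point-in-time restoration,
    at the cost of losing everything written after that backup.
    """
    target = Path(data_dir)
    if target.exists():
        shutil.rmtree(target)
    shutil.copytree(backup_dir, target)
```

Restoring an older timestamped folder is what gives the point‑in‑time property; any writes after that backup are lost, which is the data‑loss drawback noted above.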
Dual‑Machine Hot Standby
Hot standby avoids downtime during backup but still requires a pause for restoration. Two main patterns exist:
Active/Standby Mode
One primary node serves traffic while a secondary node continuously replicates data (e.g., MySQL master/slave, SQL Server transactional replication). Upon failure, the standby becomes active.
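The failover decision itself can be reduced to a simple election over a priority‑ordered node list. A hedged sketch, where `is_healthy` stands in for whatever real probe is used (heartbeat, replication‑lag check, etc.):

```python
def elect_active(nodes, is_healthy):
    """Return the first healthy node in priority order.

    `nodes` is ordered: the configured primary first, then standbys.
    If the primary fails its health check, the next standby that
    passes is promoted to serve traffic.
    """
    for node in nodes:
        if is_healthy(node):
            return node
    raise RuntimeError("no healthy node available")
```

In production this logic lives in tools such as MHA, Orchestrator, or a cluster manager, which also handle fencing the failed primary and repointing replication.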
Dual‑Machine Mutual Backup
Both machines act as primary for different services, each also serving as the standby for the other, which enables read‑write separation and better resource utilization. The pattern is the same as active/standby, applied symmetrically at the server level.
Same‑City Active‑Active
This approach replicates services across data centers within the same city, protecting against a single IDC failure. It essentially extends dual‑machine hot standby to a larger geographic scope; because the sites are close together, replication latency stays comparably low.
Cross‑City Active‑Active
When a whole city experiences a disaster, traffic is shifted to another city’s data center. Although this provides higher resilience, increased network latency can degrade user experience.
Cross‑City Multi‑Active
Each node operates as both read and write master, requiring conflict resolution mechanisms such as distributed locks, sharding, or eventual consistency techniques. While this maximizes availability, it introduces complexity in data synchronization and performance.
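One common way to avoid write conflicts in a multi‑active setup is to shard: each user is "homed" in exactly one data center, so their writes never race across sites. A minimal sketch using hash‑based routing (the data center names are illustrative; routing by the user's registered province/city is the alternative the article alludes to):

```python
import hashlib

# Hypothetical data centers; each user shard is homed in exactly one.
DATA_CENTERS = ["dc-beijing", "dc-shanghai", "dc-shenzhen"]

def home_dc(user_id: str) -> str:
    """Deterministically map a user to a single home data center.

    All writes for this user go to their home DC, so concurrent
    cross-DC writes to the same row cannot occur. The hash makes the
    mapping stable without any lookup table.
    """
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return DATA_CENTERS[int(digest, 16) % len(DATA_CENTERS)]
```

Hash sharding spreads load evenly but ignores geography; geographic sharding keeps users close to their data but raises the boundary questions posed in the reflection section.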
For applications with strict consistency requirements, a "Global Zone" can be used: writes are directed to a single master data center, while reads may be served from any slave or the master, ensuring strong consistency without dual‑write conflicts.
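The Global Zone routing rule described above is easy to state in code: writes always go to the single master data center, while reads prefer the client's local replica. A sketch with illustrative names:

```python
# Global Zone sketch: one write master, replicas everywhere.
MASTER_DC = "dc-beijing"
REPLICA_DCS = ["dc-beijing", "dc-shanghai", "dc-shenzhen"]

def route(operation: str, client_dc: str) -> str:
    """Route writes to the master DC; serve reads locally when a
    replica exists in the client's DC, otherwise fall back to master."""
    if operation == "write":
        return MASTER_DC
    return client_dc if client_dc in REPLICA_DCS else MASTER_DC
```

Because only one site ever accepts writes for these tables, dual‑write conflicts cannot arise; the trade‑off is higher write latency for clients far from the master.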
Reflection
How would you handle a user located at the intersection of four cities when sharding by province/city?
Which of your current business modules can be made multi‑active, and which cannot?
Should all services be multi‑active, or only core services?
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
