Mastering High Availability: From Cold Backup to Multi‑Active Deployments
This article explains how backend services can be classified as stateless or stateful and explores a range of high‑availability strategies—from simple cold backups and active‑standby setups to same‑city, cross‑city, and multi‑active architectures—highlighting their trade‑offs and implementation considerations.
Backend services fall into two categories: stateless and stateful. High availability is relatively simple for stateless applications, which can be scaled out and failed over behind load balancers or proxies. The rest of this discussion focuses on stateful services, whose state is typically persisted on disk or in memory (e.g., MySQL, Redis); state can also live in JVM memory, but that is short‑lived.
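Because stateless instances are interchangeable, a load balancer can send any request to any healthy replica. A minimal round‑robin sketch (the backend hostnames are placeholders, not from the original article):

```python
from itertools import cycle

# Placeholder backend instances; any replica can serve any request
# because the service keeps no local state.
backends = ["app-1:8080", "app-2:8080", "app-3:8080"]

class RoundRobinBalancer:
    """Cycles through backends in a fixed order."""

    def __init__(self, instances):
        self._pool = cycle(instances)

    def next_backend(self):
        # Each call returns the next instance, wrapping around.
        return next(self._pool)

lb = RoundRobinBalancer(backends)
```

Real load balancers (nginx, HAProxy, cloud LBs) add health checks and weighting on top of this basic idea.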
High‑Availability Solutions
Historically, high availability has evolved through several stages:
Cold backup
Dual‑machine hot standby
Same‑city active‑active
Cross‑city active‑active
Cross‑city multi‑active
Cold Backup
Cold backup stops the database service and copies its data files, typically with the cp command on Linux, either manually or via scheduled scripts. Its benefits:
Simple
Fast backup compared to other methods
Fast recovery by copying files back or adjusting configuration
Allows restoration to a specific point in time
However, cold backup has drawbacks:
Requires service downtime, which is unacceptable for high‑availability targets (e.g., 99.999% uptime)
Potential data loss between the backup point and restoration
Full‑volume backups waste disk space and are time‑consuming; selective table backup is not feasible
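The cold‑backup procedure above can be sketched as a small script. This is an illustrative sketch only; the directory paths are assumptions, and the database service must be stopped before `cold_backup` runs, since copying files mid‑write yields an inconsistent snapshot:

```python
import shutil
import time
from pathlib import Path

def cold_backup(data_dir: str, backup_root: str) -> Path:
    """Copy the whole data directory to a timestamped backup folder.

    Equivalent to `cp -r` on Linux. Each backup is a full-volume copy,
    which is why this approach is disk-hungry and slow for large data.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"backup-{stamp}"
    shutil.copytree(data_dir, dest)
    return dest

def restore(backup_dir: str, data_dir: str) -> None:
    """Replace the data directory with a chosen backup (like `mv`).

    Picking an older backup folder gives point-in-time restoration,
    at the cost of losing everything written after that backup.
    """
    target = Path(data_dir)
    if target.exists():
        shutil.rmtree(target)
    shutil.copytree(backup_dir, target)
```

Restoring an older timestamped folder is what gives the point‑in‑time property; any writes after that backup are lost, which is the data‑loss drawback noted above.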
Dual‑Machine Hot Standby
Hot standby avoids downtime during backup but still requires a pause for restoration. Two main patterns exist:
Active/Standby Mode
One primary node serves traffic while a secondary node continuously replicates data (e.g., MySQL master/slave, SQL Server transactional replication). Upon failure, the standby becomes active.
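The failover decision itself can be reduced to a simple election over a priority‑ordered node list. A hedged sketch, where `is_healthy` stands in for whatever real probe is used (heartbeat, replication‑lag check, etc.):

```python
def elect_active(nodes, is_healthy):
    """Return the first healthy node in priority order.

    `nodes` is ordered: the configured primary first, then standbys.
    If the primary fails its health check, the next standby that
    passes is promoted to serve traffic.
    """
    for node in nodes:
        if is_healthy(node):
            return node
    raise RuntimeError("no healthy node available")
```

In production this logic lives in tools such as MHA, Orchestrator, or a cluster manager, which also handle fencing the failed primary and repointing replication.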
Dual‑Machine Mutual Backup
Both machines act as primary for different services, each also serving as the standby for the other, which enables read‑write separation and better resource utilization. The pattern is the same as active/standby, applied symmetrically at the server level.
Same‑City Active‑Active
This approach replicates services across data centers within the same city, protecting against a single IDC failure. It essentially extends dual‑machine hot standby to a larger geographic scope; because the sites are close together, replication latency stays comparably low.
Cross‑City Active‑Active
When a whole city experiences a disaster, traffic is shifted to another city’s data center. Although this provides higher resilience, increased network latency can degrade user experience.
Cross‑City Multi‑Active
Each node operates as both read and write master, requiring conflict resolution mechanisms such as distributed locks, sharding, or eventual consistency techniques. While this maximizes availability, it introduces complexity in data synchronization and performance.
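One common way to avoid write conflicts in a multi‑active setup is to shard: each user is "homed" in exactly one data center, so their writes never race across sites. A minimal sketch using hash‑based routing (the data center names are illustrative; routing by the user's registered province/city is the alternative the article alludes to):

```python
import hashlib

# Hypothetical data centers; each user shard is homed in exactly one.
DATA_CENTERS = ["dc-beijing", "dc-shanghai", "dc-shenzhen"]

def home_dc(user_id: str) -> str:
    """Deterministically map a user to a single home data center.

    All writes for this user go to their home DC, so concurrent
    cross-DC writes to the same row cannot occur. The hash makes the
    mapping stable without any lookup table.
    """
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return DATA_CENTERS[int(digest, 16) % len(DATA_CENTERS)]
```

Hash sharding spreads load evenly but ignores geography; geographic sharding keeps users close to their data but raises the boundary questions posed in the reflection section.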
For applications with strict consistency requirements, a "Global Zone" can be used: writes are directed to a single master data center, while reads may be served from any slave or the master, ensuring strong consistency without dual‑write conflicts.
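The Global Zone routing rule described above is easy to state in code: writes always go to the single master data center, while reads prefer the client's local replica. A sketch with illustrative names:

```python
# Global Zone sketch: one write master, replicas everywhere.
MASTER_DC = "dc-beijing"
REPLICA_DCS = ["dc-beijing", "dc-shanghai", "dc-shenzhen"]

def route(operation: str, client_dc: str) -> str:
    """Route writes to the master DC; serve reads locally when a
    replica exists in the client's DC, otherwise fall back to master."""
    if operation == "write":
        return MASTER_DC
    return client_dc if client_dc in REPLICA_DCS else MASTER_DC
```

Because only one site ever accepts writes for these tables, dual‑write conflicts cannot arise; the trade‑off is higher write latency for clients far from the master.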
Reflection
How would you handle a user located at the intersection of four cities when sharding by province/city?
Which of your current business modules can be made multi‑active, and which cannot?
Should all services be multi‑active, or only core services?
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
