
Mastering High Availability: From Cold Backups to Multi-Region Active-Active

This article examines high‑availability strategies for stateful backend services, comparing cold backup, hot standby, same‑city active‑active, cross‑region active‑active, and multi‑active architectures, highlighting their advantages, limitations, and practical implementation considerations such as downtime, data loss, synchronization overhead, and conflict resolution.

MaGe Linux Operations

Preface

Backend services can be divided into two categories: stateful and stateless. High availability is relatively simple for stateless applications: placing them behind a load balancer such as F5, or any reverse proxy, is usually enough. The following discussion focuses on stateful services.

Server‑side state is mainly kept on disk or in memory, for example in MySQL databases or Redis. JVM memory can also hold state, but its lifecycle is usually short.

High Availability

1. Some High‑Availability Solutions

High availability has evolved through several stages:

Cold backup

Active/standby hot backup

Same‑city active‑active

Cross‑region active‑active

Cross‑region multi‑active

Before discussing cross‑region multi‑active, it is useful to review other solutions to understand design motivations.

Cold Backup

Cold backup stops the database service and copies its data files. It is essentially a copy‑paste operation that can be done with the Linux cp command, either manually or via scheduled scripts. Its benefits are simplicity, a faster backup than other methods, quick recovery by copying the files back, and restoration to the exact point in time at which the backup was taken.

However, cold backup has drawbacks: it requires service downtime, any data written after the most recent backup is lost on restore, every backup is a full backup that consumes disk space and time, and specific tables cannot be backed up selectively.

Balancing these pros and cons depends on business requirements.
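The cold backup procedure described above can be sketched in Python as follows. This is a minimal illustration, not a production tool: the function names cold_backup and cold_restore, the timestamped directory layout, and the stop_service / start_service callbacks are all assumptions introduced here. In practice the callbacks would wrap whatever command stops and starts the database.

```python
import shutil
import time
from pathlib import Path
from typing import Callable


def cold_backup(data_dir: Path, backup_root: Path,
                stop_service: Callable[[], None],
                start_service: Callable[[], None]) -> Path:
    """Stop the service, copy the whole data directory, then restart it."""
    # Hypothetical naming scheme: one timestamped directory per full backup.
    target = backup_root / time.strftime("backup-%Y%m%d-%H%M%S")
    stop_service()                         # downtime starts here
    try:
        shutil.copytree(data_dir, target)  # full copy, comparable to `cp -r`
    finally:
        start_service()                    # bring the service back even on error
    return target


def cold_restore(backup_dir: Path, data_dir: Path,
                 stop_service: Callable[[], None],
                 start_service: Callable[[], None]) -> None:
    """Replace the data directory with a backup; later writes are lost."""
    stop_service()
    try:
        if data_dir.exists():
            shutil.rmtree(data_dir)
        shutil.copytree(backup_dir, data_dir)
    finally:
        start_service()
```

Both functions stop the service first, which makes the downtime cost explicit, and the restore discards anything written after the backup was taken, which is the data‑loss drawback noted above.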

Active/Standby Hot Backup

Hot backup differs from cold backup in that the service keeps running while the backup is taken, though restoration still requires downtime. Shared‑disk approaches are not covered here.

Active/Standby Mode

Equivalent to a primary‑secondary setup: the primary node serves traffic, the secondary acts as backup. Data is synchronized from primary to secondary via software (e.g., MySQL master/slave binlog replication, SQL Server transactional replication) or hardware (disk mirroring, sector interception). Software‑level is often called application‑level disaster recovery; hardware‑level is data‑level disaster recovery.
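To make the primary‑to‑secondary synchronization concrete, here is a small in‑memory sketch of log‑based replication. It only models the idea behind binlog‑style replication (an ordered change log shipped from primary to standby) and is not how MySQL or SQL Server implement it. The class names Node and ActiveStandbyPair and all method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    log: list = field(default_factory=list)    # ordered change log
    data: dict = field(default_factory=dict)   # current state

    def apply(self, entry):
        key, value = entry
        self.log.append(entry)
        self.data[key] = value


class ActiveStandbyPair:
    def __init__(self):
        self.primary = Node("primary")
        self.standby = Node("standby")

    def write(self, key, value):
        # Only the primary accepts writes.
        self.primary.apply((key, value))

    def replication_lag(self):
        # Number of log entries the standby has not applied yet.
        return len(self.primary.log) - len(self.standby.log)

    def replicate(self):
        # Ship the entries the standby is missing, in log order.
        for entry in self.primary.log[len(self.standby.log):]:
            self.standby.apply(entry)

    def failover(self):
        # Promote the standby; entries that were never shipped are lost.
        lost = self.replication_lag()
        promoted = self.standby
        promoted.name = "primary"
        self.primary = promoted
        self.standby = Node("standby")
        return lost
```

Calling failover() while replication_lag() is non‑zero returns the number of writes that never reached the standby, which is the data‑loss window of asynchronous replication.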

Bidirectional Hot Backup

Essentially two Active/Standby pairs with the roles swapped: each node is the primary for some services and the standby for the others. This enables read‑write separation and better resource utilization, since neither node sits idle.
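A rough sketch of the role assignment in bidirectional hot backup follows. The node names, the service names, and the PRIMARY_FOR table are made up for illustration; a real deployment would take them from its own configuration.

```python
NODES = ("node-a", "node-b")

# Hypothetical assignment: each node is primary for one service
# and standby for the other.
PRIMARY_FOR = {"orders": "node-a", "users": "node-b"}


def serving_node(service, healthy):
    """Return the node that should serve `service` given the healthy nodes."""
    primary = PRIMARY_FOR[service]
    if primary in healthy:
        return primary
    # The peer holds a replicated copy, so it takes over as primary.
    peer = next(n for n in NODES if n != primary)
    if peer in healthy:
        return peer
    raise RuntimeError(f"no healthy node for {service}")
```

With both nodes healthy the load is split across them. When one fails, the survivor serves both services.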

Other HA options include various database deployment modes such as MySQL master‑slave, master‑master, MHA; Redis master‑slave, Sentinel, Cluster, etc.

Same‑City Active‑Active

The solutions above operate within a single LAN. Same‑city active‑active extends them to multiple data centers in the same city, protecting against the failure of an entire IDC (power outage, network loss). The architecture is similar to hot backup but spans a greater distance, typically over dedicated city‑level links.

With application‑level support, some services can achieve true active‑active operation, accepting reads and writes on both nodes and resolving the resulting conflicts, though not all workloads can support this.

Industry practice often adopts a “two‑site three‑center” model: two local data centers provide primary service, while a third remote center serves as disaster‑recovery only, activated when a local IDC fails.

In this model, user traffic is load‑balanced across IDC1 and IDC2; data is synchronized between them and to a distant IDC3. If one local IDC fails, traffic fails over to the remaining local IDC. If both local sites are down, traffic goes to IDC3, though latency may increase.
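The failover rule of the two‑site three‑center model can be written as a small routing function. This is a sketch under the assumption that health checks already produce a set of healthy IDC names. The constant names and the function active_targets are hypothetical.

```python
from typing import List, Set

LOCAL_IDCS = ("IDC1", "IDC2")   # same-city sites that normally share traffic
REMOTE_DR = "IDC3"              # remote disaster-recovery site


def active_targets(healthy: Set[str]) -> List[str]:
    """Return the IDCs that should receive user traffic."""
    local = [idc for idc in LOCAL_IDCS if idc in healthy]
    if local:
        # Balance across whichever local sites are still up.
        return local
    if REMOTE_DR in healthy:
        # Both local sites are down: fall back to the remote site.
        return [REMOTE_DR]
    return []                   # total outage
```

IDC3 receives traffic only when neither local site is healthy, which matches its disaster‑recovery‑only role.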

Cross‑Region Active‑Active

Same‑city active‑active cannot handle large‑scale disasters; cross‑region active‑active deploys front‑end entry points and applications in another city, routing traffic there when the primary city fails, albeit with degraded user experience.

Most internet companies adopt cross‑region active‑active solutions.

Cross‑Region Multi‑Active

Extending the active‑active concept, multi‑active architectures connect multiple nodes with full‑mesh topology, ensuring any single node failure does not affect service. However, increased distance introduces higher latency, throughput reduction, and data conflicts.

To mitigate conflicts, techniques such as distributed locks, distributed transactions, sharding, and eventual consistency are employed. Some companies, like Ele.me, use a “Global Zone” approach where writes are directed to a single master zone for strong consistency, while reads can be served from slaves.
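Two of these techniques, sharding and a single‑master global zone, can be combined in a request router. The sketch below is only an illustration of the idea and does not describe Ele.me's actual implementation. The zone names, the GLOBAL_TABLES set, the table names, and the choice of a CRC32 hash are all assumptions.

```python
import zlib

ZONES = ("zone-a", "zone-b", "zone-c")   # hypothetical zone names
GLOBAL_MASTER = "zone-a"                 # single write master for global data
GLOBAL_TABLES = {"account_balance"}      # data that needs strong consistency


def home_zone(user_id: str) -> str:
    """Shard by user: all writes for one user land in one zone."""
    return ZONES[zlib.crc32(user_id.encode("utf-8")) % len(ZONES)]


def route(user_id: str, table: str, is_write: bool, local_zone: str) -> str:
    """Pick the zone that should handle a request."""
    if table in GLOBAL_TABLES:
        # Global-zone data: writes go to the master, reads stay local.
        return GLOBAL_MASTER if is_write else local_zone
    # Sharded data: always use the user's home zone, so two zones
    # never accept conflicting writes for the same user.
    return home_zone(user_id)
```

Pinning each user to a home zone avoids write conflicts rather than resolving them afterwards, while the global tables trade write latency for strong consistency.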

Cross‑region active‑active is essentially a temporary step toward full multi‑active; it simplifies the architecture but cannot scale horizontally and still faces conflict challenges.

Implementing true cross‑region multi‑active requires substantial foundational capabilities: data transfer, verification, operation layers, etc.
