How Do Large Internet Companies Achieve Cross‑Region Multi‑Active High Availability?
The article explains why large internet firms adopt cross‑region multi‑active architectures for high availability, compares cold backup, hot standby, same‑city active‑active, and cross‑region active‑active solutions, discusses their trade‑offs, and presents practical design patterns and questions for implementing such systems.
Stateful vs Stateless Services
Backend services can be divided into stateless and stateful. Stateless services achieve high availability simply through load balancers, while stateful services maintain state on disk or memory (e.g., MySQL, Redis) and require more complex solutions.
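As a minimal sketch of why stateless services are the easy case, the toy balancer below rotates requests across interchangeable backends and skips any that are marked unhealthy. The class name, method names, and backend labels are hypothetical and chosen only for illustration; a real deployment would rely on a production load balancer rather than code like this.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy load balancer for stateless backends (all names are illustrative)."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        # A failed health check removes the backend from rotation.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Try each backend at most once per call, skipping unhealthy ones.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backend available")


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
lb.mark_down("app-2")
picks = [lb.pick() for _ in range(4)]
```

Because no backend holds state, dropping `app-2` loses nothing: the next request simply lands on another instance. The same trick does not work for MySQL or Redis, where the data itself lives on the failed node.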
High‑Availability Solutions
High availability has evolved through several stages:
Cold Backup
Hot Standby
Same‑City Active‑Active
Cross‑Region Active‑Active
Cross‑Region Multi‑Active
Cold Backup
Cold backup copies data files (e.g., using cp) while the service is stopped. It is simple, fast to back up and restore, and can restore data to the point in time at which a backup was taken. Its drawbacks are that it requires downtime, loses any data written between the last backup and the restore, and makes full‑volume copies that waste storage.
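The procedure can be sketched as follows, assuming the caller supplies its own way to stop and start the service. The function name `cold_backup` and the `stop_service` / `start_service` callbacks are hypothetical; the copy itself uses the standard library's `shutil.copytree`.

```python
import shutil
import time
from pathlib import Path


def cold_backup(data_dir, backup_root, stop_service, start_service):
    """Copy data_dir into a timestamped folder while the service is stopped."""
    target = Path(backup_root) / time.strftime("backup-%Y%m%d-%H%M%S")
    stop_service()  # downtime begins here
    try:
        # Full copy every time: simple, but storage grows with each backup.
        shutil.copytree(data_dir, target)
    finally:
        start_service()  # restart even if the copy failed
    return target
```

The downtime lasts exactly as long as the copy, which is why this approach stops being practical as the data volume grows.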
Hot Standby
Hot standby (Active/Standby) keeps a primary node serving traffic and a backup node synchronizing data in real time. Synchronization can be software‑based (e.g., MySQL master/slave, SQL Server replication) or hardware‑based (disk mirroring). The backup can take over when the primary fails, but the failover still requires a brief outage.
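A minimal sketch of the takeover logic, assuming failure is detected through missed heartbeats (the article does not say how detection works). The class name, node labels, and the three-second timeout are illustrative assumptions.

```python
class HotStandbyPair:
    """Tracks which node of an active/standby pair should serve traffic."""

    def __init__(self, primary, standby, timeout_s=3.0):
        self.active = primary
        self.standby = standby
        self.timeout_s = timeout_s  # assumed heartbeat window
        self.last_heartbeat = 0.0

    def heartbeat(self, now):
        # Called whenever the active node reports that it is alive.
        self.last_heartbeat = now

    def check(self, now):
        """Promote the standby if the active node missed its heartbeat window."""
        missed = now - self.last_heartbeat > self.timeout_s
        if missed and self.standby is not None:
            self.active, self.standby = self.standby, None
        return self.active


pair = HotStandbyPair("db-a", "db-b")
pair.heartbeat(10.0)
before = pair.check(12.0)  # still inside the window
after = pair.check(14.0)   # window exceeded, standby takes over
```

The gap between the last heartbeat and the promotion is the brief outage mentioned above: requests sent to the primary during that window fail.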
Same‑City Active‑Active
Same‑city active‑active deploys two independent clusters in the same metropolitan area. It provides faster failover than hot standby and enables read‑write separation, but it does not protect against large‑scale disasters that affect the whole city.
Cross‑Region Active‑Active
Cross‑region active‑active adds a distant disaster‑recovery site. Traffic is load‑balanced between two cities; if one city fails, traffic is shifted to the other. However, long‑distance synchronization increases latency and may cause data conflicts.
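To make the conflict problem concrete, here is a sketch of one common resolution policy, last-write-wins. The article does not say which policy any particular company uses, so treat this purely as an illustration; the `Versioned` type, its fields, and the region names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp_ms: int  # assumed to come from reasonably synchronized clocks
    region: str


def merge_lww(a, b):
    """Keep the write with the later timestamp; break ties by region name."""
    return max(a, b, key=lambda w: (w.timestamp_ms, w.region))


# The same record was updated in both cities before replication caught up.
beijing = Versioned("address-1", 1000, "beijing")
shanghai = Versioned("address-2", 1005, "shanghai")
winner = merge_lww(beijing, shanghai)
```

Both sites apply the same rule and converge on the same value, but the losing write is silently discarded. That is acceptable for some data and unacceptable for others, such as balances or inventory.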
Cross‑Region Multi‑Active
Multi‑active removes single points of failure by connecting every node to every other node (a full mesh). In a mesh of five nodes, for example, each node has four inbound and outbound connections, so any single node can fail without affecting the service. The trade‑off is higher write latency and an increased risk of conflicts, which may require distributed locks or sharding strategies.
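Two of the ideas above can be sketched briefly: how the number of links grows in a full mesh, and how a sharding rule can give every user a single home region so that two regions do not write the same record concurrently. The function names and region labels are hypothetical, and hashing the user ID is only one possible sharding rule.

```python
import hashlib


def mesh_links(n):
    """Return (links per node, total links) for a full mesh of n nodes."""
    return n - 1, n * (n - 1) // 2


def home_region(user_id, regions):
    """Map a user to one region with a stable hash, so all of that user's
    writes land in the same place regardless of where the request enters."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return regions[int.from_bytes(digest[:8], "big") % len(regions)]


per_node, total = mesh_links(5)  # (4, 10)
regions = ["beijing", "shanghai", "guangzhou"]
region = home_region("user-42", regions)
```

The link count grows quadratically with the number of nodes, which is part of why synchronization cost rises quickly as regions are added. A plain modulo rule like this one also remaps many users whenever the region list changes, so a production system would likely need a more stable mapping.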
Design Patterns and Real‑World Examples
Companies such as Eleme, Alibaba, and Taobao adopt variations of these architectures. Eleme uses a “Global Zone” to enforce strong consistency for critical services. Alibaba’s ideal multi‑active design routes write traffic to a master region and read traffic to slave regions. Taobao partitions data by business unit, synchronizing core units bidirectionally and peripheral units unidirectionally.
Considerations
Implementing multi‑active systems demands strong foundations: reliable data transfer, validation, and client‑side control of writes and synchronization. It also raises challenges in testing, automation, and operational readiness.
Thought Questions
If a user’s location spans multiple cities, how would you route requests to obtain consistent data?
Which of your current services can be made multi‑active and which cannot?
Is multi‑active required for all services or only for core business functions?
References
Eleme “Cross‑Region Multi‑Active Technical Implementation (Part 1) – Overview”
Eleme Architecture Blog
Alibaba “Cross‑Region Multi‑Active and Same‑City Active‑Active Architecture Evolution”
Alibaba Cloud “Database Cross‑Region Multi‑Active Solution”
“Cross‑Region Multi‑Active Is Not That Hard”