Inside Alibaba’s Same‑City Active‑Active Architecture: A Complete Visual Guide
The article breaks down Alibaba’s same‑city active‑active high‑availability architecture, detailing its four design layers—traffic scheduling, stateless application services, data replication, and operational automation—while illustrating how each component ensures continuous service during data‑center failures.
Distributed systems are the backbone of large‑scale architectures; this article explains Alibaba’s same‑city active‑active (dual‑active) design.
In Alibaba’s high‑availability framework, same‑city active‑active refers to two relatively independent data centers located in the same city that simultaneously handle production traffic. If either center fails, the other can quickly take over, minimizing service interruption. Unlike traditional active‑passive setups, both sites remain fully operational and share load.
Alibaba’s core services such as Taobao, Tmall, and Alipay have evolved from single‑datacenter, primary‑backup disaster recovery to same‑city active‑active and eventually to multi‑city active architectures.
The same‑city active‑active architecture is built around four layers: traffic scheduling, stateless application design, data synchronization, and fault‑tolerant operations with disaster recovery.
1. Traffic Layer (Active‑Active)
User requests first pass through a global traffic scheduling system (DNS/GSLB) that directs them to the nearest or a designated data center. Both data centers can process external requests, and traffic is split based on health checks, load conditions, and business policies. This balances pressure and enables rapid traffic migration when a center experiences an anomaly.
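The traffic split described above can be sketched as a weighted choice among the data centers that currently pass health checks. This is a minimal illustration, not Alibaba's scheduler: the `DataCenter` class, the `weight` field, and `pick_data_center` are hypothetical names, and real GSLB systems also consider load, latency, and DNS caching.

```python
import random
from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str
    weight: int           # traffic share set by business policy (assumed)
    healthy: bool = True  # result of the most recent health check


def pick_data_center(centers, rng):
    """Pick a data center for one request, weighted among healthy sites."""
    candidates = [c for c in centers if c.healthy and c.weight > 0]
    if not candidates:
        raise RuntimeError("no healthy data center available")
    weights = [c.weight for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded only to make the sketch reproducible
centers = [DataCenter("A", weight=50), DataCenter("B", weight=50)]

# Normal operation: both sites receive production traffic.
normal = {pick_data_center(centers, rng).name for _ in range(1000)}

# Site A fails its health check: every new request goes to B.
centers[0].healthy = False
after_failure = {pick_data_center(centers, rng).name for _ in range(100)}
```

Lowering a site's weight instead of flipping `healthy` would model a gradual, policy-driven traffic migration rather than a hard switch.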
2. Application Layer (Active‑Active)
Application services are designed to be stateless or to have minimal state dependencies. Session data, caches, and configuration are externalized to unified storage or distributed components, ensuring that any center can handle a request without relying on local state.
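To illustrate why externalized state lets either center serve any request, here is a small sketch in which two application instances share one session store. `SharedSessionStore` is an in-memory stand-in for whatever distributed cache or storage is actually used, and all class and method names are assumptions made for the example.

```python
class SharedSessionStore:
    """In-memory stand-in for an external, shared session store."""

    def __init__(self):
        self._sessions = {}

    def get(self, session_id):
        # Return a copy so callers cannot mutate stored state by accident.
        return dict(self._sessions.get(session_id, {}))

    def put(self, session_id, session):
        self._sessions[session_id] = dict(session)


class AppInstance:
    """A stateless service instance: it keeps no session data of its own."""

    def __init__(self, data_center, store):
        self.data_center = data_center
        self.store = store

    def add_to_cart(self, session_id, item):
        session = self.store.get(session_id)
        cart = list(session.get("cart", []))
        cart.append(item)
        session["cart"] = cart
        self.store.put(session_id, session)
        return cart


store = SharedSessionStore()
app_a = AppInstance("A", store)
app_b = AppInstance("B", store)

# The first request lands in data center A, the next one in B.
app_a.add_to_cart("session-1", "book")
cart = app_b.add_to_cart("session-1", "pen")
```

Because neither instance holds the cart locally, the second request succeeds in data center B even though the session began in A.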
3. Data Layer (Active‑Active)
This is the most challenging layer. Alibaba combines multi‑replica databases, distributed storage, message queues, and asynchronous replication to keep data consistent—or eventually consistent—across the two centers. Scenarios demanding strong consistency use stricter transaction control and arbitration mechanisms, while workloads tolerant of eventual consistency prioritize availability and throughput.
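The trade-off between the two consistency modes can be shown with a toy pair of replicas. This sketch assumes a single writer per key and omits conflict resolution, arbitration, and failure handling, all of which a real system needs; `Replica`, `write`, and `flush_to` are hypothetical names.

```python
from collections import deque


class Replica:
    """One data center's copy of the data, with a queue of unshipped changes."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.outbox = deque()

    def write(self, key, value, peer, strong=False):
        self.data[key] = value
        if strong:
            # Strong consistency: acknowledge only after the peer has the write.
            peer.data[key] = value
        else:
            # Eventual consistency: acknowledge now, replicate asynchronously.
            self.outbox.append((key, value))

    def flush_to(self, peer):
        """Ship queued changes to the peer and return how many were applied."""
        shipped = 0
        while self.outbox:
            key, value = self.outbox.popleft()
            peer.data[key] = value
            shipped += 1
        return shipped


a, b = Replica("A"), Replica("B")

# A payment status needs strong consistency: B sees it immediately.
a.write("order:1", "PAID", peer=b, strong=True)
strong_read = b.data.get("order:1")

# A view counter tolerates lag: B is stale until replication catches up.
a.write("views:1", 10, peer=b)
stale_read = b.data.get("views:1")
shipped = a.flush_to(b)
converged_read = b.data.get("views:1")
```

The strong path costs a cross-center round trip on every write, which is why it is usually reserved for data that cannot tolerate divergence.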
4. Operations & Disaster‑Recovery Layer
Automation covers monitoring, alerting, disaster‑recovery drills, and failover. When an abnormality is detected, the system quickly locates the issue and redirects traffic, reducing manual intervention. Regular DR drills verify that the dual‑active link is truly usable, which is essential for practical deployment.
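Automated fault switching is often built on a rule such as "remove a site after N consecutive failed health checks". The sketch below assumes that rule; `FailoverController`, the threshold of 3, and the instant recovery on the first successful check are illustrative simplifications, not a description of Alibaba's tooling.

```python
class FailoverController:
    """Tracks health-check results and decides which sites serve traffic."""

    def __init__(self, centers, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.failures = {name: 0 for name in centers}
        self.serving = set(centers)

    def report(self, name, ok):
        """Record one health-check result; return the sites now serving."""
        if ok:
            self.failures[name] = 0
            self.serving.add(name)
        else:
            self.failures[name] += 1
            tripped = self.failures[name] >= self.failure_threshold
            others = self.serving - {name}
            # Never remove the last serving site, even if it looks unhealthy.
            if tripped and others:
                self.serving.discard(name)
        return sorted(self.serving)


controller = FailoverController(["A", "B"])
history = [
    controller.report("A", ok=False),  # 1st failure: still serving
    controller.report("A", ok=False),  # 2nd failure: still serving
    controller.report("A", ok=False),  # 3rd failure: traffic moves to B
    controller.report("A", ok=True),   # recovery: A rejoins
]
```

A disaster-recovery drill can reuse the same path by feeding the controller simulated failures and confirming that traffic really moves.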
                 User
                  │
         DNS / GSLB scheduling
            ↙             ↘
    Data center A     Data center B
         APP               APP
        Redis             Redis
         MQ                MQ
         DB                DB
          ↔ data synchronization ↔

The architecture is not a simple copy of systems across two data centers; it is a coordinated design that ensures continuous service, balanced load, and data integrity through the four layers described above.
Mike Chen's Internet Architecture
Over ten years of BAT architecture experience, shared generously!
