Multi-Active High Availability Architecture: Scenarios, Solutions, and Evaluation
Multi‑active high‑availability architectures, from same‑city dual‑active and two‑site three‑center setups to fully remote multi‑active deployments, provide continuous 24/7 service by replicating data across sites. In return, they introduce latency, consistency, routing, and cost complexities that call for careful unit‑based design, synchronized storage, and sophisticated traffic management.
High‑availability system architectures that provide 7×24 uninterrupted service have become a primary choice for enterprises to ensure stable and continuous operation of critical business. Multi‑active deployment is a key implementation method, and this article introduces common multi‑active approaches such as same‑city dual‑active, two‑site three‑center, and remote multi‑active designs, detailing their advantages and disadvantages.
Why adopt multi‑active? As the mobile internet grows, enterprises face concurrency levels and data volumes that exceed the capacity of a single data center. Catastrophic events (power loss, fire, earthquakes, etc.) can also cause a total service outage, and restoring service from a backup data center can be time‑consuming. Multi‑active architectures are used to improve business continuity and risk resistance.
Multi‑active scenarios involve systems in different geographic locations providing services simultaneously. While this eliminates the “cold standby” concept, it introduces complexity, latency, and higher cost. Not every system needs multi‑active; internal IT systems or blogs may forgo it, whereas core financial, payment, or transaction services typically require it.
1. Same‑city dual‑active
Two data centers are built within the same city or a nearby area. High‑quality network links between them enable real‑time data replication, ensuring zero data loss. Traffic is split between the two sites, and most RPC calls are served within the local site. If one site fails, its traffic is rerouted to the other site via GSLB (global server load balancing) or manual routing.
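The traffic split and failover can be pictured as a weighted site picker. This is a minimal sketch, not how any particular GSLB product works (real GSLB decides at the DNS or anycast layer); the `SiteSelector` class, the site names, and the 50/50 weights are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of GSLB-style weighted routing between two same-city sites.
// Site names and weights are illustrative assumptions.
final class SiteSelector {
    private final Map<String, Integer> weights = new LinkedHashMap<>();
    private final Map<String, Boolean> healthy = new LinkedHashMap<>();

    void addSite(String site, int weight) {
        weights.put(site, weight);
        healthy.put(site, true);
    }

    void markDown(String site) { healthy.put(site, false); }

    // Maps a stable bucket (0-99) onto healthy sites in proportion to their weights.
    // A failed site gets no share, so all of its traffic shifts to the survivor.
    String pick(int bucket) {
        int total = 0;
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            if (healthy.get(e.getKey())) total += e.getValue();
        }
        if (total == 0) throw new IllegalStateException("no healthy site");
        int point = Math.floorMod(bucket, 100) * total / 100;   // 0 .. total-1
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            if (!healthy.get(e.getKey())) continue;
            point -= e.getValue();
            if (point < 0) return e.getKey();
        }
        throw new AssertionError("unreachable");
    }
}
```

Deriving the bucket from a stable key such as a hash of the user ID, rather than a random number, keeps a given user on the same site while both sites are healthy.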
Service routing
ZooKeeper clusters: Each site runs a ZooKeeper cluster with bidirectional real‑time sync, so every site has the full registration data.
Routing policy: Conditional routing → proximity routing → cross‑site routing, minimizing cross‑site calls.
Subscription model: Consumers subscribe to all sites; providers register only to the local ZooKeeper cluster.
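The three‑step routing policy can be read as successive filters over the provider list a consumer sees after subscribing to all sites. The sketch below is hypothetical: the `Provider` record, its `site` and `tags` fields, and the tag‑based condition stand in for whatever metadata the registry actually carries.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical provider metadata as seen by a consumer subscribed to every site's ZooKeeper.
record Provider(String address, String site, Set<String> tags) {}

final class SiteAwareRouter {
    private final String localSite;

    SiteAwareRouter(String localSite) { this.localSite = localSite; }

    // 1. conditional routing: keep providers matching the required tag (null = no condition);
    // 2. proximity routing: prefer providers registered in the local site;
    // 3. cross-site routing: use remote providers only when no local one remains.
    List<Provider> route(List<Provider> all, String requiredTag) {
        List<Provider> matching = all.stream()
                .filter(p -> requiredTag == null || p.tags().contains(requiredTag))
                .collect(Collectors.toList());
        List<Provider> local = matching.stream()
                .filter(p -> p.site().equals(localSite))
                .collect(Collectors.toList());
        return local.isEmpty() ? matching : local;
    }
}
```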
Data dual‑active
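MySQL: MHA (Master High Availability) deployment with semi‑synchronous master‑slave replication, under which a commit is acknowledged only after at least one replica has received it, protecting acknowledged commits if the master fails; read‑write separation routes reads to the local site.
Redis: Redis Cluster with master‑slave replication; reads and writes are routed to the local master. Native master‑slave replication across sites performs poorly for writes, because each write must reach a master that may sit in the other site, so CRDT‑based multi‑node sync may be used instead, though it adds complexity.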
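To illustrate why CRDTs let every site accept writes locally, here is a generic state‑based PN‑counter. It is a textbook CRDT, not the data structure any particular Redis product uses internally, and the site IDs are hypothetical. Each site bumps only its own entries, and merging takes the per‑site maximum, so replicas converge no matter the order in which they exchange state.

```java
import java.util.HashMap;
import java.util.Map;

// Generic state-based PN-counter CRDT: one increment map and one decrement map,
// both keyed by site ID. A site only ever bumps its own entries.
final class PNCounter {
    private final String siteId;
    private final Map<String, Long> inc = new HashMap<>();
    private final Map<String, Long> dec = new HashMap<>();

    PNCounter(String siteId) { this.siteId = siteId; }

    void increment(long n) {
        if (n < 0) throw new IllegalArgumentException("use decrement for negative deltas");
        inc.merge(siteId, n, Long::sum);
    }

    void decrement(long n) {
        if (n < 0) throw new IllegalArgumentException("delta must be non-negative");
        dec.merge(siteId, n, Long::sum);
    }

    long value() {
        long p = inc.values().stream().mapToLong(Long::longValue).sum();
        long m = dec.values().stream().mapToLong(Long::longValue).sum();
        return p - m;
    }

    // Commutative, associative, and idempotent: keep the per-site maximum of each map.
    void merge(PNCounter other) {
        other.inc.forEach((site, v) -> inc.merge(site, v, Long::max));
        other.dec.forEach((site, v) -> dec.merge(site, v, Long::max));
    }
}
```

The price of this convergence is the complexity the text mentions: only operations that can be modeled as such mergeable types get conflict‑free multi‑site writes.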
Evaluation of same‑city dual‑active
Advantages
Provides city‑level disaster recovery with no data loss.
Architecture is relatively simple; close proximity allows synchronous MySQL replication.
Disadvantages
Cross‑site writes increase latency and affect performance.
Only protects against city‑level failures; regional or nationwide disasters still pose risk.
Large‑scale deployments may hit connection limits on a single master DB.
2. Two‑site three‑center architecture
This combines a same‑city dual‑center deployment with an additional remote disaster‑recovery center that holds cold backups. When both city centers fail, the remote center can restore services from its backup.
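The failover order can be summarized as: prefer a healthy same‑city center, and fall back to the remote center only when both city centers are down, after first restoring its cold backup. A hypothetical sketch, where the `Center` and `FailoverPlan` records are illustrative assumptions:

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of one data center in a two-site three-center deployment.
record Center(String name, boolean sameCity, boolean healthy) {}

// Which center should serve traffic, and whether it must restore from backup first.
record FailoverPlan(String target, boolean restoreFromBackup) {}

final class FailoverPlanner {
    static Optional<FailoverPlan> plan(List<Center> centers) {
        // Hot path: a healthy same-city center can take traffic immediately.
        for (Center c : centers) {
            if (c.sameCity() && c.healthy()) return Optional.of(new FailoverPlan(c.name(), false));
        }
        // Both city centers are down: the remote center must restore from its cold backup,
        // which is the source of the recovery delay.
        for (Center c : centers) {
            if (!c.sameCity() && c.healthy()) return Optional.of(new FailoverPlan(c.name(), true));
        }
        return Optional.empty();
    }
}
```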
Evaluation
Shares the same advantages as same‑city dual‑active.
Adds a remote backup to guard against simultaneous city‑wide failures.
Disadvantages mirror those of same‑city dual‑active plus the cold‑site recovery delay.
3. Remote multi‑active
Multiple geographically dispersed sites provide services concurrently. The key challenges include latency, data consistency, routing, and synchronization.
Challenges
Physical distance introduces high latency, especially for cross‑site writes that require strong consistency.
Data must be isolated per “unit” (RZone) to avoid cross‑unit conflicts.
Requests need accurate routing to the correct unit (e.g., when user A transfers to user B located in a different unit).
Synchronizing data across units while preserving isolation.
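A back‑of‑the‑envelope calculation shows why distance dominates. Light in optical fiber travels at roughly two‑thirds of its vacuum speed, about 200 km per millisecond, so a one‑way fiber distance $d$ sets a hard lower bound on round‑trip time before any switching, queuing, or protocol overhead. The distances below are illustrative only:

$$
\begin{aligned}
\mathrm{RTT}_{\min}(d) &\approx \frac{2d}{200\ \text{km/ms}} \\
\mathrm{RTT}_{\min}(50\ \text{km}) &\approx 0.5\ \text{ms} \\
\mathrm{RTT}_{\min}(1000\ \text{km}) &\approx 10\ \text{ms}
\end{aligned}
$$

A strongly consistent write that waits for a cross‑site acknowledgement pays at least one such round trip, and a transaction issuing several sequential writes pays it several times.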
Unit‑based architecture (RZone)
Each unit contains all services and data needed to fulfill its business, and every node belongs to exactly one unit. Traffic is first routed to an entry gateway that determines the user’s unit and forwards the request accordingly.
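One common way for a gateway to determine the unit is to hash the user ID into a fixed number of shards and look the shard up in a shard‑to‑unit table, reusing a routing identifier from a cookie when present. This sketch assumes 100 shards, hypothetical unit names such as `rz00`/`rz01`, and a cookie whose value simply names the unit; a real gateway would also need to detect stale cookies after the table changes.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical entry-gateway logic for resolving a user's unit (RZone).
final class UnitResolver {
    static final int SHARDS = 100;                      // assumption: 100 logical shards
    // Maps the first shard of each contiguous range to the unit that owns the range.
    private final NavigableMap<Integer, String> shardToUnit = new TreeMap<>();

    void assign(int firstShard, String unit) { shardToUnit.put(firstShard, unit); }

    // Deterministic shard, so the same user always lands in the same unit.
    static int shardOf(long uid) { return (int) Math.floorMod(uid, (long) SHARDS); }

    String unitForUid(long uid) {
        Map.Entry<Integer, String> e = shardToUnit.floorEntry(shardOf(uid));
        if (e == null) throw new IllegalStateException("shard not assigned: " + shardOf(uid));
        return e.getValue();
    }

    // Trust the routing cookie only if it names a known unit; otherwise compute from the uid.
    String resolve(String routingCookie, long uid) {
        if (routingCookie != null && shardToUnit.containsValue(routingCookie)) return routingCookie;
        return unitForUid(uid);
    }
}
```

Usage: after `assign(0, "rz00")` and `assign(50, "rz01")`, user 123 (shard 23) resolves to `rz00` and user 1077 (shard 77) to `rz01`.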
Request routing
API gateway clusters in each site know the global traffic split and forward each request to the appropriate unit.
Cookies can carry routing identifiers to reduce repeated HTTP forwarding.
Service‑level routing (RPC, MQ, DB) must support unit‑aware addressing.
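Unit‑aware RPC addressing is often done by intercepting calls on the client side. The following minimal sketch uses a JDK dynamic proxy; the `@RouteByUid` annotation and the `UidExtractor` and `Invoker` interfaces are invented for illustration and are not the API of the in‑house framework behind `@ZoneRoute`. The proxy reads the annotation, extracts the user ID from the first argument, maps it to a unit, and hands the call to a unit‑specific invoker.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Proxy;
import java.util.function.LongFunction;

// Hypothetical marker: calls to this method must go to the user's own unit.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface RouteByUid {}

// Hypothetical: extracts the user ID from a request argument.
interface UidExtractor { long uidOf(Object request); }

// Hypothetical transport hook that performs the remote call against a given unit.
interface Invoker { Object invoke(String unit, String method, Object[] args); }

final class UnitRoutingProxy {
    @SuppressWarnings("unchecked")
    static <T> T create(Class<T> api, UidExtractor uids, LongFunction<String> unitOf,
                        String defaultUnit, Invoker invoker) {
        return (T) Proxy.newProxyInstance(api.getClassLoader(), new Class<?>[] {api},
                (proxy, method, args) -> {
                    String unit = defaultUnit;  // unannotated calls stay in the default unit
                    if (method.isAnnotationPresent(RouteByUid.class) && args != null && args.length > 0) {
                        unit = unitOf.apply(uids.uidOf(args[0]));  // route to the user's unit
                    }
                    return invoker.invoke(unit, method.getName(), args);
                });
    }
}
```

For brevity the sketch forwards every method, including `Object` methods such as `toString`; a fuller implementation would handle those locally.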
Code example for RPC routing in a multi‑active environment
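```java
public interface ManualInterventionFacade {
    // Route this RPC to the RZone unit that owns the user. ZoneRoute, ZoneType and
    // UidParseClass come from the article's in-house middleware; UidParseClass extracts the uid.
    @ZoneRoute(zoneType = ZoneType.RZone, uidClass = UidParseClass.class)
    ManualRecommendResponse getManualRecommendCommodity(ManualRecommendRequest request);
}
```
Data synchronization types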
QZone data: Eventually consistent data; each site holds a full copy and synchronizes increments.
MZone data: Strongly consistent data; deployed with same‑city dual‑active, tolerating cross‑site latency.
RZone data: Each zone has its own primary; writes outside the zone are routed to the appropriate primary.
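The three data categories translate into three write paths. In this hypothetical dispatcher, the `ZoneKind` enum mirrors the QZone/MZone/RZone terms and the target names are illustrative: QZone writes apply locally and are queued for incremental sync, MZone writes always go to the single strongly consistent primary, and RZone writes go to the primary of the unit that owns the user, even when that unit is in another site.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;

enum ZoneKind { QZONE, MZONE, RZONE }

// Hypothetical dispatcher deciding where a write is applied.
final class WriteDispatcher {
    private final String localUnit;
    private final String mzonePrimary;                  // single strong-consistency primary
    private final LongFunction<String> unitOfUid;       // e.g. the gateway's uid-to-unit mapping
    final List<String> pendingSync = new ArrayList<>(); // QZone increments to ship to other sites

    WriteDispatcher(String localUnit, String mzonePrimary, LongFunction<String> unitOfUid) {
        this.localUnit = localUnit;
        this.mzonePrimary = mzonePrimary;
        this.unitOfUid = unitOfUid;
    }

    // Returns the unit or primary that should apply the write.
    String dispatch(ZoneKind kind, long uid, String change) {
        return switch (kind) {
            case QZONE -> {
                pendingSync.add(change);  // apply locally, replicate the increment asynchronously
                yield localUnit;
            }
            case MZONE -> mzonePrimary;   // may cost a cross-site round trip
            case RZONE -> unitOfUid.apply(uid);  // owner unit's primary, possibly remote
        };
    }
}
```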
Overall evaluation of remote multi‑active
Advantages: Significantly improves disaster‑recovery capability, enables horizontal scaling across regions, and limits the impact of regional failures.
Disadvantages: The architecture is complex, deployment and operation costs are high, and the approach intrudes into business logic (routing identifiers, unit awareness).
Conclusion
The article reviews the main ideas and key technical points of multi‑active deployment, comparing various schemes. Building a complete remote multi‑active capability requires extensive modifications to middleware, storage, traffic scheduling, and operational control. Detailed discussions of storage‑level replication (e.g., MySQL, Redis) are omitted but are essential for deeper study.
vivo Internet Technology