How to Build Minute‑Level Hybrid Cloud Disaster Recovery with MSHA Multi‑Active Architecture
This guide walks through a hybrid cloud disaster‑recovery demo for an e‑commerce platform: the business background, requirements, and challenges; the MSHA‑based active‑active solution; the step‑by‑step implementation; and verification of minute‑level RPO/RTO using Alibaba Cloud services and chaos engineering.
Business Background
Retail e‑commerce platform deployed only in a self‑built IDC, no disaster‑recovery (DR) capability.
IDC capacity is insufficient and hardware upgrades are slow.
Goal: create a hybrid cloud (IDC + Alibaba Cloud) DR solution with minimal code changes.
Current Architecture
Applications: frontend (web UI), cartservice, productservice. Stack: Spring Boot, RPC via Spring Cloud/Dubbo, Nacos/Zookeeper as the service registry, and Redis and MySQL for data storage.
Hybrid DR Objectives
Bidirectional failover with RTO < 10 min (target < 1 min).
Strong data consistency; no dirty writes.
Unified management of traffic, services, and databases.
Short implementation cycle, low refactor cost.
Key Challenges
DNS‑based traffic split has long propagation time.
Synchronizing Redis/MySQL between IDC and cloud is complex.
Ensuring data quality during failover (lag, dirty writes).
Minimizing intrusion into business code.
Solution Overview
Adopt an active‑active architecture using Alibaba Cloud MSHA (Multi‑Site High‑Availability).
Active‑Active Architecture
Deploy symmetric workloads in the IDC and a cloud region within 200 km (latency ~5‑7 ms). Both sites run identical applications, middleware, and databases. Traffic is split at the MSHA access layer (MSFE) by weight or precise routing rules, enabling minute‑level cut‑over.
Select a cloud region ≤200 km from the IDC.
Deploy applications and middleware symmetrically in both sites.
Use asynchronous DB replication in a primary/standby setup; in normal operation both sites read and write the same primary database, which avoids cross‑site consistency issues.
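As an illustration of how a weight‑based split with precise routing overrides might behave at the access layer, here is a minimal sketch. It is not MSHA or MSFE code: the unit names, the `pick_unit` function, and the rule format are all hypothetical, and hashing the user ID to keep a user on one unit is an assumption of this sketch rather than something the article states.

```python
import hashlib
from typing import Dict, Optional


def pick_unit(
    user_id: str,
    weights: Dict[str, int],
    rules: Optional[Dict[str, str]] = None,
) -> str:
    """Return the unit (site) that should serve this user.

    weights: hypothetical per-unit traffic weights, e.g. {"idc": 50, "cloud": 50}.
    rules:   hypothetical exact-match overrides, e.g. {"user-42": "cloud"}.
    """
    # Precise rule first, but only if the target unit still takes traffic.
    if rules and user_id in rules:
        target = rules[user_id]
        if weights.get(target, 0) > 0:
            return target

    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one unit must have a positive weight")

    # Stable hash so the same user lands in the same bucket on every request.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % total
    for unit, weight in weights.items():
        if bucket < weight:
            return unit
        bucket -= weight
    raise AssertionError("unreachable: bucket is always below the total weight")
```

Setting one unit's weight to 0 sends every request to the remaining unit, which is the effect a cut‑over needs; no DNS change is involved, so nothing has to propagate.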
Detailed Implementation Steps
Establish IDC‑cloud connectivity via Alibaba Cloud CEN.
Deploy MSHA access clusters (MSFE) with SLB for HTTP/HTTPS entry.
Configure domain/URI routing in the MSHA console for traffic split and rapid cut‑over.
Synchronize service registries (Nacos/Zookeeper) through MSHA registry sync for cross‑site discovery.
Install MSHA‑Agent on Java services (no code changes) to prioritize same‑unit calls and enable DB connection switching.
Deploy cloud‑managed ZK/Nacos, Redis, and RDS with cross‑AZ high availability.
Configure Alibaba Cloud DTS for asynchronous data sync between cloud and IDC databases.
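Once registries are synchronized, a consumer sees provider instances from both sites, so the same‑unit priority mentioned in the MSHA‑Agent step matters. The sketch below shows one way such a selection could work; the `Instance` type and `candidates` function are hypothetical names for illustration, and this is not the agent's actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instance:
    address: str   # host:port of a provider instance
    unit: str      # site the instance runs in, e.g. "idc" or "cloud"
    healthy: bool = True


def candidates(instances: List[Instance], local_unit: str) -> List[Instance]:
    """Prefer healthy instances in the caller's own unit.

    Falls back to healthy instances in any unit only when the local
    unit has none, so cross-site calls happen as a last resort.
    """
    healthy = [i for i in instances if i.healthy]
    local = [i for i in healthy if i.unit == local_unit]
    return local or healthy
```

Keeping calls inside one unit avoids paying the cross‑site round trip on every RPC hop, while the fallback keeps a request alive when the local providers are down.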
Post‑Migration Architecture
Both the IDC and the cloud serve live traffic: requests are routed at random to the Beijing or Hangzhou unit, and both units read and write the Beijing database during normal operation.
Disaster‑Recovery Validation
Application Fault Injection
Open the Alibaba Cloud Chaos console, select the Beijing product service, and run a 50 % network‑packet‑loss scenario.
Observe degraded access on the e‑commerce homepage, confirming that the injected fault affects user requests served by the Beijing unit.
Traffic Cut‑Over
Use the MSHA “one‑click cut‑to‑zero” operation to drain the Beijing unit and route 100 % of traffic to Hangzhou, then verify that the site works normally.
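In terms of traffic weights, cutting a unit to zero can be pictured as follows. This is a minimal sketch with a hypothetical `cut_to_zero` helper and a hypothetical weight table; it does not reflect how the MSHA console stores or applies its rules.

```python
from typing import Dict


def cut_to_zero(weights: Dict[str, int], drained_unit: str) -> Dict[str, int]:
    """Return a new weight table with the drained unit set to 0.

    Refuses the change if it would leave no unit able to take traffic.
    """
    if drained_unit not in weights:
        raise KeyError(f"unknown unit: {drained_unit}")
    updated = {**weights, drained_unit: 0}
    if sum(updated.values()) <= 0:
        raise ValueError("cannot drain the last unit that carries traffic")
    return updated
```

The guard against draining the last serving unit reflects a check any such operation needs before it is applied: the target site must be able to take the full load.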
Database Fault Injection & Switch
Inject failures into the Beijing Redis/MySQL instances via Chaos, then use the MSHA DB switch to redirect connections to the Hangzhou databases. The switch runs in three stages: a pre‑check, a write lock held while replication catches up, and a final verification.
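The order of those stages is what prevents dirty writes, so it may help to see it as code. The sketch below is an assumption‑laden illustration, not the MSHA implementation: every callback (`get_lag_seconds`, `set_write_lock`, `repoint`, `verify`) and every threshold is hypothetical and would be backed by real replication and connection tooling in practice.

```python
import time
from typing import Callable, List


class SwitchAborted(Exception):
    """Raised when a stage fails and the switch must not proceed."""


def switch_database(
    get_lag_seconds: Callable[[], float],
    set_write_lock: Callable[[bool], None],
    repoint: Callable[[], None],
    verify: Callable[[], bool],
    max_precheck_lag: float = 60.0,
    max_polls: int = 30,
    poll_interval: float = 0.0,
) -> List[str]:
    """Run pre-check, locked catch-up, repoint, and verification in order."""
    log: List[str] = []

    # Stage 1: pre-check. Refuse to start if the standby is too far behind.
    if get_lag_seconds() > max_precheck_lag:
        raise SwitchAborted("replication lag too high to start the switch")
    log.append("precheck")

    # Stage 2: block writes so no new data lands on the old primary.
    set_write_lock(True)
    log.append("locked")
    try:
        for _ in range(max_polls):
            if get_lag_seconds() <= 0:
                break
            time.sleep(poll_interval)
        else:
            raise SwitchAborted("replication did not catch up in time")
        log.append("synced")

        # Stage 3: point connections at the new primary, then verify.
        repoint()
        log.append("repointed")
        if not verify():
            raise SwitchAborted("post-switch verification failed")
        log.append("verified")
    finally:
        # Released on success and on failure so this sketch never leaves
        # writes blocked; a real tool may choose a different policy.
        set_write_lock(False)
        log.append("unlocked")
    return log
```

Holding the lock until the lag reaches zero is what keeps a write from being accepted on one side and lost on the other. If the old primary is completely unreachable, the catch‑up stage cannot complete, and whatever had not yet replicated counts toward the RPO.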
DR Metrics
RPO ≤ 1 min (depends on DTS sync speed).
RTO ≤ 1 min (the MSHA traffic cut‑over itself is sub‑second; the end‑to‑end recovery stays within 1 min).
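To make the two metrics concrete, the arithmetic can be written out as below. This is only a sketch: the function names and the phase breakdown (detection, decision, cut‑over, database switch) are assumptions for illustration, and any numbers passed in would come from your own measurements, not from the article.

```python
from typing import Dict


def rpo_seconds(last_replicated_ts: float, failure_ts: float) -> float:
    """Data-loss window: time between the last replicated change and the failure."""
    return max(0.0, failure_ts - last_replicated_ts)


def rto_seconds(phases: Dict[str, float]) -> float:
    """Recovery time: sum of all phases from fault to restored service."""
    return sum(phases.values())


def meets_targets(
    rpo: float, rto: float, rpo_target: float = 60.0, rto_target: float = 60.0
) -> bool:
    """Check measured values against the 1-minute targets stated above."""
    return rpo <= rpo_target and rto <= rto_target
```

Breaking RTO into phases makes it clear that a sub‑second cut‑over does not by itself give a sub‑minute RTO: detection and the decision to switch usually account for most of the budget.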
Conclusion
MSHA enables rapid, low‑cost hybrid cloud disaster recovery with minute‑level RPO/RTO, minimal code changes, and unified management, as validated in this demo through chaos‑based fault injection.
Alibaba Cloud Native
