
How to Build Minute‑Level Hybrid Cloud Disaster Recovery with MSHA Multi‑Active Architecture

This article presents a step-by-step guide to building a hybrid-cloud disaster-recovery solution with MSHA's multi-active architecture. It covers the business background, design challenges, dual-active deployment, traffic routing, data synchronization, one-click failover, and validation of RPO/RTO targets of one minute or less for an e-commerce platform.

MaGe Linux Operations

01 Introduction

More enterprises are choosing a hybrid-cloud model (public cloud plus a self-built IDC or another cloud provider) for disaster recovery. This avoids over-reliance on a single vendor and makes use of existing IDC resources.

02 Business Hybrid‑Cloud Disaster Recovery Practice

Business background: A retail e-commerce platform (Company A) runs its systems in a self-built IDC. It has no disaster-recovery capability, the IDC is short of capacity, and hardware upgrade cycles are long. To improve resilience without depending entirely on the cloud, the company plans an IDC + cloud hybrid architecture.

Current Application Deployment

frontend: Web application for user interaction.

cartservice: Shopping‑cart add, store, and query.

productservice: Product and inventory services.

Technology stack: Spring Boot; Spring Cloud and Dubbo as RPC frameworks, with Nacos or ZooKeeper as the registry; Redis and MySQL for data storage.

03 Hybrid‑Cloud Disaster Recovery Goals

Bidirectional disaster recovery with minute‑level RTO. Switch between cloud and IDC within 10 minutes.

No data consistency risk. Ensure strong consistency between the two data centers during normal operation and failover.

One‑stop management. Unified control of traffic, databases, and failover through a single console.

Short implementation cycle and low cost. Minimal code changes for fast business iteration.

04 Challenges

Traffic management is difficult: switching traffic through DNS is subject to propagation delays.

Redis and MySQL disaster recovery across the IDC and the cloud is complex.

Data quality must be protected during failover (dirty writes, synchronization lag).

Business code should not require invasive changes.

05 Solution

Adopt an application dual‑active architecture using MSHA to meet the above requirements.

Application Dual‑Active Architecture

Deploy applications, middleware, and databases symmetrically in both the IDC and a cloud region (e.g., the Alibaba Cloud Hangzhou Region, ideally within about 200 km of the IDC to keep network latency low). Use MSHA access-layer clusters (MSFE) to handle HTTP/HTTPS traffic and to route it either proportionally or by rule.
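Proportional routing can be illustrated with a small sketch. This is not MSFE's implementation or configuration format; the unit names, the weights, and the `pick_unit` function are assumptions made for illustration. A stable hash of a routing key (for example a user ID) keeps each user on the same unit while the overall split follows the configured weights.

```python
import hashlib
from typing import Mapping

# Hypothetical unit names and weights; not an MSHA configuration format.
WEIGHTS = {"idc": 50, "cloud": 50}


def pick_unit(routing_key: str, weights: Mapping[str, int]) -> str:
    """Map a routing key (e.g. a user ID) to a unit in proportion to the weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one unit must have a positive weight")
    # A stable hash keeps the same key on the same unit across requests.
    digest = hashlib.sha256(routing_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    # Walk the units in a fixed order and find the weight range holding the bucket.
    for unit, weight in sorted(weights.items()):
        if bucket < weight:
            return unit
        bucket -= weight
    raise AssertionError("unreachable: bucket is always below the total weight")


# Setting one unit's weight to 0 sends all traffic to the other unit.
print(pick_unit("user-42", {"idc": 0, "cloud": 100}))
```

With this shape, a failover amounts to changing the weights: a unit with weight 0 receives no traffic.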

Detailed Implementation

Application traffic dual-active: Both sites are deployed symmetrically. The MSHA access layer splits traffic, and the MSFE console handles scaling, monitoring, and minute-level cut-over.

Service inter-communication and same-unit priority calls: MSHA registry synchronization lets services in the cloud and the IDC call each other. MSHA-Agent makes Dubbo/Spring Cloud consumers prefer providers in the same unit, which reduces latency.

Data synchronization and database connection switching: Data is replicated asynchronously between the Redis and MySQL/RDS instances in the cloud and the IDC. MSHA-Agent switches database connections and applies write protection to avoid dirty reads and writes.

One-stop management and zero code intrusion: The MSHA console unifies HTTP and database traffic control. MSHA-Agent adds disaster-recovery capability without changes to business code.
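The same-unit priority rule described above can be sketched as a provider-selection function. This is only an illustration of the idea, not MSHA-Agent's actual logic; the `Provider` type, its `unit` tag, and `choose_provider` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Provider:
    address: str
    unit: str  # hypothetical unit tag, e.g. "idc" or "cloud"


def choose_provider(providers: Sequence[Provider], local_unit: str, rng=random) -> Provider:
    """Prefer providers in the caller's unit; fall back to any unit if none exist."""
    if not providers:
        raise LookupError("no providers registered")
    same_unit = [p for p in providers if p.unit == local_unit]
    # Cross-unit calls happen only when the local unit has no provider left.
    return rng.choice(same_unit or list(providers))


providers = [
    Provider("10.0.0.11:20880", "idc"),    # example addresses, assumptions
    Provider("10.1.0.11:20880", "cloud"),
]
print(choose_provider(providers, "cloud").address)
```

The fallback branch is what keeps a service reachable when its local providers fail: the consumer pays the cross-site latency instead of returning an error.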

06 Transformation Steps

Application migration: Deploy a full redundant set of applications, middleware, and databases in a nearby Alibaba Cloud region (e.g., Hangzhou).

Network connectivity: Use CEN (Cloud Enterprise Network) to interconnect the IDC and the cloud.

Cluster deployment and configuration: Deploy MSHA access-layer (MSFE) clusters behind SLB, which provides the public entry point and load balancing. Configure the domain, URI, and backend addresses used for traffic splitting.

Application setup: Install MSHA-Agent on the Java services and use Nacos to deliver control commands. This enables same-unit priority calls and database connection switching.

Middleware and database: Deploy managed ZooKeeper/Nacos, cloud Redis, and RDS with cross-AZ high availability. Configure data synchronization between the cloud and IDC databases.
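To make the domain, URI, and backend configuration from the steps above more concrete, here is a minimal sketch of how such a routing table might be matched. The `Route` structure, the longest-prefix rule, the `shop.example.com` domain, and the backend addresses are all assumptions for illustration; the real MSFE configuration format is not shown in this article.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class Route:
    domain: str
    uri_prefix: str
    # Unit name -> backend addresses in that unit (hypothetical layout).
    backends: Dict[str, List[str]] = field(default_factory=dict)


def match_route(routes: Sequence[Route], host: str, path: str) -> Optional[Route]:
    """Return the route for this host whose URI prefix is the longest match."""
    candidates = [r for r in routes if r.domain == host and path.startswith(r.uri_prefix)]
    return max(candidates, key=lambda r: len(r.uri_prefix), default=None)


ROUTES = [
    Route("shop.example.com", "/",
          {"idc": ["10.0.0.10:8080"], "cloud": ["10.1.0.10:8080"]}),
    Route("shop.example.com", "/cart",
          {"idc": ["10.0.0.20:8080"], "cloud": ["10.1.0.20:8080"]}),
]

route = match_route(ROUTES, "shop.example.com", "/cart/items")
print(route.uri_prefix if route else "no route")
```

Listing the backends of both units under every route is what allows the access layer to split or shift traffic per domain and URI.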

07 Disaster‑Recovery Capability

RPO ≤ 1 min (depends on the synchronization performance of DTS, Alibaba Cloud's Data Transmission Service).

RTO ≤ 1 min (the MSHA components cut over in under a second; end-to-end recovery takes ≤ 1 min).
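One way to reason about these two targets: RPO is bounded by how far the standby lags behind the primary at the moment of failure, and RTO is the sum of every phase between the failure and restored service. The sketch below expresses that arithmetic. The phase names and durations are illustrative assumptions, not measurements from this deployment.

```python
from typing import Mapping


def replication_lag_seconds(source_commit_ts: float, target_applied_ts: float) -> float:
    """Approximate RPO exposure: how far the target is behind the source."""
    return max(0.0, source_commit_ts - target_applied_ts)


def recovery_time_seconds(phases: Mapping[str, float]) -> float:
    """Approximate end-to-end RTO as the sum of the failover phases."""
    return sum(phases.values())


def within_budget(value_seconds: float, budget_seconds: float = 60.0) -> bool:
    return value_seconds <= budget_seconds


# Illustrative numbers only (assumptions, not measured results).
lag = replication_lag_seconds(source_commit_ts=1000.0, target_applied_ts=997.5)
phases = {"detect": 20.0, "decide": 15.0, "traffic_cut_over": 1.0, "db_switch": 10.0}

print(lag, within_budget(lag))
print(recovery_time_seconds(phases), within_budget(recovery_time_seconds(phases)))
```

The breakdown shows why a sub-second traffic cut-over alone does not determine RTO: detection and the decision to fail over usually take up most of the budget.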

08 Validation

Use the Chaos fault‑injection product to simulate failures in the Beijing IDC unit (application, Redis, MySQL). Verify that traffic is automatically shifted to the Hangzhou unit, databases are switched, and the e‑commerce demo recovers within the expected time.

The steps are: monitor the units in the MSHA console; execute the one-click "cut to zero" operation, which shifts all traffic away from the failed unit; run the pre-check and confirm the cut-over; and, after the database primary-standby switch, confirm that the business has recovered.
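The ordering that protects data during a database switch can be sketched as follows: write-protect the old primary first, wait for replication to catch up, and only then open writes on the new primary. This is a simplified model, not the MSHA procedure itself; `UnitDatabase`, `switch_primary`, and the lag callback are hypothetical.

```python
import time
from typing import Callable


class UnitDatabase:
    """In-memory stand-in for one unit's database endpoint (hypothetical)."""

    def __init__(self, name: str, writable: bool) -> None:
        self.name = name
        self.writable = writable


def switch_primary(
    old: UnitDatabase,
    new: UnitDatabase,
    get_lag_seconds: Callable[[], float],
    max_wait_seconds: float = 30.0,
    poll_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> UnitDatabase:
    # 1. Write-protect the old primary so no new writes can become dirty.
    old.writable = False
    # 2. Wait until the standby has applied everything already written.
    waited = 0.0
    while get_lag_seconds() > 0:
        if waited >= max_wait_seconds:
            # Both sides stay read-only; an operator decides whether to
            # accept the remaining lag as data loss.
            raise TimeoutError("replication lag did not drain in time")
        sleep(poll_seconds)
        waited += poll_seconds
    # 3. Only now allow writes on the new primary.
    new.writable = True
    return new


lags = iter([2.0, 1.0, 0.0])
idc = UnitDatabase("idc", writable=True)
cloud = UnitDatabase("cloud", writable=False)
primary = switch_primary(idc, cloud, lambda: next(lags), sleep=lambda s: None)
print(primary.name, idc.writable, cloud.writable)
```

At no point in this sequence are both sides writable, which is the property that prevents dirty writes. When the old primary is actually down, the lag may never drain, and the timeout branch is where the RPO budget comes into play.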

09 Summary

This article showed how MSHA multi-active disaster recovery helps enterprises build a hybrid-cloud dual-active architecture. It provided a practical implementation guide and used Chaos fault injection to validate that the solution meets RPO/RTO targets of one minute or less.
