Regional Disaster Recovery Architecture Using ASM Service Mesh and GTM

This guide explains how to design and implement a multi-region disaster-recovery solution on Alibaba Cloud: deploy identical Kubernetes clusters in two regions, pair the ASM ingress gateways with Global Traffic Manager (GTM) for automatic failover, enable intra-cluster traffic retention, and validate the setup with a load-testing tool.

Alibaba Cloud Infrastructure

Jan 6, 2025

Regional‑level failures can occur due to natural disasters, network outages, human error, or security incidents, causing all zones in a region to lose connectivity, data, or workload availability.

With ASM, you can deploy an ingress gateway in each Kubernetes cluster (or on ECI). Combined with Alibaba Cloud DNS and Global Traffic Manager (GTM), the gateways split traffic between two regions under normal conditions. When a region fails, GTM automatically removes that region's gateway IP from DNS so that all traffic is redirected to the healthy region.
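
The failover behavior described above can be illustrated with a small simulation. This is a minimal sketch, not GTM's actual implementation: the `AddressPool` class and the documentation-range IP addresses are hypothetical, and a real deployment is also subject to DNS TTLs and resolver caching.

```python
from dataclasses import dataclass, field


@dataclass
class AddressPool:
    """Toy model of a DNS address pool with per-address health state."""

    # Maps a gateway IP to True (healthy) or False (unhealthy).
    addresses: dict[str, bool] = field(default_factory=dict)

    def add(self, ip: str) -> None:
        """Register a gateway IP; new addresses start out healthy."""
        self.addresses[ip] = True

    def mark(self, ip: str, healthy: bool) -> None:
        """Record the latest health-check verdict for an address."""
        self.addresses[ip] = healthy

    def resolve(self) -> list[str]:
        """Return the IPs a DNS answer would contain: healthy ones only."""
        return sorted(ip for ip, ok in self.addresses.items() if ok)


pool = AddressPool()
pool.add("203.0.113.10")   # hypothetical gateway IP in region 1
pool.add("198.51.100.20")  # hypothetical gateway IP in region 2

normal = pool.resolve()            # both regions appear in the answer
pool.mark("203.0.113.10", False)   # region 1 fails its health check
degraded = pool.resolve()          # only region 2 remains
```

In the normal case both addresses are returned, so clients are spread across the two regions. Once one address is marked unhealthy, the answer contains only the surviving gateway.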

To validate this approach, prepare two Kubernetes clusters (e.g., cluster-1 and cluster-2) in different regions, deploy identical cloud‑native services, expose each cluster’s ASM gateway via a public CLB IP, and configure DNS to resolve a single domain name to both IPs.

Step 1: Create two ACK clusters with EIP-exposed API servers in separate regions.

Step 2: Build a multi-master control-plane service mesh by creating two ASM instances (mesh-1 and mesh-2) and joining each cluster to its respective mesh.

Step 3: Deploy an ASM ingress gateway and the Bookinfo demo application in each cluster, then create gateway rules and virtual services to expose the app.

Step 4: Enable the ASM intra-cluster traffic-retention feature so that traffic stays within a cluster unless the whole region fails.

Step 5: Configure GTM to perform health-check-based IP failover, ensuring that when one region's gateway is removed, all traffic is routed to the remaining healthy gateway.

Step 6 (optional): Apply a local rate-limiting policy to each ingress gateway using the following YAML:

apiVersion: istio.alibabacloud.com/v1beta1
kind: ASMLocalRateLimiter
metadata:
  name: ingressgateway
  namespace: istio-system
spec:
  configs:
    - limit:
        fill_interval:
          seconds: 1
        quota: 100
      match:
        vhost:
          name: '*'
          port: 80
          route:
            name_match: gw-to-productpage
  isGateway: true
  workloadSelector:
    labels:
      istio: ingressgateway
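
The `quota` and `fill_interval` fields can be read as a token bucket: up to 100 requests are admitted per 1-second interval on the matched route. The sketch below illustrates that reading only. It assumes the policy behaves like a token bucket that is refilled to `quota` on every interval (Envoy's local rate limit filter is token-bucket based), and the `TokenBucket` class is a hypothetical illustration, not ASM code.

```python
class TokenBucket:
    """Toy token bucket: refilled to `quota` tokens every `fill_interval` seconds."""

    def __init__(self, quota: int, fill_interval: float, now: float = 0.0) -> None:
        self.quota = quota
        self.fill_interval = fill_interval
        self.tokens = quota
        self.last_fill = now

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` is admitted."""
        # Count the whole fill intervals that have elapsed since the last refill.
        elapsed_intervals = int((now - self.last_fill) // self.fill_interval)
        if elapsed_intervals > 0:
            self.tokens = self.quota
            self.last_fill += elapsed_intervals * self.fill_interval
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False


# Mirror the values in the policy: quota 100, fill interval 1 second.
bucket = TokenBucket(quota=100, fill_interval=1.0)
first_second = sum(bucket.allow(now=0.5) for _ in range(150))
next_second = sum(bucket.allow(now=1.5) for _ in range(150))
```

With 150 requests arriving inside each interval, 100 are admitted and 50 are rejected. At a real gateway, the rejected requests would typically receive an HTTP 429 response.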

To test the disaster-recovery flow, use the Fortio load-testing tool to generate traffic against the domain, then simulate a regional failure by deleting the ingress gateway workload in one cluster. The test shows that most requests succeed and that traffic shifts automatically to the healthy region, which confirms that the ASM and GTM integration works as intended.
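
One way to interpret such a test is to tally response codes and compute the success ratio. The sketch below assumes a hypothetical `send` callable that stands in for a real HTTP request. The `fake_send` stub and its failure window are illustrative only and are not measurements from the original test.

```python
from collections import Counter
from typing import Callable


def summarize(send: Callable[[int], int], total: int) -> tuple[Counter, float]:
    """Call `send(i)` for each request index and tally the returned status codes."""
    codes = Counter(send(i) for i in range(total))
    success_ratio = codes[200] / total if total else 0.0
    return codes, success_ratio


def fake_send(i: int) -> int:
    """Hypothetical stand-in for an HTTP call against the test domain."""
    # Requests 40..49 model the window before the failed IP leaves DNS.
    return 503 if 40 <= i < 50 else 200


codes, ratio = summarize(fake_send, total=1000)
```

In a real run, `send` would issue the request and return the status code, and the length of the failure window would depend on the GTM health-check interval and the DNS TTL.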

GTM health checks detect the failed gateway and remove its IP from DNS automatically. Alerts can also be configured to notify operators when manual intervention is needed.
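
The health-check and alerting behavior can be sketched as a small state machine. This is an assumption-laden illustration rather than GTM's logic: the `HealthMonitor` class, the threshold of three consecutive failures, and the rule that a single successful probe restores an address are all hypothetical choices.

```python
from typing import Callable


class HealthMonitor:
    """Marks an address unhealthy after N consecutive failed probes and alerts once."""

    def __init__(self, failure_threshold: int, on_alert: Callable[[str], None]) -> None:
        self.failure_threshold = failure_threshold
        self.on_alert = on_alert
        self.consecutive_failures: dict[str, int] = {}
        self.unhealthy: set[str] = set()

    def record(self, ip: str, probe_ok: bool) -> None:
        """Feed one probe result for `ip` into the monitor."""
        if probe_ok:
            # Simplification: one successful probe restores the address.
            self.consecutive_failures[ip] = 0
            self.unhealthy.discard(ip)
            return
        count = self.consecutive_failures.get(ip, 0) + 1
        self.consecutive_failures[ip] = count
        if count >= self.failure_threshold and ip not in self.unhealthy:
            self.unhealthy.add(ip)
            # Alert only on the transition to unhealthy, not on every failure.
            self.on_alert(f"{ip} failed {count} consecutive health checks")


alerts: list[str] = []
monitor = HealthMonitor(failure_threshold=3, on_alert=alerts.append)
for ok in (True, False, False, False, False):
    monitor.record("203.0.113.10", ok)  # hypothetical gateway IP
```

Requiring several consecutive failures avoids removing a gateway because of one dropped probe, at the cost of a slightly longer detection time.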
