Achieving Zero‑Loss Service Deployment with Alibaba Cloud MSE

This guide explains how Alibaba Cloud's Microservice Engine (MSE) achieves zero‑loss service deployment by using adaptive offline waiting, active notifications, readiness‑aligned startup checks, and traffic warm‑up, providing step‑by‑step Kubernetes manifests, configuration details, and validation results to prevent traffic loss during releases.

Alibaba Cloud Native

Background

Large‑scale, high‑concurrency applications often schedule releases at night to avoid traffic loss, but night releases are hard to control and add significant operational cost. Common causes of traffic loss during Kubernetes rolling updates include delayed service deregistration, slow initialization, premature registration, and readiness checks that are not aligned with service registration.

Zero‑Loss Offline Design

Alibaba Cloud Microservice Engine (MSE) introduces an adaptive waiting period and active notification mechanism. Before a service goes offline, MSE waits adaptively and sends an offline event to all consumers that have pending requests. Consumers receive the event and immediately pull the latest service list from the registry, preventing calls to the stopped instance.
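
The offline sequence described above can be sketched as a small simulation. This is an illustrative model only, not MSE's implementation: the Registry, Consumer, and Provider classes, the tick-based wait, and the max_wait_ticks cap are hypothetical names chosen for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Registry:
    """Toy service registry: service name -> set of instance addresses."""
    instances: dict[str, set[str]] = field(default_factory=dict)

    def register(self, service: str, address: str) -> None:
        self.instances.setdefault(service, set()).add(address)

    def deregister(self, service: str, address: str) -> None:
        self.instances.get(service, set()).discard(address)

    def lookup(self, service: str) -> set[str]:
        return set(self.instances.get(service, set()))


@dataclass
class Consumer:
    """Caches the provider list and refreshes it when notified."""
    registry: Registry
    service: str
    cache: set[str] = field(default_factory=set)

    def refresh(self) -> None:
        self.cache = self.registry.lookup(self.service)

    def on_offline_event(self, address: str) -> None:
        # Active notification: pull the latest list right away instead of
        # waiting for the next periodic refresh. The address is unused here
        # because the fresh list already excludes it.
        self.refresh()


@dataclass
class Provider:
    registry: Registry
    service: str
    address: str
    in_flight: int = 0  # requests still being processed
    callers: list[Consumer] = field(default_factory=list)

    def go_offline(self, max_wait_ticks: int = 30) -> int:
        # 1. Deregister so that no new consumer discovers this instance.
        self.registry.deregister(self.service, self.address)
        # 2. Notify the consumers that have been calling this instance.
        for consumer in self.callers:
            consumer.on_offline_event(self.address)
        # 3. Adaptive wait: block only while requests are in flight, up to a cap.
        waited = 0
        while self.in_flight > 0 and waited < max_wait_ticks:
            self.in_flight -= 1  # simulate one request completing per tick
            waited += 1
        return waited


registry = Registry()
registry.register("service-b", "10.0.0.1:20002")
registry.register("service-b", "10.0.0.2:20002")
consumer = Consumer(registry, "service-b")
consumer.refresh()

leaving = Provider(registry, "service-b", "10.0.0.1:20002",
                   in_flight=3, callers=[consumer])
waited = leaving.go_offline()
print(waited, sorted(consumer.cache))
```

In this model the wait ends as soon as the in‑flight count reaches zero, so an idle instance stops almost immediately while a busy one is given time to drain.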

Offline flow diagram

Zero‑Loss Online Design

For online deployment, MSE addresses delayed loading and initialization latency. It ensures that service registration completes before Kubernetes readiness probes succeed, and provides a small‑traffic warm‑up feature that gradually ramps up traffic to a newly started instance, avoiding first‑call latency spikes that can cause time‑outs and crashes.
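
To make the small‑traffic warm‑up idea concrete, the following sketch computes a routing weight that grows with instance uptime. It is a generic ramp under stated assumptions, not MSE's actual algorithm: the function names, the full_weight of 100, and the curve exponent are hypothetical, and the 120‑second window mirrors the value used for service C in this demo.

```python
from __future__ import annotations


def warmup_weight(uptime_s: float, warmup_s: float = 120.0,
                  full_weight: int = 100, curve: float = 2.0) -> int:
    """Return the routing weight of an instance that started uptime_s ago.

    The weight rises from 1 to full_weight over warmup_s seconds.
    curve=1.0 gives a linear ramp; larger values keep early traffic lower.
    """
    if uptime_s >= warmup_s:
        return full_weight
    if uptime_s <= 0:
        return 1
    ratio = uptime_s / warmup_s
    return max(1, int(full_weight * ratio ** curve))


def traffic_share(new_weight: int, other_weights: list[int]) -> float:
    """Fraction of requests a weighted load balancer sends to the new instance."""
    return new_weight / (new_weight + sum(other_weights))


# One warming instance next to one fully warmed peer (weight 100).
for uptime in (0, 30, 60, 120):
    weight = warmup_weight(uptime)
    print(uptime, weight, round(traffic_share(weight, [100]), 3))
```

A consumer‑side load balancer that applies such weights sends only a small share of requests to the new instance at first, which gives caches, connection pools, and JIT compilation time to warm up before full load arrives.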

Online warm‑up chart
Warm‑up flow
Warm‑up traffic ramp

Demo Architecture

The demonstration uses a Spring Cloud ecosystem with a Zuul gateway, three backend services (A, B, C), and Nacos as the service registry. All components run on a Kubernetes cluster managed by Alibaba Cloud.

Demo deployment diagram

Step‑by‑Step Deployment

Prerequisites: Create a managed Kubernetes cluster and enable the MSE governance edition.

Prepare Manifest: Save the YAML manifest below as mse-demo.yaml and apply it with kubectl apply -f mse-demo.yaml. The manifest defines Nacos, Zuul, services A and B (each in a base and a gray version), service C, CronHPA rules, and SLB services.

Enable MSE Governance: In the MSE console, navigate to Microservice Governance → Application List, select each application, and turn on the governance features.

# Nacos Server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nacos-server
  name: nacos-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nacos-server
  template:
    metadata:
      labels:
        app: nacos-server
    spec:
      containers:
      - env:
        - name: MODE
          value: standalone
        image: registry.cn-shanghai.aliyuncs.com/yizhan/nacos-server:latest
        imagePullPolicy: Always
        name: nacos-server
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
      dnsPolicy: ClusterFirst
      restartPolicy: Always
---
# Nacos Server Service
---
apiVersion: v1
kind: Service
metadata:
  name: nacos-server
spec:
  ports:
  - port: 8848
    protocol: TCP
    targetPort: 8848
  selector:
    app: nacos-server
  type: ClusterIP
---
# Zuul Gateway
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-zuul
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spring-cloud-zuul
  template:
    metadata:
      annotations:
        msePilotAutoEnable: "on"
        msePilotCreateAppName: spring-cloud-zuul
      labels:
        app: spring-cloud-zuul
    spec:
      containers:
      - env:
        - name: JAVA_HOME
          value: /usr/lib/jvm/java-1.8-openjdk/jre
        - name: LANG
          value: C.UTF-8
        image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-zuul:1.0.1
        imagePullPolicy: Always
        name: spring-cloud-zuul
        ports:
        - containerPort: 20000
---
# Service A (base)
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-a
  labels:
    app: spring-cloud-a
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-cloud-a
  template:
    metadata:
      annotations:
        msePilotCreateAppName: spring-cloud-a
        msePilotAutoEnable: "on"
      labels:
        app: spring-cloud-a
    spec:
      containers:
      - env:
        - name: LANG
          value: C.UTF-8
        - name: JAVA_HOME
          value: /usr/lib/jvm/java-1.8-openjdk/jre
        - name: profiler.micro.service.tag.trace.enable
          value: "true"
        image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-a:0.1-SNAPSHOT
        imagePullPolicy: Always
        name: spring-cloud-a
        ports:
        - containerPort: 20001
          protocol: TCP
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
        livenessProbe:
          tcpSocket:
            port: 20001
          initialDelaySeconds: 10
          periodSeconds: 30
---
# Service A (gray)
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-a-gray
  labels:
    app: spring-cloud-a-gray
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-cloud-a-gray
  template:
    metadata:
      annotations:
        alicloud.service.tag: gray
        msePilotCreateAppName: spring-cloud-a
        msePilotAutoEnable: "on"
      labels:
        app: spring-cloud-a-gray
    spec:
      containers:
      - env:
        - name: LANG
          value: C.UTF-8
        - name: JAVA_HOME
          value: /usr/lib/jvm/java-1.8-openjdk/jre
        - name: profiler.micro.service.tag.trace.enable
          value: "true"
        image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-a:0.1-SNAPSHOT
        imagePullPolicy: Always
        name: spring-cloud-a-gray
        ports:
        - containerPort: 20001
          protocol: TCP
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
        livenessProbe:
          tcpSocket:
            port: 20001
          initialDelaySeconds: 10
          periodSeconds: 30
---
# Service B (base) – offline disabled
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-b
  labels:
    app: spring-cloud-b
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-cloud-b
  template:
    metadata:
      annotations:
        msePilotCreateAppName: spring-cloud-b
        msePilotAutoEnable: "on"
      labels:
        app: spring-cloud-b
    spec:
      containers:
      - env:
        - name: LANG
          value: C.UTF-8
        - name: JAVA_HOME
          value: /usr/lib/jvm/java-1.8-openjdk/jre
        - name: micro.service.shutdown.server.enable
          value: "false"
        - name: profiler.micro.service.http.server.enable
          value: "false"
        image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-b:0.1-SNAPSHOT
        imagePullPolicy: Always
        name: spring-cloud-b
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
        livenessProbe:
          tcpSocket:
            port: 20002
          initialDelaySeconds: 10
          periodSeconds: 30
---
# Service B (gray) – offline enabled via preStop hook
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-b-gray
  labels:
    app: spring-cloud-b-gray
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-cloud-b-gray
  template:
    metadata:
      annotations:
        alicloud.service.tag: gray
        msePilotCreateAppName: spring-cloud-b
        msePilotAutoEnable: "on"
      labels:
        app: spring-cloud-b-gray
    spec:
      containers:
      - env:
        - name: LANG
          value: C.UTF-8
        - name: JAVA_HOME
          value: /usr/lib/jvm/java-1.8-openjdk/jre
        image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-b:0.1-SNAPSHOT
        imagePullPolicy: Always
        name: spring-cloud-b-gray
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - '-c'
              - wget http://127.0.0.1:54199/offline 2>/tmp/null; sleep 30; exit 0
        livenessProbe:
          tcpSocket:
            port: 20002
          initialDelaySeconds: 10
          periodSeconds: 30
---
# Service C (base) – warm‑up enabled (120 s)
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-c
  labels:
    app: spring-cloud-c
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-cloud-c
  template:
    metadata:
      annotations:
        msePilotCreateAppName: spring-cloud-c
        msePilotAutoEnable: "on"
      labels:
        app: spring-cloud-c
    spec:
      containers:
      - env:
        - name: LANG
          value: C.UTF-8
        - name: JAVA_HOME
          value: /usr/lib/jvm/java-1.8-openjdk/jre
        image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-c:0.1-SNAPSHOT
        imagePullPolicy: Always
        name: spring-cloud-c
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
        livenessProbe:
          tcpSocket:
            port: 20003
          initialDelaySeconds: 10
          periodSeconds: 30
---
# CronHPA for service B (base)
---
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  name: spring-cloud-b
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: spring-cloud-b
  jobs:
  - name: "scale-down"
    schedule: "0 0/5 * * * *"
    targetSize: 1
  - name: "scale-up"
    schedule: "10 0/5 * * * *"
    targetSize: 2
---
# CronHPA for service B (gray)
---
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  name: spring-cloud-b-gray
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: spring-cloud-b-gray
  jobs:
  - name: "scale-down"
    schedule: "0 0/5 * * * *"
    targetSize: 1
  - name: "scale-up"
    schedule: "10 0/5 * * * *"
    targetSize: 2
---
# CronHPA for service C
---
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  name: spring-cloud-c
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: spring-cloud-c
  jobs:
  - name: "scale-down"
    schedule: "0 2/5 * * * *"
    targetSize: 1
  - name: "scale-up"
    schedule: "10 2/5 * * * *"
    targetSize: 2
---
# Zuul Service (SLB exposure)
---
apiVersion: v1
kind: Service
metadata:
  name: zuul-slb
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 20000
  selector:
    app: spring-cloud-zuul
  type: ClusterIP
---
# Service A base Service
---
apiVersion: v1
kind: Service
metadata:
  name: spring-cloud-a-base
spec:
  ports:
  - name: http
    port: 20001
    protocol: TCP
    targetPort: 20001
  selector:
    app: spring-cloud-a
---
# Service A gray Service
---
apiVersion: v1
kind: Service
metadata:
  name: spring-cloud-a-gray
spec:
  ports:
  - name: http
    port: 20001
    protocol: TCP
    targetPort: 20001
  selector:
    app: spring-cloud-a-gray
---
# Nacos SLB Service
---
apiVersion: v1
kind: Service
metadata:
  name: nacos-slb
spec:
  ports:
  - port: 8848
    protocol: TCP
    targetPort: 8848
  selector:
    app: nacos-server
  type: LoadBalancer

Result Validation – No‑Loss Offline

During a simulated scaling event, the gray version of service B (offline feature enabled) recorded zero request errors, while the base version (offline feature disabled) produced 20 errors as pods were terminated, confirming that active notifications prevent traffic loss.
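
A comparison like this can be reproduced with a simple request probe that counts failures while the pods scale. The sketch below is one possible way to do it, not the tooling used for the results above: the function names are hypothetical, and the gateway address in the comment is a placeholder that must be replaced with the real Zuul endpoint.

```python
from __future__ import annotations

import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout_s: float = 2.0) -> bool:
    """Return True if a GET to url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections, and timeouts.
        return False


def count_results(fetch: Callable[[], bool], total_requests: int) -> tuple[int, int]:
    """Call fetch total_requests times and return (successes, errors)."""
    ok = sum(1 for _ in range(total_requests) if fetch())
    return ok, total_requests - ok


# Example (placeholder address, run while the CronHPA rules scale the pods):
# ok, errors = count_results(lambda: http_ok("http://<zuul-slb-address>/"), 1000)
# print(ok, errors)
```

Passing the fetch function as a parameter keeps the counting logic independent of the network, so the same probe can be pointed at the base and gray call paths in turn.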

Offline validation chart

Result Validation – Service Warm‑up

When warm‑up was enabled for service C, traffic increased gradually after the instance restarted, and the system logged warm‑up start and end times. This behavior protected the application from resource exhaustion during the initial heavy‑load period.

Warm‑up validation chart

References

Managed Kubernetes cluster documentation: https://help.aliyun.com/document_detail/95108.htm#task-skz-qwk-qfb

MSE governance enable guide: https://help.aliyun.com/document_detail/347625.htm#task-2140253

Zero‑Loss Offline live broadcast recording: https://yqh.aliyun.com/live/detail/27936
