Achieving Zero‑Loss Service Deployment with Alibaba Cloud MSE
This guide explains how Alibaba Cloud's Microservice Engine (MSE) achieves zero‑loss service deployment by using adaptive offline waiting, active notifications, readiness‑aligned startup checks, and traffic warm‑up, providing step‑by‑step Kubernetes manifests, configuration details, and validation results to prevent traffic loss during releases.
Background
Large‑scale, high‑concurrency applications often schedule releases at night to avoid traffic loss, but this practice is hard to control and adds significant operational cost. Common causes of traffic loss during Kubernetes rolling updates include delayed service deregistration, slow initialization, premature registration, and readiness checks that are not aligned with registration.
Zero‑Loss Offline Design
Alibaba Cloud Microservice Engine (MSE) introduces an adaptive waiting period and active notification mechanism. Before a service goes offline, MSE waits adaptively and sends an offline event to all consumers that have pending requests. Consumers receive the event and immediately pull the latest service list from the registry, preventing calls to the stopped instance.
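The demo manifest in this guide triggers the offline flow from a Kubernetes preStop hook that calls http://127.0.0.1:54199/offline and then sleeps for 30 seconds. One detail worth sketching: time spent in preStop counts against the pod's termination grace period, which Kubernetes defaults to 30 seconds, so a hook that sleeps that long leaves little or no room for the application's own shutdown. The fragment below is a minimal sketch rather than the article's manifest; the 60‑second value and the container name are illustrative assumptions.

```yaml
# Pod template fragment (sketch). Only the /offline endpoint and the
# 30 s sleep come from the demo manifest; everything else is assumed.
spec:
  template:
    spec:
      # Assumption: 60 s = 30 s preStop sleep + headroom for JVM shutdown.
      # Kubernetes uses 30 s when this field is not set.
      terminationGracePeriodSeconds: 60
      containers:
        - name: my-service            # hypothetical container name
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - -c
                  # Trigger the offline flow, then give in-flight
                  # requests time to drain before SIGTERM is sent.
                  - wget -q -O /dev/null http://127.0.0.1:54199/offline; sleep 30; exit 0
```

Raising the grace period rather than shortening the sleep keeps the drain window from the demo intact while still leaving the process time to exit cleanly.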
Zero‑Loss Online Design
For online deployment, MSE addresses delayed loading and initialization latency. It ensures that service registration completes before Kubernetes readiness probes succeed, and provides a small‑traffic warm‑up feature that gradually ramps up traffic to a newly started instance, avoiding first‑call latency spikes that can cause time‑outs and crashes.
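In Kubernetes terms, aligning readiness with registration means the readiness probe should pass only once the instance is registered, so that a rolling update does not move on just because a port is open. The demo manifest in this guide defines only liveness probes, so the fragment below sketches what such a readiness probe might look like. The probe endpoint (path /health on port 54199, the same local port the demo uses for /offline) is an assumption to verify against the MSE documentation, and the timing values and container name are illustrative.

```yaml
# Container fragment (sketch). The /health path on port 54199 is an
# assumption; confirm the actual readiness endpoint in the MSE docs.
containers:
  - name: my-service                # hypothetical container name
    readinessProbe:
      httpGet:
        path: /health               # assumed: succeeds only after registration
        port: 54199                 # local port used for /offline in the demo
      initialDelaySeconds: 10       # illustrative timings
      periodSeconds: 5
      failureThreshold: 3
    livenessProbe:
      tcpSocket:
        port: 20001                 # liveness stays a plain port check, as in the demo
      initialDelaySeconds: 10
      periodSeconds: 30
```

Keeping liveness as a simple port check and reserving the registration-aware check for readiness avoids restarting a pod that is merely slow to register.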
Demo Architecture
The demonstration uses a Spring Cloud ecosystem with a Zuul gateway, three backend services (A, B, C), and Nacos as the service registry. All components run on a Kubernetes cluster managed by Alibaba Cloud.
Step‑by‑Step Deployment
Prerequisites: Create a managed Kubernetes cluster and enable the MSE governance edition.
Prepare Manifest: Save the YAML manifest shown after these steps as mse-demo.yaml and apply it with kubectl apply -f mse-demo.yaml. The manifest defines Nacos, Zuul, services A, B, and C (with gray versions of A and B), CronHPA rules, and SLB services.
Enable MSE Governance: In the MSE console, navigate to Microservice Governance → Application List, select each application, and turn on the governance features.
# Nacos Server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nacos-server
  name: nacos-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nacos-server
  template:
    metadata:
      labels:
        app: nacos-server
    spec:
      containers:
        - env:
            - name: MODE
              value: standalone
          image: registry.cn-shanghai.aliyuncs.com/yizhan/nacos-server:latest
          imagePullPolicy: Always
          name: nacos-server
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
      dnsPolicy: ClusterFirst
      restartPolicy: Always
---
# Nacos Server Service
---
apiVersion: v1
kind: Service
metadata:
  name: nacos-server
spec:
  ports:
    - port: 8848
      protocol: TCP
      targetPort: 8848
  selector:
    app: nacos-server
  type: ClusterIP
---
# Zuul Gateway
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-zuul
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spring-cloud-zuul
  template:
    metadata:
      annotations:
        msePilotAutoEnable: "on"
        msePilotCreateAppName: spring-cloud-zuul
      labels:
        app: spring-cloud-zuul
    spec:
      containers:
        - env:
            - name: JAVA_HOME
              value: /usr/lib/jvm/java-1.8-openjdk/jre
            - name: LANG
              value: C.UTF-8
          image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-zuul:1.0.1
          imagePullPolicy: Always
          name: spring-cloud-zuul
          ports:
            - containerPort: 20000
---
# Service A (base)
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-a
  labels:
    app: spring-cloud-a
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-cloud-a
  template:
    metadata:
      annotations:
        msePilotCreateAppName: spring-cloud-a
        msePilotAutoEnable: "on"
      labels:
        app: spring-cloud-a
    spec:
      containers:
        - env:
            - name: LANG
              value: C.UTF-8
            - name: JAVA_HOME
              value: /usr/lib/jvm/java-1.8-openjdk/jre
            - name: profiler.micro.service.tag.trace.enable
              value: "true"
          image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-a:0.1-SNAPSHOT
          imagePullPolicy: Always
          name: spring-cloud-a
          ports:
            - containerPort: 20001
              protocol: TCP
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
          livenessProbe:
            tcpSocket:
              port: 20001
            initialDelaySeconds: 10
            periodSeconds: 30
---
# Service A (gray)
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-a-gray
  labels:
    app: spring-cloud-a-gray
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-cloud-a-gray
  template:
    metadata:
      annotations:
        alicloud.service.tag: gray
        msePilotCreateAppName: spring-cloud-a
        msePilotAutoEnable: "on"
      labels:
        app: spring-cloud-a-gray
    spec:
      containers:
        - env:
            - name: LANG
              value: C.UTF-8
            - name: JAVA_HOME
              value: /usr/lib/jvm/java-1.8-openjdk/jre
            - name: profiler.micro.service.tag.trace.enable
              value: "true"
          image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-a:0.1-SNAPSHOT
          imagePullPolicy: Always
          name: spring-cloud-a-gray
          ports:
            - containerPort: 20001
              protocol: TCP
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
          livenessProbe:
            tcpSocket:
              port: 20001
            initialDelaySeconds: 10
            periodSeconds: 30
---
# Service B (base) – offline disabled
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-b
  labels:
    app: spring-cloud-b
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-cloud-b
  template:
    metadata:
      annotations:
        msePilotCreateAppName: spring-cloud-b
        msePilotAutoEnable: "on"
      labels:
        app: spring-cloud-b
    spec:
      containers:
        - env:
            - name: LANG
              value: C.UTF-8
            - name: JAVA_HOME
              value: /usr/lib/jvm/java-1.8-openjdk/jre
            - name: micro.service.shutdown.server.enable
              value: "false"
            - name: profiler.micro.service.http.server.enable
              value: "false"
          image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-b:0.1-SNAPSHOT
          imagePullPolicy: Always
          name: spring-cloud-b
          ports:
            - containerPort: 8080
              protocol: TCP
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
          livenessProbe:
            tcpSocket:
              port: 20002
            initialDelaySeconds: 10
            periodSeconds: 30
---
# Service B (gray) – offline enabled via preStop hook
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-b-gray
  labels:
    app: spring-cloud-b-gray
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-cloud-b-gray
  template:
    metadata:
      annotations:
        alicloud.service.tag: gray
        msePilotCreateAppName: spring-cloud-b
        msePilotAutoEnable: "on"
      labels:
        app: spring-cloud-b-gray
    spec:
      containers:
        - env:
            - name: LANG
              value: C.UTF-8
            - name: JAVA_HOME
              value: /usr/lib/jvm/java-1.8-openjdk/jre
          image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-b:0.1-SNAPSHOT
          imagePullPolicy: Always
          name: spring-cloud-b-gray
          ports:
            - containerPort: 8080
              protocol: TCP
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - '-c'
                  - wget http://127.0.0.1:54199/offline 2>/tmp/null; sleep 30; exit 0
          livenessProbe:
            tcpSocket:
              port: 20002
            initialDelaySeconds: 10
            periodSeconds: 30
---
# Service C (base) – warm‑up enabled (120 s)
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-c
  labels:
    app: spring-cloud-c
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-cloud-c
  template:
    metadata:
      annotations:
        msePilotCreateAppName: spring-cloud-c
        msePilotAutoEnable: "on"
      labels:
        app: spring-cloud-c
    spec:
      containers:
        - env:
            - name: LANG
              value: C.UTF-8
            - name: JAVA_HOME
              value: /usr/lib/jvm/java-1.8-openjdk/jre
          image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-c:0.1-SNAPSHOT
          imagePullPolicy: Always
          name: spring-cloud-c
          ports:
            - containerPort: 8080
              protocol: TCP
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
          livenessProbe:
            tcpSocket:
              port: 20003
            initialDelaySeconds: 10
            periodSeconds: 30
---
# CronHPA for service B (base)
---
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  name: spring-cloud-b
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta2
    kind: Deployment
    name: spring-cloud-b
  jobs:
    - name: "scale-down"
      schedule: "0 0/5 * * * *"
      targetSize: 1
    - name: "scale-up"
      schedule: "10 0/5 * * * *"
      targetSize: 2
---
# CronHPA for service B (gray)
---
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  name: spring-cloud-b-gray
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta2
    kind: Deployment
    name: spring-cloud-b-gray
  jobs:
    - name: "scale-down"
      schedule: "0 0/5 * * * *"
      targetSize: 1
    - name: "scale-up"
      schedule: "10 0/5 * * * *"
      targetSize: 2
---
# CronHPA for service C
---
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  name: spring-cloud-c
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta2
    kind: Deployment
    name: spring-cloud-c
  jobs:
    - name: "scale-down"
      schedule: "0 2/5 * * * *"
      targetSize: 1
    - name: "scale-up"
      schedule: "10 2/5 * * * *"
      targetSize: 2
---
# Zuul Service (SLB exposure)
---
apiVersion: v1
kind: Service
metadata:
  name: zuul-slb
spec:
  ports:
    - port: 80
      protocol: TCP
      targetPort: 20000
  selector:
    app: spring-cloud-zuul
  type: ClusterIP
---
# Service A base Service
---
apiVersion: v1
kind: Service
metadata:
  name: spring-cloud-a-base
spec:
  ports:
    - name: http
      port: 20001
      protocol: TCP
      targetPort: 20001
  selector:
    app: spring-cloud-a
---
# Service A gray Service
---
apiVersion: v1
kind: Service
metadata:
  name: spring-cloud-a-gray
spec:
  ports:
    - name: http
      port: 20001
      protocol: TCP
      targetPort: 20001
  selector:
    app: spring-cloud-a-gray
---
# Nacos SLB Service
---
apiVersion: v1
kind: Service
metadata:
  name: nacos-slb
spec:
  ports:
    - port: 8848
      protocol: TCP
      targetPort: 8848
  selector:
    app: nacos-server
  type: LoadBalancer

Result Validation – No‑Loss Offline
During a simulated scaling event, the gray version of service B (offline feature enabled) recorded zero request errors, while the base version (offline feature disabled) produced 20 errors as pods were terminated, confirming that active notifications prevent traffic loss.
Result Validation – Service Warm‑up
When warm‑up was enabled for service C, traffic increased gradually after the instance restarted, and the system logged warm‑up start and end times. This behavior protected the application from resource exhaustion during the initial heavy‑load period.
References
Managed Kubernetes cluster documentation: https://help.aliyun.com/document_detail/95108.htm#task-skz-qwk-qfb
MSE governance enable guide: https://help.aliyun.com/document_detail/347625.htm#task-2140253
Zero‑Loss Offline live broadcast recording: https://yqh.aliyun.com/live/detail/27936
Alibaba Cloud Native
