Zero‑Downtime Deployments with Alibaba Cloud Lightweight Message Queue
This article explains how Alibaba Cloud Lightweight Message Queue (formerly MNS) enables lossless, zero‑downtime service releases by redesigning the network entry layer, using load‑balancer draining, injecting HTTP close frames, and providing CI/CD scripts that work across ECS and Kubernetes environments.
Alibaba Cloud Lightweight Message Queue (formerly MNS) is a high‑concurrency, elastic message‑queue service used in retail, finance, automotive, gaming and AI scenarios. The article examines its "lossless release" capability from a developer’s perspective, detailing the technical advantages, architecture, implementation steps, and practical validation.
1. Core Advantages and Business Value
Million‑TPS, zero‑perceived errors: Unlike many lossless solutions that still cause brief traffic interruptions, this design has been proven in production to avoid any client‑side errors during release.
Compatibility with existing users: No client upgrades are required, eliminating a major migration barrier.
High robustness, low maintenance: The solution is simple, robust, and requires no changes to the application architecture.
Strong universality: Works with any stateless HTTP‑based application.
2. Architecture Overview
The lossless release focuses on the network entry layer. For stateless services, only the TCP connections to the instance being upgraded need to be gracefully removed. The design follows these principles:
Focus on the network entry (load balancer → backend).
Maintain a generic architecture similar to typical HTTP services.
Ensure compatibility with various deployment forms (ACK, ECS, different LB versions).
Decouple from the application so no client changes are needed.
3. Core Implementation Process
The implementation consists of two main phases: removing connections from the instance to be released, then publishing the new version.
Phase 1 – Remove Connections
Step 1: Stop new TCP connection requests from reaching the instance, while existing connections continue to be forwarded and the application remains responsive.
Step 2: Gracefully close residual connections.
Phase 2 – Publish Application
Step 3: Verify that no connections or pending requests remain, then perform the release.
Step 4: Re‑introduce traffic to the newly released instance.
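The ordering of the four steps can be sketched as a small shell driver. This is only an illustration of the sequencing, not the product's release script: every function name below is a hypothetical placeholder, and each body is a stub that prints what the real step would do.

```shell
#!/bin/sh
# Sketch of the four-step release order. Every function below is a
# hypothetical placeholder and not part of the original release script.

stop_new_connections()       { echo "step 1: stop new connections"; }
close_residual_connections() { echo "step 2: close residual connections"; }
verify_and_publish()         { echo "step 3: verify drained, publish"; }
restore_traffic()            { echo "step 4: restore traffic"; }

release_instance() {
    # Run the steps in order and abort on the first failure, so that a
    # release never starts on an instance that still carries traffic.
    for step in stop_new_connections close_residual_connections \
                verify_and_publish restore_traffic; do
        "$step" || { echo "aborted at: $step" >&2; return 1; }
    done
}
```

Calling release_instance runs the steps strictly in sequence. Aborting on the first failed step is a design choice of this sketch: it keeps Phase 2 from starting before Phase 1 has completed.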
Key technical challenges addressed:
How to stop new TCP connections without breaking existing ones.
Ensuring graceful shutdown of residual connections.
Compatibility with Kubernetes (ACK) where kube‑proxy adds an extra network layer.
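Verifying that no residual connections remain before publishing could look like the polling helper below. It is a minimal sketch under stated assumptions: the name wait_for_drain and its arguments are hypothetical, and the command that counts connections is passed in by the caller rather than taken from the original script.

```shell
#!/bin/sh
# wait_for_drain COUNT_CMD TIMEOUT_SECONDS   (hypothetical helper)
# Polls COUNT_CMD, which must print the number of open connections,
# once per second until it prints 0 or the timeout expires.
wait_for_drain() {
    count_cmd=$1
    timeout=$2
    elapsed=0
    while [ "$elapsed" -le "$timeout" ]; do
        remaining=$($count_cmd)
        if [ "$remaining" -eq 0 ]; then
            return 0    # fully drained, safe to publish
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "still $remaining connection(s) after ${timeout}s" >&2
    return 1
}
```

On a Linux host, the counting command might be a function that wraps something like ss -Htn state established '( sport = :7001 )' | wc -l, assuming the service listens on port 7001 as in the script later in this article. Injecting the command keeps the helper independent of how connections are counted in a given environment.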
4. Graceful Connection Termination
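A server that closes a keep‑alive connection on its own can race with a request the client has already sent, so the close should be initiated by the client. To trigger that, the solution injects an HTTP Connection: close header (or an HTTP close frame) into the server's response, and the client discards the connection after reading it. Idle connections never receive such a response, so for those the instance is drained from the load balancer and the script waits out the socket timeout, after which the client discards the connection automatically.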
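One way to confirm that an instance being taken offline really advertises the close is to inspect its response headers. The helper below is a sketch for that check only; the function name has_connection_close is hypothetical, and it reads raw HTTP/1.1 response headers from standard input.

```shell
#!/bin/sh
# has_connection_close   (hypothetical helper)
# Reads raw HTTP response headers on stdin and succeeds if a
# Connection header lists the "close" token (case-insensitive).
has_connection_close() {
    # Strip carriage returns, then look for "close" as a whole token,
    # either alone or inside a comma-separated list such as "TE, close".
    tr -d '\r' |
        grep -iqE '^connection:([^,]*,)*[[:space:]]*close[[:space:]]*(,|$)'
}
```

For example, curl -sI http://localhost:7001/ | has_connection_close would return success once the close header is being injected, assuming the service answers plain HTTP on port 7001 as in the script that follows.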
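# Excerpt from the release script: the "pubstart" branch of its case statement
pubstart)
    offline
    stop_http
    stopjava
    startjava
    start_http
    online
    ;;

offline_http() {
    echo "[ 1/10] -- offline http from load balance server"
    # Delete status flag to trigger LB health-check failure and enable Nginx close-frame response
    rm -f "$STATUSROOT_HOME/status"
    curl localhost:7001/shutDownGracefully
    # Wait for the health check to fail and for idle sockets to time out.
    # sleep takes a single number, so the sum is computed with shell arithmetic.
    sleep $((SOCKET_TIMEOUT + HEALTH_CHECK))
}
5. CI/CD Integration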
For ECS, only the offline and online phases of the release script change: the offline phase removes the status flag and waits for the socket timeout, and the online phase restores the flag. For Kubernetes (ACK), the same script runs from a preStop hook in the pod definition:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: main-container
          image: my-image:latest
          lifecycle:
            preStop:
              exec:
                command:
                  - sh
                  - /home/admin/offline.sh
          ports:
            - containerPort: 8080
6. Validation
Testing in a simulated environment shows that after applying the lossless release, error rates during deployment drop to zero even under million‑TPS load, confirming the effectiveness of the approach.
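A zero error rate can be checked by probing the service continuously during a release and counting the failed probes afterwards. The sketch below is not the test harness used in the article; count_failed_requests is a hypothetical helper that assumes one HTTP status code per line on standard input and treats anything other than 2xx as a failure.

```shell
#!/bin/sh
# count_failed_requests   (hypothetical helper)
# stdin: one HTTP status code per line, for example collected with
#        curl -s -o /dev/null -w '%{http_code}\n' <url>
# stdout: number of non-empty lines that are not a 2xx status code.
count_failed_requests() {
    awk 'NF > 0 && $1 !~ /^2[0-9][0-9]$/ { failed++ }
         END { print failed + 0 }'
}
```

curl reports 000 when it receives no HTTP response at all, so dropped connections are counted as failures alongside 5xx responses. A lossless release should leave this count at 0 for the whole deployment window.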
7. Conclusion
The lossless release technique for Alibaba Cloud Lightweight Message Queue combines load‑balancer draining, Nginx‑level HTTP close‑frame injection, and timeout handling to achieve true zero‑downtime deployments. It works across ECS and Kubernetes, scales to million‑TPS workloads, and requires no client modifications, embodying the product’s “customer‑first” philosophy.
Alibaba Cloud Native
