How to Achieve Zero‑Downtime Service Deployment with Spring Cloud and Kubernetes
This article examines why most incidents occur during application rollout, analyzes the Kubernetes pod lifecycle for both startup and shutdown, and identifies common zero‑downtime challenges. It then presents concrete strategies (active notification, adaptive waiting, delayed registration, and readiness probes) for lossless service upgrades and rollbacks.
Background
Most production incidents happen during the release phase of an application, where code‑related issues can affect users. To minimize impact, deployments must avoid any loss of service caused by the application itself.
Container Application Lifecycle Analysis
1. Application Startup (Up)
From the Eureka registry perspective, a newly started service registers itself with the Eureka server and sends heartbeats to stay alive.
Consumers poll Eureka every 30 seconds by default to obtain the latest service list, and only then start calling the new provider.
From the Pod perspective, the flow is:
User sends a create/apply request to the Kubernetes API server.
API server validates the manifest and stores it in etcd.
For workloads such as Deployments, the controller‑manager creates the Pod objects, which start in the Pending phase.
Scheduler assigns the Pod to a suitable Node.
Kubelet on the Node pulls the image, starts the container, and runs readiness probes.
Kubelet reports Ready status back to the API server, which updates etcd.
Endpoint Controller updates the Service’s endpoint list so traffic can reach the new Pod.
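The readiness steps above depend on the container answering the kubelet's probe. A minimal sketch of such an endpoint, assuming the JDK's built-in `com.sun.net.httpserver` server rather than a Spring controller; the class name `ReadinessEndpoint` and its methods are hypothetical and exist only for illustration:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical readiness endpoint: answers 503 until the application declares itself ready. */
final class ReadinessEndpoint {
    private final HttpServer server;
    private final AtomicBoolean ready = new AtomicBoolean(false);

    private ReadinessEndpoint(HttpServer server) {
        this.server = server;
    }

    /** Starts serving /healthz on all interfaces; port 0 lets the OS pick a free port. */
    static ReadinessEndpoint start(int port) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
            ReadinessEndpoint endpoint = new ReadinessEndpoint(server);
            server.createContext("/healthz", exchange -> {
                // 200 lets the kubelet mark the container Ready; 503 keeps it out of rotation.
                exchange.sendResponseHeaders(endpoint.ready.get() ? 200 : 503, -1);
                exchange.close();
            });
            server.start();
            return endpoint;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Flip to true once startup work is finished, and back to false when draining. */
    void setReady(boolean value) {
        ready.set(value);
    }

    int port() {
        return server.getAddress().getPort();
    }

    void stop() {
        server.stop(0);
    }

    /** Issues a GET similar to an httpGet probe and returns the HTTP status code. */
    static int probe(int port) {
        try {
            HttpURLConnection conn = (HttpURLConnection)
                    URI.create("http://127.0.0.1:" + port + "/healthz").toURL().openConnection();
            conn.setConnectTimeout(2000);
            conn.setReadTimeout(2000);
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Until `setReady(true)` is called the probe fails, so the Pod is not reported Ready and is not added to the Service endpoints.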
2. Application Shutdown (Down)
From the Eureka perspective, consumers keep calling the provider until it deregisters and they have refreshed their cached service list.
From the Pod side, the user sends a delete request to the API server, which marks the Pod as Terminating and starts the grace period (30 seconds by default).
API server notifies listeners (kubelet, Endpoint Controller). The Endpoint Controller promptly removes the Pod from the Service endpoints.
Kubelet executes any preStop hook, then sends SIGTERM to containers; if they do not exit within the grace period, SIGKILL is issued.
After termination, the Pod is fully removed from etcd.
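On the application side, a JVM that receives SIGTERM runs its registered shutdown hooks before exiting, which is where cleanup such as draining and deregistration can be attached. A minimal sketch using only the JDK; the class name `TermHook` and the cleanup message are hypothetical:

```java
/** Hypothetical helper: attaches cleanup work to JVM shutdown (SIGTERM or normal exit). */
final class TermHook {
    /** Registers the cleanup as a JVM shutdown hook and returns the hook thread. */
    static Thread install(Runnable cleanup) {
        Thread hook = new Thread(cleanup, "graceful-shutdown");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        // Placeholder cleanup; a real service would drain requests and deregister here.
        install(() -> System.out.println("shutdown started: drain requests, then deregister"));
        System.out.println("service running");
        // The hook also runs when main returns normally, not only on SIGTERM.
    }
}
```

The hook must finish within what remains of the grace period after the preStop hook has run; shutdown hooks do not run when the process is killed with SIGKILL.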
Current Problems
Consumer cannot detect provider shutdown promptly – Eureka polling interval (30 s) may cause calls to a downed instance.
Slow initialization – Pods added during HPA scale‑out can receive heavy traffic before they have finished warming up, which leads to request timeouts and pod restarts.
Premature registration – Services register before they have fully loaded resources, causing slow responses or errors.
Release‑runtime mismatch – Kubernetes readiness checks may mark a pod ready before it is registered, leading to traffic being routed to an unregistered instance.
Solution Overview
1. Zero‑Downtime Shutdown
Active Notification : Before a service instance deregisters, it marks responses with a special flag. Consumers that see the flag immediately re‑query the registry to learn the new status.
Adaptive Waiting : The instance tracks in‑flight requests; only after all are completed does it proceed with deregistration. This ensures no request is lost during high‑concurrency shutdowns.
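The two shutdown mechanisms above could be sketched roughly as follows. This is an illustration under assumptions, not the article's actual implementation: the header name `X-Instance-Draining`, the classes `DrainGate` and `InstanceCache`, and the `Supplier` standing in for a registry lookup are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Provider side (hypothetical): flags responses while draining and counts in-flight requests. */
final class DrainGate {
    /** Assumed header name; any marker agreed between provider and consumer would do. */
    static final String DRAINING_HEADER = "X-Instance-Draining";

    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicBoolean draining = new AtomicBoolean();

    void requestStarted()  { inFlight.incrementAndGet(); }  // call when a request enters
    void requestFinished() { inFlight.decrementAndGet(); }  // call in a finally block
    void startDraining()   { draining.set(true); }          // active notification begins
    boolean isDraining()   { return draining.get(); }       // true: add the header to responses

    /** Adaptive waiting: returns true once no request is in flight, false on timeout. */
    boolean awaitDrained(long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (inFlight.get() > 0) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}

/** Consumer side (hypothetical): re-queries the registry as soon as a response is flagged. */
final class InstanceCache {
    private final Supplier<List<String>> registryLookup;
    private volatile List<String> instances;

    InstanceCache(Supplier<List<String>> registryLookup) {
        this.registryLookup = registryLookup;
        this.instances = registryLookup.get();
    }

    List<String> instances() { return instances; }

    /** Header lookup is case-sensitive here for brevity; real HTTP headers are not. */
    void onResponseHeaders(Map<String, String> headers) {
        if ("true".equals(headers.get(DrainGate.DRAINING_HEADER))) {
            instances = registryLookup.get();
        }
    }
}
```

In a shutdown hook the provider would call `startDraining()`, then `awaitDrained(...)` with a timeout that fits inside the grace period, and only then deregister.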
Implementation example (using Nacos as the registry and a Spring shutdown hook):
    # Pod spec level
    terminationGracePeriodSeconds: 60
    # Container level
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz
        port: 9000
        scheme: HTTP
      initialDelaySeconds: 30
      periodSeconds: 2
      successThreshold: 2
      timeoutSeconds: 2
    lifecycle:
      preStop:
        exec:
          command:
            - /bin/sh
            - -c
            # Wait 15 s, then signal the JVM (killall sends SIGTERM by default)
            - "sleep 15 && /usr/bin/killall java"
2. Zero‑Downtime Startup
Delayed Registration : Defer service registration until the application has completed all asynchronous initialization.
Small‑Traffic Warm‑up : Start new pods with a low load‑balancer weight and gradually increase it as the pod stabilizes, preventing a cold start from overwhelming the system.
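Readiness Probe Alignment : Configure /healthz as the readiness probe and have it report success only once the instance is registered. The pod is added to the Service endpoints only after the probe succeeds (any HTTP status from 200 to 399 counts as success), which keeps the registry and runtime states synchronized.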
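The three startup measures above could be sketched together as follows. This is a simplified illustration, not the article's implementation: the class `StartupGate`, the `Runnable` standing in for the registry call, and the linear ramp are assumptions, and publishing the computed weight to the load balancer is left out.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical startup coordinator: registers late and reports readiness only afterwards. */
final class StartupGate {
    private final AtomicBoolean initialized = new AtomicBoolean();
    private final AtomicBoolean registered = new AtomicBoolean();
    private final Runnable registerWithRegistry;

    /** The Runnable is a stand-in for whatever call registers the instance. */
    StartupGate(Runnable registerWithRegistry) {
        this.registerWithRegistry = registerWithRegistry;
    }

    /** Delayed registration: call once asynchronous initialization is done; registers once. */
    void initializationComplete() {
        if (initialized.compareAndSet(false, true)) {
            registerWithRegistry.run();
            registered.set(true);
        }
    }

    /** Status a /healthz handler could return: 200 only when initialized and registered. */
    int healthzStatus() {
        return initialized.get() && registered.get() ? 200 : 503;
    }

    /** Warm-up: linear ramp from minWeight to fullWeight over warmupMillis of uptime. */
    static int warmupWeight(long uptimeMillis, long warmupMillis, int minWeight, int fullWeight) {
        if (warmupMillis <= 0 || uptimeMillis >= warmupMillis) {
            return fullWeight;
        }
        if (uptimeMillis <= 0) {
            return minWeight;
        }
        return (int) (minWeight + (fullWeight - minWeight) * uptimeMillis / warmupMillis);
    }
}
```

Because readiness is derived from the same flags that drive registration, the probe cannot report success for an instance that has not yet registered.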
Summary Table
Zero‑Downtime Shutdown : Active notification + adaptive waiting (Nacos + shutdown hook + preStop).
Zero‑Downtime Startup : Delayed registration, small‑traffic warm‑up, readiness probe alignment.
Conclusion
By combining proactive notifications, graceful shutdown hooks, delayed service registration, traffic‑aware warm‑up, and proper readiness probes, teams can achieve lossless application upgrades and rollbacks in Kubernetes‑based microservice environments.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
