Why Does Traffic Still Hit a Shut‑Down Instance After Clicking “Offline” in Nacos?
Deregistering an instance in Nacos is not reflected instantly in client load-balancer caches: clients refresh by periodic polling and UDP pushes are unreliable, so gateways keep routing requests to a terminated instance for a while. This article explains the delay and then presents step-by-step lossless shutdown solutions for small teams and large-scale deployments.
Why traffic does not stop immediately
Clients such as a Gateway use a load‑balancer (Ribbon or Spring Cloud LoadBalancer) that caches the list of service IPs. The cache is refreshed by a background task that periodically pulls the latest instance list from Nacos. Although Nacos can push updates via UDP, packet loss in complex production networks makes this unreliable. As a result, a delay of several tens of seconds occurs: Nacos marks the node offline, but the gateway’s local cache still contains the old address.
When the instance is killed (e.g., with kill -15) and its port closes, the operating system answers connection attempts to the old address with an RST packet, and the gateway logs Connection Refused or Read Timeout. During peak traffic, users see this as a “Network Exception”.
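How long a cache stays stale follows from the polling schedule. The sketch below is a simplified model, not Nacos or load-balancer code: it assumes a single client that refreshes at fixed intervals, that no UDP push arrives, and a hypothetical 30-second interval (the real value depends on the client and its configuration).

```shell
#!/bin/sh
# Seconds a client keeps routing to an instance after it is marked offline,
# assuming the client refreshes its cache every poll_interval seconds
# (at t = 0, poll_interval, 2 * poll_interval, ...) and no UDP push arrives.
# usage: stale_seconds <poll_interval> <offline_at>
stale_seconds() {
  poll_interval=$1
  offline_at=$2
  echo $(( poll_interval - offline_at % poll_interval ))
}

# Hypothetical 30-second refresh interval; the real value depends on the client.
for t in 0 10 29; do
  echo "marked offline at t=${t}s -> stale for $(stale_seconds 30 "$t")s"
done
```

The worst case is a deregistration that lands just after a refresh, which leaves the stale address in use for a full interval. Any wait before killing the process has to cover that worst case for every client.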
Small‑team workaround
Knowing the cause, many small teams add a sleep 40 step in their CI/CD pipeline after invoking Nacos’s OpenAPI to mark the instance offline. The process becomes:
1. Call Nacos OpenAPI to set the instance as offline.
2. Execute sleep 40 so the machine continues handling requests while all gateways refresh their caches.
3. After the wait, send kill -15 to terminate the process, achieving a graceful shutdown.
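These steps can be written as a small deployment script. This is a minimal sketch rather than a production pipeline: the function names, the environment variable names, and the default host, service name, and port are illustrative placeholders, and the script assumes the Nacos v1 OpenAPI is reachable without authentication.

```shell
#!/bin/sh
# Hypothetical defaults; replace with your own Nacos address and service coordinates.
NACOS_ADDR="${NACOS_ADDR:-nacos-server:8848}"
SERVICE_NAME="${SERVICE_NAME:-my-service}"
INSTANCE_PORT="${INSTANCE_PORT:-8080}"
DRAIN_SECONDS="${DRAIN_SECONDS:-40}"

# Build the OpenAPI URL that disables one instance (enabled=false).
# usage: offline_url <instance-ip>
offline_url() {
  echo "http://${NACOS_ADDR}/nacos/v1/ns/instance?serviceName=${SERVICE_NAME}&ip=$1&port=${INSTANCE_PORT}&enabled=false"
}

# Mark the instance offline, wait for client caches to refresh, then send SIGTERM.
# usage: drain_and_stop <instance-ip> <pid>
drain_and_stop() {
  # -f makes curl fail on an HTTP error so a failed deregistration stops the script
  curl -fsS -X PUT "$(offline_url "$1")" || return 1
  sleep "$DRAIN_SECONDS"
  kill -15 "$2"
}
```

A pipeline step would then call something like `drain_and_stop 10.0.0.12 "$(cat app.pid)"`, where the IP and the PID file are again hypothetical. Aborting when the deregistration call fails is a deliberate choice here: killing a process that Nacos still advertises as healthy would recreate the original problem.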
This method works but is too slow for large services (e.g., 500 instances restarted one at a time would spend hours just sleeping) and cannot handle sudden crashes, where the sleep never gets a chance to run.
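The cost of the fixed sleep is easy to quantify. The helper below is a hypothetical calculator, assuming instances are restarted in batches that run one after another; the batch size of 25 is an arbitrary illustration, not a recommendation.

```shell
#!/bin/sh
# Total time spent sleeping during a rollout, assuming batches run one after another.
# usage: rollout_wait_seconds <instances> <sleep_seconds> <batch_size>
rollout_wait_seconds() {
  batches=$(( ($1 + $3 - 1) / $3 ))   # ceiling division
  echo $(( batches * $2 ))
}

echo "one at a time: $(rollout_wait_seconds 500 40 1)s"
echo "batches of 25: $(rollout_wait_seconds 500 40 25)s"
```

Restarting one instance at a time costs 20000 seconds of pure waiting, about 5 hours 33 minutes. Batching by 25 brings that down to 800 seconds, but only at the price of taking 25 instances out of rotation at once.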
Enterprise‑grade lossless shutdown
Step 1 – Orchestration‑layer deregistration
Bind the shutdown to the pod lifecycle in Kubernetes. When a pod is scaled down or updated, Kubernetes triggers a PreStop Hook before sending kill -15. The hook executes a curl command to Nacos to disable the instance and then sleeps 3–5 seconds to give the UDP push a chance.
lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        - |
          # POD_IP must be exposed to the container as an environment variable,
          # e.g. via the Downward API (fieldRef: status.podIP)
          # 1. Immediately send Nacos deregistration request
          curl -X PUT "http://nacos-server:8848/nacos/v1/ns/instance?serviceName=my-service&ip=${POD_IP}&port=8080&enabled=false"
          # 2. Brief pause for UDP push
          sleep 5
Step 2 – Client‑side retry
Even with PreStop, a few requests may still hit the stale cache. Because a refused connection fails during the TCP three-way handshake, no business data has been transmitted, so the request is safe to retry even if the operation itself is not idempotent. (A Read Timeout is different: the request may already have reached the server.) Enable the underlying retry mechanism in the client. For Spring Cloud, add:
spring:
  cloud:
    loadbalancer:
      retry:
        enabled: true # enable client retry
The load balancer catches the Connection Refused, discards the failed attempt, selects another healthy IP from the cache, and retries automatically, resulting in a seamless 200 OK response for the user.
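The retry decision can be illustrated outside Spring. The sketch below is not Spring Cloud LoadBalancer's implementation: try_instances and http_get are hypothetical helpers that walk a cached instance list and move to the next address only when the attempt fails with curl's exit code 7 (failed to connect), the one case where the request never reached a server.

```shell
#!/bin/sh
# One attempt against one instance (host:port); curl exits with 7 when the
# connection is refused. The root path "/" is a placeholder endpoint.
http_get() {
  curl -sS --connect-timeout 1 -o /dev/null "http://$1/"
}

# Try each cached instance in order and print the first one that succeeds.
# Only exit code 7 (connection refused, nothing sent) moves on to the next
# instance; any other failure is returned to the caller without a retry.
# usage: try_instances <attempt-function> <instance>...
try_instances() {
  attempt=$1
  shift
  for instance in "$@"; do
    if "$attempt" "$instance"; then
      echo "$instance"
      return 0
    else
      rc=$?
    fi
    [ "$rc" -eq 7 ] || return "$rc"
  done
  return 7
}
```

A call such as `try_instances http_get 10.0.0.1:8080 10.0.0.2:8080` (hypothetical addresses) skips a stopped first instance and reports the second. Failures other than a refused connection, such as a timeout, are deliberately not retried here, because the server may already have started processing the request.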
Step 3 – Application graceful shutdown
Configure the Spring Boot application to wait for in‑flight requests before exiting:
server:
  shutdown: graceful # enable graceful shutdown
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s # wait up to 30 seconds for old requests
Combined with Kubernetes’ pod termination, the Java process stops accepting new requests, waits up to the configured timeout for in-flight requests to finish, and then releases resources.
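One thing worth checking is whether these waits fit inside the pod's termination grace period. Kubernetes counts the preStop hook against terminationGracePeriodSeconds, which defaults to 30 seconds, and force-kills the container once it expires. The hypothetical helper below does the arithmetic for the values used in this article and ignores the time taken by the curl call itself.

```shell
#!/bin/sh
# Seconds left in the pod's termination grace period after the preStop hook
# and the application's graceful-shutdown wait have both used their full time.
# usage: grace_headroom <prestop_seconds> <shutdown_timeout_seconds> <grace_period_seconds>
grace_headroom() {
  echo $(( $3 - $1 - $2 ))
}

# 5 s preStop sleep + 30 s shutdown phase against the 30 s Kubernetes default.
echo "headroom with default grace period: $(grace_headroom 5 30 30)s"
# Hypothetical raised value of 45 s.
echo "headroom with a 45 s grace period: $(grace_headroom 5 30 45)s"
```

A negative result means the container can be killed before slow in-flight requests finish. Setting terminationGracePeriodSeconds in the pod spec above the sum of the two waits restores the margin.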
Full lossless shutdown flow
[Diagram: interaction of Kubernetes, Nacos, and client retry mechanisms throughout the shutdown process]
Programmer XiaoFu
xiaofucode.com – a programmer learning guide driven by the pursuit of profit
