
Why Does Traffic Still Hit a Shut‑Down Instance After Marking It Offline in Nacos?

The article explains why a service instance marked offline in Nacos can still receive traffic due to client‑side cache delays and UDP push loss, and it presents step‑by‑step loss‑less shutdown solutions using Kubernetes PreStop hooks, client retries, and Spring Boot graceful shutdown.


Why traffic doesn’t stop immediately

In a distributed system the service registry (Nacos) is not perfectly real‑time. Client‑side load balancers such as Ribbon or Spring Cloud LoadBalancer, used by gateways and ordinary service consumers alike, keep a cached ServerList of instance addresses to avoid a registry lookup on every request. That cache has two weak points:

The client runs a background task that periodically pulls the latest instance list from Nacos.

Nacos can push changes via UDP, but UDP packets may be lost in complex production networks.

Together these factors mean it can take tens of seconds before the gateway’s cache reflects the offline status.

During this window the gateway still routes requests to the dead instance. Once the process has exited, the OS answers new connection attempts with an RST packet, so the gateway logs Connection Refused; requests cut off mid‑flight surface as Read Timeout. Either way the user sees a “network error”.

Simple fix for small teams

A pragmatic approach adds a pause after marking the instance offline via Nacos OpenAPI. The workflow:

Call Nacos OpenAPI to set the instance as offline.

Sleep 40 seconds while the instance continues serving requests, allowing all gateways to refresh their caches.

After the pause, send kill -15 to terminate the process.

This method works but is slow for large services and cannot handle sudden crashes.
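The three steps above can be condensed into a short shell script. This is only a sketch: the Nacos address, service name, IP, and port below are hypothetical placeholders, and the Nacos OpenAPI endpoint is the same one used later in the PreStop hook.

```shell
#!/bin/sh
# Sketch of the manual "offline -> drain -> kill" workflow.
# NACOS_ADDR, SERVICE, IP, PORT are hypothetical defaults, not real values.
NACOS_ADDR="${NACOS_ADDR:-nacos-server:8848}"
SERVICE="${SERVICE:-my-service}"
IP="${IP:-10.0.0.1}"
PORT="${PORT:-8080}"

offline_url() {
  # Build the Nacos OpenAPI call that flips the instance to enabled=false
  echo "http://${NACOS_ADDR}/nacos/v1/ns/instance?serviceName=${SERVICE}&ip=${IP}&port=${PORT}&enabled=false"
}

drain_and_stop() {
  # 1. Mark the instance offline in Nacos
  curl -s -X PUT "$(offline_url)"
  # 2. Keep serving while every gateway refreshes its cache
  sleep 40
  # 3. Ask the JVM to exit gracefully (SIGTERM); $1 is the process PID
  kill -15 "$1"
}
```

An operator would run `drain_and_stop <pid>` after exporting the real connection values.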

Loss‑less shutdown used by large companies

Step 1: Orchestration‑level traffic draining

Bind the shutdown to the Kubernetes pod lifecycle with a PreStop Hook that deregisters the instance from Nacos and then pauses briefly for the UDP push.

lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        - |
          # 1. Immediately deregister from Nacos
          curl -X PUT "http://nacos-server:8848/nacos/v1/ns/instance?serviceName=my-service&ip=${POD_IP}&port=8080&enabled=false"
          # 2. Short pause (3~5 s) for UDP push
          sleep 5
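The hook above references `${POD_IP}`. One way to make that variable available inside the container is the Kubernetes Downward API; a sketch for the same container spec:

```yaml
env:
  - name: POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
```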

Step 2: Client‑side retry

Enable Spring Cloud LoadBalancer retry so that a Connection Refused raised during the TCP three‑way handshake is caught and the request is automatically retried on another healthy instance. Retrying here is safe even for non‑idempotent operations, because a refused connection fails before the request ever reaches the application, so no work has been done.

spring:
  cloud:
    loadbalancer:
      retry:
        enabled: true
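If finer control is needed, Spring Cloud LoadBalancer exposes additional retry properties. A sketch of one plausible tuning (note that blocking RestTemplate clients also need Spring Retry on the classpath for retry to take effect):

```yaml
spring:
  cloud:
    loadbalancer:
      retry:
        enabled: true
        max-retries-on-same-service-instance: 0  # never re-hit the dead instance
        max-retries-on-next-service-instance: 1  # fail over to one other instance
```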

Step 3: Application graceful shutdown

Configure Spring Boot to perform a graceful shutdown, allowing in‑flight requests to finish before the JVM exits.

server:
  shutdown: graceful
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s
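One operational detail: Kubernetes force‑kills the container with SIGKILL once `terminationGracePeriodSeconds` (default 30 s) elapses, and the PreStop hook runs inside that budget. The grace period therefore has to cover both the PreStop pause and the shutdown phase above; a sketch for the pod spec:

```yaml
spec:
  terminationGracePeriodSeconds: 40  # > preStop sleep (5 s) + timeout-per-shutdown-phase (30 s)
```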

Full loss‑less shutdown flow

Combining the Kubernetes PreStop hook, Nacos deregistration, client‑side retry, and Spring Boot graceful shutdown creates a seamless process: any request that reaches a still‑cached IP fails fast during the handshake, is retried, and the user receives a normal 200 OK without noticing the instance termination.
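The fail‑fast‑and‑retry behavior can be simulated in a few lines of shell. `dead_instance` and `healthy_instance` are hypothetical stand‑ins for real endpoints, not Spring Cloud code:

```shell
# Try each "instance" (here, a command) in order and return the first success,
# mimicking a load balancer that retries on the next healthy instance.
call_with_retry() {
  for instance in "$@"; do
    if resp=$("$instance" 2>/dev/null); then
      echo "$resp"
      return 0
    fi
  done
  return 1
}

dead_instance() { return 7; }           # curl's exit code for "connection refused"
healthy_instance() { echo "200 OK"; }   # the instance still registered in Nacos
```

`call_with_retry dead_instance healthy_instance` prints `200 OK`: the first attempt fails fast, the retry succeeds, and the caller never sees the dead instance.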

Summary

Precise control of registration state, cache refresh timing, and graceful termination is essential for loss‑less service shutdown. Integrating Kubernetes lifecycle hooks, Nacos deregistration, client retries, and Spring Boot graceful shutdown eliminates user‑visible errors during instance removal.

Kubernetes · Nacos · Spring Cloud · Load Balancer · Graceful Shutdown · service registry
Written by Java Companion, a Java-focused public account.