Mastering Graceful Shutdown in Kubernetes: Real-World Spring Boot & Nacos Cases
This article explains the concept of graceful shutdown, walks through detailed Kubernetes pod termination steps, presents real-world Spring Boot and Nacos integration cases, analyzes common pitfalls such as premature termination and message loss, and offers practical optimization strategies for handling MQ, scheduled tasks, and traffic control.
1 Concept
Graceful shutdown (also called graceful offline or lossless offline) refers to a set of steps performed before a device, system, or application stops, ensuring data safety, preventing errors, and maintaining overall stability.
The typical steps include:
Backup data: Immediately persist any in-memory modifications or caches to a database or disk.
Stop receiving new requests.
Process unfinished requests.
Notify dependent components.
Wait for all elements to exit safely, then shut down the system.
Specific implementations vary across devices, systems, and scenarios; sometimes the system must inform users of impending shutdown or automatically save state for the next start.
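The steps above can be sketched in plain Java. This is a minimal illustration, not taken from any framework: the class name GracefulWorker and its methods are hypothetical, and it covers only "stop receiving new requests" and "process unfinished requests" for work running on a thread pool.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: accepts jobs until shutdown begins, then drains them.
class GracefulWorker {
    private final AtomicBoolean accepting = new AtomicBoolean(true);
    private final AtomicInteger completed = new AtomicInteger();
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    // Returns false once shutdown has begun, so the caller can retry elsewhere.
    boolean submit(Runnable job) {
        if (!accepting.get()) {
            return false;
        }
        try {
            pool.execute(() -> {
                job.run();
                completed.incrementAndGet();
            });
            return true;
        } catch (RejectedExecutionException e) {
            // Shutdown started between the flag check and execute().
            return false;
        }
    }

    // Stops intake, waits for queued and running jobs, and reports whether all finished in time.
    boolean shutdown(long timeout, TimeUnit unit) throws InterruptedException {
        accepting.set(false);   // stop receiving new requests
        pool.shutdown();        // already submitted jobs keep running
        boolean drained = pool.awaitTermination(timeout, unit);
        if (!drained) {
            pool.shutdownNow(); // timeout reached: interrupt whatever is left
        }
        return drained;
    }

    int completedCount() {
        return completed.get();
    }
}
```

In a real service, shutdown would typically be called from a JVM shutdown hook or a framework lifecycle callback, with a timeout that fits inside the platform's grace period.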
2 Case Studies
Pre‑shutdown: Kubernetes pod termination flow
When kubectl delete pod is executed, two processes start:
Network rules take effect
Kube‑apiserver receives the delete request and marks the pod as Terminating in etcd.
Endpoint controller removes the pod IP from the endpoint object.
Kube‑proxy updates iptables rules based on the endpoint change, stopping traffic to the pod.
Container deletion
Kube‑apiserver marks the pod as Terminating in etcd.
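Kubelet detects the Terminating pod and begins stopping its containers.
If a PreStop hook is configured, Kubelet runs it first; the hook is typically used to wait until traffic stops reaching the pod.
Kubelet sends SIGTERM to the container.
If the container has not exited when the grace period (terminationGracePeriodSeconds, 30 seconds by default) expires, Kubelet sends SIGKILL to force termination. Time spent in the PreStop hook counts against this period.
Kubelet then cleans up container resources such as storage and network.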
Kubernetes + Spring Boot + Nacos case
In this scenario the PreStop hook performs two actions:
Nacos deregistration.
Sleep for 35 seconds.
The Spring Boot application is shut down by a signal (the SIGTERM sent by Kubelet), and the pod's terminationGracePeriodSeconds is set to 35 seconds.
Problems
Spring Boot gets only about 2 seconds to shut down, so it cannot finish pending threads, async messages, or scheduled tasks. The 35-second PreStop sleep plus the deregistration request already uses up the entire 35-second grace period, so Kubelet grants just an additional 2 seconds after SIGTERM before issuing kill -9.
Why is a 35-second sleep needed after deregistration? Nacos can push changes in real time, but Ribbon's default cache refresh interval is 30 seconds; the extra 5 seconds are a margin against occasional Feign request failures.
Is Nacos discovery truly real-time? Clients learn of changes through HTTP polling (up to 10 seconds of delay) and UDP push (real-time). In production UDP may be blocked, so discovery often relies on HTTP polling.
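The first problem is simple arithmetic, which the following sketch makes explicit. It is a simplified model of the behavior described in this article, not Kubelet's actual code: the class TerminationBudget and its method are hypothetical, and the 2-second minimum window is the value quoted above.

```java
// Hypothetical model of how much time the application has between SIGTERM and SIGKILL.
final class TerminationBudget {
    // Minimum window granted after SIGTERM when PreStop has used up the grace period
    // (value taken from the article's description).
    static final long MIN_WINDOW_SECONDS = 2;

    // The PreStop hook runs first and counts against the grace period;
    // whatever remains (but never less than the minimum window) is left for the app.
    static long secondsLeftAfterSigterm(long gracePeriodSeconds, long preStopSeconds) {
        long usedByPreStop = Math.min(preStopSeconds, gracePeriodSeconds);
        return Math.max(gracePeriodSeconds - usedByPreStop, MIN_WINDOW_SECONDS);
    }
}
```

With the case's settings, secondsLeftAfterSigterm(35, 35) yields 2, which matches the observed 2-second shutdown. A 10-second PreStop with a 40-second grace period would leave 30 seconds.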
Case Optimizations
Reduce the 35‑second sleep after Nacos deregistration.
Adjust terminationGracePeriodSeconds to a reasonable value.
Optimization 1
The sleep time should reflect service discovery latency (up to 10 seconds with HTTP polling) plus the Ribbon cache refresh interval (30 seconds), about 40 seconds in total, to guarantee no new Feign calls during shutdown. To shorten it:
Enable UDP (requires coordination with operations).
Listen to Nacos change notifications and refresh Ribbon cache promptly.
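import javax.annotation.PostConstruct;
import javax.annotation.Resource;
import com.alibaba.nacos.client.naming.event.InstancesChangeEvent;
import com.alibaba.nacos.common.notify.NotifyCenter;
import com.alibaba.nacos.common.notify.listener.Subscriber;
import com.netflix.loadbalancer.ILoadBalancer;
import com.netflix.loadbalancer.ZoneAwareLoadBalancer;
import lombok.extern.slf4j.Slf4j;
import org.springframework.cloud.netflix.ribbon.SpringClientFactory;
import org.springframework.stereotype.Component;

@Component
@Slf4j
public class NacosInstancesChangeEventListener extends Subscriber<InstancesChangeEvent> {
    @Resource
    private SpringClientFactory springClientFactory;

    // Register with the Nacos NotifyCenter once the bean has been constructed.
    @PostConstruct
    public void registerToNotifyCenter() {
        NotifyCenter.registerSubscriber(this);
    }

    @Override
    public void onEvent(InstancesChangeEvent event) {
        String service = event.getServiceName();
        // Nacos names look like "group@@service"; Ribbon only knows the service part.
        int separator = service.indexOf("@@");
        String ribbonService = separator >= 0 ? service.substring(separator + 2) : service;
        log.info("#### Received Nacos instance change event:{} ribbonServiceName: {}", event.getServiceName(), ribbonService);
        ILoadBalancer loadBalancer = springClientFactory.getLoadBalancer(ribbonService);
        // Guard the cast: only a ZoneAwareLoadBalancer can be refreshed this way.
        if (loadBalancer instanceof ZoneAwareLoadBalancer) {
            ((ZoneAwareLoadBalancer<?>) loadBalancer).updateListOfServers();
            log.info("Refresh ribbon service instance cache: {} success", ribbonService);
        }
    }

    @Override
    public Class<? extends com.alibaba.nacos.common.notify.Event> subscribeType() {
        return InstancesChangeEvent.class;
    }

    // Accept events from every scope.
    @Override
    public boolean scopeMatches(InstancesChangeEvent event) {
        return true;
    }
}
Optimization 2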
The value of terminationGracePeriodSeconds should be slightly larger than the time spent in PreStop plus the Spring Boot shutdown duration. The latter depends on business logic (MQ messages, scheduled tasks, thread-pool jobs, data backup). Spring Boot's graceful shutdown timeout defaults to 30 seconds, so a practical setting is 10 + 30 seconds (PreStop plus shutdown).
Actuator shutdown approach
Some guides recommend using actuator shutdown. After invoking it, Spring Boot enters its graceful shutdown flow, but if the process is interrupted by kill -15 before thread pools finish, the service will still terminate.
// Without proper configuration, threads are killed on SIGTERM
threadPoolTaskExecutor.setWaitForTasksToCompleteOnShutdown(true);
threadPoolTaskExecutor.setAwaitTerminationSeconds(30);
3 Further Optimizations
MQ and scheduled tasks
A service can subscribe to Nacos change events for itself. When it sees its own deregistration, it can stop its MQ listeners and scheduled tasks right away, which makes the shutdown cleaner.
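A rough sketch of that idea in plain Java follows. Everything here is an assumption for illustration: BackgroundWork, mayConsume, and onSelfDeregistered are hypothetical names, the wiring to the actual Nacos event is left out, and a real MQ client would be stopped through its own API rather than a flag.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical holder for background work that should stop before the pod is killed.
class BackgroundWork {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final AtomicBoolean consuming = new AtomicBoolean(true);

    // Runs a scheduled job at a fixed rate until shutdown.
    void start(Runnable scheduledJob, long periodMillis) {
        scheduler.scheduleAtFixedRate(scheduledJob, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    // A message consumer loop would check this before pulling the next message.
    boolean mayConsume() {
        return consuming.get();
    }

    // Hypothetical hook: call this when the instance observes its own deregistration.
    boolean onSelfDeregistered(long timeout, TimeUnit unit) throws InterruptedException {
        consuming.set(false);   // stop pulling new messages
        scheduler.shutdown();   // cancel future runs; a run in progress is allowed to finish
        return scheduler.awaitTermination(timeout, unit);
    }
}
```

Stopping intake at deregistration time, rather than at SIGTERM, gives in-flight messages and task runs the whole PreStop window to finish.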
Traffic control
If the pod is not behind Kubernetes traffic control, a Spring Cloud Gateway should also listen to Nacos deregistration events to refresh Ribbon cache and stop traffic to the shutting‑down service.
4 Summary
After extensive research and testing, the author proposes the graceful shutdown approach described above, while acknowledging that some of the wording may be imprecise. The biggest challenges lie in business logic: long-running requests, unfinished tasks, data persistence, and idempotent interfaces.
Identify business logic that exceeds the grace period (e.g., >30 seconds).
Implement custom shutdown logic to save unfinished tasks and data.
Ensure API operations are idempotent.
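As an illustration of the last point, a retried request can be made idempotent by keying it on a request identifier. The sketch below is a deliberately simple in-memory version; the class IdempotentExecutor is hypothetical, and a real service would more likely keep the keys in a shared store with expiry so that retries landing on another pod are also recognized.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical in-memory guard: the operation runs at most once per request id.
class IdempotentExecutor<R> {
    private final ConcurrentHashMap<String, R> results = new ConcurrentHashMap<>();

    // Runs the operation on the first call for a given id; later calls return the stored result.
    // Note: a null result is not stored, so such an operation would run again on retry.
    R execute(String requestId, Supplier<R> operation) {
        return results.computeIfAbsent(requestId, id -> operation.get());
    }
}
```

A client that retries after a connection reset during shutdown would resend the same request id and receive the original result instead of triggering the operation twice.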
Architect's Guide
Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.
