Graceful Shutdown for Kubernetes‑Based Spring Boot Microservices with Nacos
This article explains the concept of graceful shutdown, outlines the essential steps, and demonstrates a practical Kubernetes‑Spring Boot‑Nacos case study, including PreStopHook handling, terminationGracePeriodSeconds tuning, and further optimizations for message queues, scheduled tasks, and traffic control.
1 Concept
Graceful shutdown (also called graceful exit or lossless shutdown) refers to the process of safely stopping a device, system, or application by executing a series of actions that protect data, prevent errors, and maintain overall stability.
Typical steps include:
Back up data: immediately persist any in‑memory modifications or caches to a database or disk.
Stop receiving new requests.
Handle in‑flight requests.
Notify dependent components.
Wait for all elements to exit safely, then shut down the system.
The exact procedure varies across devices, systems, and applications, and may need to be adapted for specific scenarios such as user notifications or automatic state saving.
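The typical steps above can be sketched in plain Java. This is only an illustration under stated assumptions: the class name GracefulShutdown and the step names are hypothetical, and the step bodies are placeholders. The one platform fact relied on is that the JVM runs registered shutdown hooks when it receives SIGTERM.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical coordinator: runs shutdown steps in the order they were registered.
final class GracefulShutdown {
    private final List<String> names = new ArrayList<>();
    private final List<Runnable> actions = new ArrayList<>();

    GracefulShutdown step(String name, Runnable action) {
        names.add(name);
        actions.add(action);
        return this;
    }

    // Runs every step in order. A failing step is reported and skipped so that
    // later steps still get a chance to run. Returns the names of the steps
    // that completed.
    List<String> run() {
        List<String> completed = new ArrayList<>();
        for (int i = 0; i < actions.size(); i++) {
            try {
                actions.get(i).run();
                completed.add(names.get(i));
            } catch (RuntimeException e) {
                System.err.println("shutdown step failed: " + names.get(i) + " -> " + e);
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        // Placeholder step bodies; a real service would plug in its own logic.
        GracefulShutdown shutdown = new GracefulShutdown()
            .step("back up data", () -> System.out.println("persisting in-memory state"))
            .step("stop receiving new requests", () -> System.out.println("closing listeners"))
            .step("handle in-flight requests", () -> System.out.println("draining requests"))
            .step("notify dependent components", () -> System.out.println("sending notifications"));

        // The JVM runs shutdown hooks on SIGTERM, then exits.
        Runtime.getRuntime().addShutdownHook(new Thread(shutdown::run));
    }
}
```

Skipping a failed step instead of aborting is a design choice for this sketch; some systems may prefer to stop at the first failure.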
2 Case Study
With the rise of microservices, operational methods have shifted from Docker to Kubernetes, making graceful shutdown more complex. The following case illustrates the problems and solutions.
Case Before: k8s Shutdown Process
When a developer runs kubectl delete pod, two processes start in parallel:
Process 1: network rules take effect
Kube‑apiserver receives the pod deletion request and updates the pod status to Terminating in etcd.
The endpoint controller removes the pod IP from the endpoint object.
Kube‑proxy updates iptables rules based on the endpoint change, stopping traffic to the pod.
Process 2: container deletion
Kube‑apiserver receives the pod deletion request and marks the pod as Terminating in etcd.
Kubelet cleans up container‑related resources such as storage and network.
If a PreStop hook is configured, Kubelet runs it before sending SIGTERM; it is typically used to wait until traffic no longer reaches the pod.
Kubelet sends SIGTERM to the container.
If the container has not exited when the grace period (terminationGracePeriodSeconds, 30 seconds by default) expires, Kubelet sends SIGKILL to force termination. Time spent in the PreStop hook counts against this grace period.
k8s + Spring Boot + Nacos Case
The PreStopHook performs two actions:
Nacos deregistration.
Sleep for 35 seconds.
The Spring Boot application is terminated via signals, and the Kubernetes terminationGracePeriodSeconds is set to 35 seconds.
Problem
Spring Boot effectively gets only 2 seconds to shut down, so the program cannot finish pending threads, asynchronous messages, or scheduled tasks. The grace period is 35 seconds, but the PreStop hook takes the 35‑second sleep plus the time of the deregistration request, which exceeds the grace period. Kubelet then grants a one‑off 2‑second window before issuing SIGKILL, and that window is all the application has.
Why is a 35‑second sleep needed after deregistration? Other services must first see that the instance is gone. Nacos service discovery is real‑time over UDP, but over HTTP the maximum wait is 10 seconds, and Ribbon refreshes its cache only every 30 seconds by default. Together these delays require a longer pause to ensure other services see the deregistration.
Is Nacos service change notification truly real‑time? Not always: UDP is real‑time, but HTTP polling (the default in many environments) introduces up to 10 seconds of delay.
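The timing problem described above can be worked through with a few lines of arithmetic. The sketch below is a simplified model, not Kubelet's actual code: the class and method names are hypothetical, the numbers come from the case, and the 1‑second request time is an assumed value.

```java
// Simplified model of the shutdown time budget in the case above.
final class ShutdownBudget {
    // Minimum window the application gets once the PreStop hook has used up
    // the grace period (the 2-second extension mentioned in the text).
    static final int MIN_WINDOW_SECONDS = 2;

    // Seconds left for the application after the PreStop hook has finished.
    static int secondsLeftForApp(int gracePeriodSeconds, int preStopSeconds) {
        return Math.max(gracePeriodSeconds - preStopSeconds, MIN_WINDOW_SECONDS);
    }

    // Worst-case delay until callers stop routing to a deregistered instance:
    // registry polling delay plus the client-side cache refresh interval.
    static int worstCasePropagationSeconds(int registryPollSeconds, int cacheRefreshSeconds) {
        return registryPollSeconds + cacheRefreshSeconds;
    }

    public static void main(String[] args) {
        int requestSeconds = 1; // assumed duration of the deregistration request
        // Grace period 35 s, PreStop = 35 s sleep + request time.
        System.out.println(secondsLeftForApp(35, 35 + requestSeconds)); // prints 2
        // Nacos HTTP polling (10 s) + Ribbon cache refresh (30 s).
        System.out.println(worstCasePropagationSeconds(10, 30)); // prints 40
    }
}
```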
Case Optimizations
Potential improvements:
Reduce the 35‑second sleep after Nacos deregistration.
Determine a reasonable value for terminationGracePeriodSeconds.
Optimization 1
The 35‑second pause accounts for Nacos discovery time plus Ribbon cache refresh (up to 10 + 30 = 40 seconds in the worst case). To shorten it:
Enable UDP for Nacos (requires coordination with operations).
Listen to Nacos change notifications and refresh Ribbon cache immediately when the service goes offline.
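// Package names below are those of Nacos client 1.4.x and Spring Cloud Netflix Ribbon.
import com.alibaba.nacos.client.naming.event.InstancesChangeEvent;
import com.alibaba.nacos.common.notify.Event;
import com.alibaba.nacos.common.notify.NotifyCenter;
import com.alibaba.nacos.common.notify.listener.Subscriber;
import com.netflix.loadbalancer.ILoadBalancer;
import com.netflix.loadbalancer.ZoneAwareLoadBalancer;
import lombok.extern.slf4j.Slf4j;
import org.springframework.cloud.netflix.ribbon.SpringClientFactory;
import org.springframework.stereotype.Component;

import javax.annotation.PostConstruct;
import javax.annotation.Resource;

/**
 * Subscribe to Nacos instance change notifications
 * Manually refresh Ribbon service instance cache
 * Nacos client 1.4.6 (1.4.1 has a critical bug)
 */
@Component
@Slf4j
public class NacosInstancesChangeEventListener extends Subscriber<InstancesChangeEvent> {

    @Resource
    private SpringClientFactory springClientFactory;

    @PostConstruct
    public void registerToNotifyCenter() {
        NotifyCenter.registerSubscriber(this);
    }

    @Override
    public void onEvent(InstancesChangeEvent event) {
        String service = event.getServiceName();
        // service: DEFAULT_GROUP@@demo ribbonService: demo
        // Fall back to the full name if the group separator is missing.
        int separator = service.indexOf("@@");
        String ribbonService = separator >= 0 ? service.substring(separator + 2) : service;
        log.info("##### Received Nacos instance change event:{} ribbonServiceName: {}", event.getServiceName(), ribbonService);
        ILoadBalancer loadBalancer = springClientFactory.getLoadBalancer(ribbonService);
        // The instanceof check also covers null and avoids a ClassCastException
        // when a different load balancer implementation is configured.
        if (loadBalancer instanceof ZoneAwareLoadBalancer) {
            ((ZoneAwareLoadBalancer<?>) loadBalancer).updateListOfServers();
            log.info("Refresh ribbon service instance cache: {} success", ribbonService);
        }
    }

    @Override
    public Class<? extends Event> subscribeType() {
        return InstancesChangeEvent.class;
    }

    /**
     * Nacos 1.4.4~1.4.6 requires this method implementation; versions after 2.1.2 fix the issue.
     * When multiple registries exist, change events are not isolated, so we need to decide whether to handle the event.
     */
    @Override
    public boolean scopeMatches(InstancesChangeEvent event) {
        return true;
    }
}
Optimization 2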
The value of terminationGracePeriodSeconds should be slightly larger than the total time spent in PreStop plus the Spring Boot shutdown duration, which depends on business logic (MQ messages, scheduled tasks, thread‑pool work, data backup). The common recommendation is to enable Spring Boot’s graceful shutdown and add custom shutdown logic.
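Spring Boot’s graceful shutdown (enabled with server.shutdown=graceful in Spring Boot 2.3 and later) waits up to 30 seconds by default, controlled by spring.lifecycle.timeout-per-shutdown-phase. Assuming the PreStop hook is reduced to about 10 seconds, a practical setting for terminationGracePeriodSeconds is 10 seconds + 30 seconds = 40 seconds.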
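For thread‑pool work, the custom shutdown logic mentioned above usually comes down to a bounded wait. Below is a minimal sketch using only the JDK; the class name BoundedDrain is hypothetical, and a Spring application would more likely rely on its own executor settings instead.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: drain a thread pool within a fixed time budget.
final class BoundedDrain {
    // Stops accepting new tasks, waits up to the timeout for queued and running
    // tasks, then interrupts whatever is left.
    // Returns true if all tasks finished within the timeout.
    static boolean drain(ExecutorService pool, long timeout, TimeUnit unit) {
        pool.shutdown();
        try {
            if (pool.awaitTermination(timeout, unit)) {
                return true;
            }
            pool.shutdownNow();
            return false;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> System.out.println("finishing in-flight work"));
        // 30 seconds mirrors the shutdown window discussed in the text.
        boolean clean = drain(pool, 30, TimeUnit.SECONDS);
        System.out.println("drained cleanly: " + clean);
    }
}
```

Whatever timeout is chosen here has to fit inside the remaining grace period, otherwise SIGKILL arrives before the drain completes.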
After Optimization
Using Actuator Shutdown
Some articles suggest using Spring Boot’s actuator shutdown endpoint for graceful termination.
In reality, invoking the shutdown endpoint starts Spring Boot’s graceful shutdown process, but if that process has not finished by the time SIGTERM (kill -15) arrives, the container is killed. If thread pools are not configured to wait, tasks may be terminated abruptly.
// Without these settings, tasks may be killed when SIGTERM is received
threadPoolTaskExecutor.setWaitForTasksToCompleteOnShutdown(true);
threadPoolTaskExecutor.setAwaitTerminationSeconds(30);
3 Further Optimizations
MQ and Scheduled Tasks
When Nacos deregisters, other services listen for the event and refresh Ribbon caches. The shutting‑down service itself can also listen for its own deregistration event, stop MQ listeners, and suspend scheduled tasks, achieving a cleaner shutdown.
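The idea can be sketched without any framework. Everything in the block below is hypothetical: StoppableListener stands in for an MQ listener container, and onInstancesChanged stands in for whatever callback delivers the Nacos change event. It only illustrates reacting once to the service's own removal from the instance list.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for an MQ listener container that can be stopped.
interface StoppableListener {
    void stop();
}

// Hypothetical handler; none of these names come from Nacos or Spring.
final class SelfDeregistrationHandler {
    private final String ownService;
    private final String ownAddress;
    private final List<StoppableListener> mqListeners;
    private final ScheduledExecutorService scheduler;
    private final AtomicBoolean stopped = new AtomicBoolean(false);

    SelfDeregistrationHandler(String ownService, String ownAddress,
                              List<StoppableListener> mqListeners,
                              ScheduledExecutorService scheduler) {
        this.ownService = ownService;
        this.ownAddress = ownAddress;
        this.mqListeners = mqListeners;
        this.scheduler = scheduler;
    }

    // Returns true only the first time this instance is missing from its own
    // service's live instance list.
    boolean onInstancesChanged(String service, Set<String> liveAddresses) {
        if (!ownService.equals(service) || liveAddresses.contains(ownAddress)) {
            return false; // another service changed, or this instance is still registered
        }
        if (!stopped.compareAndSet(false, true)) {
            return false; // deregistration was already handled
        }
        mqListeners.forEach(StoppableListener::stop); // stop consuming new messages
        scheduler.shutdown(); // accept no new tasks; a run already in progress may finish
        return true;
    }
}
```

A real implementation would also need to ignore change events that arrive before the instance has registered, since the instance is absent from the list at that point too.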
Traffic Control
If the environment does not rely on Kubernetes for pod traffic control, Spring Cloud Gateway may serve as the entry point instead. The gateway should also listen to Nacos deregistration events to refresh Ribbon caches and stop traffic to the shutting‑down service.
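On the gateway side, the effect of such a refresh can be pictured as replacing a cached instance list as soon as a change event arrives, rather than waiting for the next periodic refresh. The sketch below uses a hypothetical InstanceCache class with simple round‑robin selection; it is not Ribbon's or Spring Cloud Gateway's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical gateway-side cache of live instances for one service.
final class InstanceCache {
    private volatile List<String> instances = Collections.emptyList();
    private final AtomicInteger cursor = new AtomicInteger();

    // Called with the current live instances whenever a change event arrives.
    void refresh(List<String> liveInstances) {
        instances = Collections.unmodifiableList(new ArrayList<>(liveInstances));
    }

    // Picks the next instance in round-robin order, or empty if none is live.
    Optional<String> choose() {
        List<String> snapshot = instances; // read once so the size and lookup agree
        if (snapshot.isEmpty()) {
            return Optional.empty();
        }
        int index = Math.floorMod(cursor.getAndIncrement(), snapshot.size());
        return Optional.of(snapshot.get(index));
    }
}
```

Once refresh has been called with a list that no longer contains the shutting‑down instance, choose cannot return it, which is the behavior the gateway needs.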
4 Summary
After extensive research and practice, the author presents a comprehensive graceful shutdown solution. The biggest challenges lie in the business logic of the service itself, such as long‑running requests, scheduled jobs, thread‑pool tasks, MQ messages, data persistence, and ensuring idempotent APIs.
Identify business logic that exceeds the default 30‑second shutdown window.
Implement custom shutdown hooks to save unfinished work and data.
Ensure API operations are idempotent.