
Graceful Shutdown for Kubernetes‑Based Spring Boot Microservices with Nacos

This article explains the concept of graceful shutdown, outlines the essential steps, and demonstrates a practical Kubernetes‑Spring Boot‑Nacos case study, including PreStopHook handling, terminationGracePeriodSeconds tuning, and further optimizations for message queues, scheduled tasks, and traffic control.

Selected Java Interview Questions

Dec 14, 2023


1 Concept

Graceful shutdown (also called graceful exit or lossless shutdown) refers to the process of safely stopping a device, system, or application by executing a series of actions that protect data, prevent errors, and maintain overall stability.

Typical steps include:

Back up data: immediately persist any in‑memory modifications or caches to a database or disk.

Stop receiving new requests.

Handle in‑flight requests.

Notify dependent components.

Wait for all components to exit safely, then shut down the system.
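The steps above can be sketched in plain Java. This is a minimal, JDK-only illustration under stated assumptions. `GracefulComponent` and its methods are hypothetical names, and an in-memory map stands in for the database or disk. The flush runs after draining rather than first, so that writes from in-flight requests are not lost.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical component that walks through the generic shutdown steps; not a library API.
final class GracefulComponent {
    private final AtomicInteger inFlight = new AtomicInteger();
    private volatile boolean accepting = true;
    private final Map<String, String> unsaved = new ConcurrentHashMap<>();      // in-memory changes
    private final Map<String, String> durableStore = new ConcurrentHashMap<>(); // stands in for a DB or disk
    private final List<Runnable> dependents = new CopyOnWriteArrayList<>();

    /** Admits a request unless shutdown has started (increment first, then re-check the flag). */
    boolean tryAccept() {
        inFlight.incrementAndGet();
        if (!accepting) {
            inFlight.decrementAndGet();
            return false;
        }
        return true;
    }

    /** Finishes a request, leaving its result in memory until the next flush. */
    void complete(String key, String value) {
        unsaved.put(key, value);
        inFlight.decrementAndGet();
    }

    /** Registers a callback that tells a dependent component this one is going away. */
    void addDependent(Runnable notifier) {
        dependents.add(notifier);
    }

    Map<String, String> durableStore() {
        return durableStore;
    }

    /** Runs the shutdown steps; returns false if in-flight requests did not finish in time. */
    boolean shutdown(long timeoutMillis) {
        accepting = false;                                          // stop receiving new requests
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (inFlight.get() > 0 && deadline - System.nanoTime() > 0) {
            try {
                Thread.sleep(10);                                   // let in-flight requests finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        boolean drained = inFlight.get() == 0;
        durableStore.putAll(unsaved);                               // persist in-memory changes
        unsaved.clear();
        dependents.forEach(Runnable::run);                          // notify dependent components
        return drained;                                             // the caller may now exit
    }
}
```

A caller would invoke `shutdown(timeout)` from its termination path and exit the process once it returns.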

The exact procedure varies across devices, systems, and applications, and may need to be adapted for specific scenarios such as user notifications or automatic state saving.

2 Case Study

With the rise of microservices, deployment has moved from standalone Docker containers to Kubernetes, which makes graceful shutdown more complex. The following case illustrates the problems and their solutions.

Background: The k8s Pod Shutdown Process

When a developer runs kubectl delete pod, two processes start in parallel:

Network rules take effect

Kube‑apiserver receives the pod deletion request and updates the pod status to Terminating in etcd.

The endpoint controller removes the pod IP from the endpoint object.

Kube‑proxy updates iptables rules based on the endpoint change, stopping traffic to the pod.

Container deletion

Kube‑apiserver receives the pod deletion request and marks the pod as Terminating in etcd.

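Kubelet observes the Terminating status and begins stopping the pod's containers; storage and network resources are released after they exit.

If a PreStop hook is configured, Kubelet runs it first. In this setup it is used to wait until traffic no longer reaches the pod.

Kubelet then sends SIGTERM to the container's main process.

If the container has not exited when terminationGracePeriodSeconds (30 seconds by default) runs out, Kubelet sends SIGKILL to force termination. The grace period is counted from the start of termination, so it includes the time spent in the PreStop hook.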
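On the application side, SIGTERM is what gives a Java process its chance to clean up. The JVM runs registered shutdown hooks on SIGTERM, but SIGKILL ends the process immediately. The sketch below shows that mechanism with the standard `Runtime.addShutdownHook` API. `SigtermCleanup` is a hypothetical helper. Spring Boot already registers a comparable hook of its own to close the application context.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper: the JVM starts the hook on normal exit, SIGTERM (kill -15) or SIGINT,
// but never on SIGKILL (kill -9), so cleanup must finish before the grace period ends.
final class SigtermCleanup {
    private final Runnable cleanup;
    private final AtomicBoolean ran = new AtomicBoolean();
    private final Thread hook = new Thread(this::runOnce, "graceful-shutdown-hook");

    SigtermCleanup(Runnable cleanup) {
        this.cleanup = cleanup;
    }

    /** Registers the hook with the running JVM. */
    SigtermCleanup install() {
        Runtime.getRuntime().addShutdownHook(hook);
        return this;
    }

    /** Removes the hook; returns true only if it had been registered. */
    boolean uninstall() {
        return Runtime.getRuntime().removeShutdownHook(hook);
    }

    /** Runs the cleanup at most once, whether triggered by the hook or called directly. */
    void runOnce() {
        if (ran.compareAndSet(false, true)) {
            cleanup.run();
        }
    }
}
```

In a test, `uninstall()` removes the hook again. In production the hook simply stays registered until the JVM exits.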

k8s + Spring Boot + Nacos Case

The PreStopHook performs two actions:

Nacos deregistration.

Sleep for 35 seconds.

After the hook finishes, Kubelet stops the Spring Boot application with SIGTERM; terminationGracePeriodSeconds is set to 35 seconds.

Problem

As a result, Spring Boot gets only about 2 seconds to shut down. That is not enough to finish pending threads, asynchronous messages, or scheduled tasks. The grace period is 35 seconds, but the PreStop hook takes 35 seconds of sleep plus the time of the deregistration request, so it overruns the grace period. Kubelet then grants a one-off 2‑second extension, sends SIGTERM once the hook returns, and issues SIGKILL when the extension expires.
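The arithmetic behind the 2-second window can be made explicit. This sketch models the timeline in milliseconds. It assumes the documented Kubernetes behavior: SIGTERM is sent only after PreStop returns, and an overrunning PreStop earns a single 2-second extension. `TerminationTimeline` is a hypothetical name.

```java
// Hypothetical model of the termination timeline, in milliseconds from the start of pod termination.
final class TerminationTimeline {
    /** One-off extension Kubelet grants when PreStop is still running at the end of the grace period. */
    static final long PRESTOP_EXTENSION_MS = 2_000;

    /** How long the application has between SIGTERM and SIGKILL. */
    static long appShutdownWindowMs(long gracePeriodMs, long preStopMs) {
        long sigtermAt = preStopMs; // SIGTERM is sent only after the PreStop hook returns
        long sigkillAt = preStopMs <= gracePeriodMs
                ? gracePeriodMs                          // hook finished inside the grace period
                : gracePeriodMs + PRESTOP_EXTENSION_MS;  // hook overran: one-off 2 s extension
        return Math.max(0, sigkillAt - sigtermAt);
    }
}
```

Take the case's values: a 35-second grace period and a 35-second sleep, plus a deregistration call assumed here to take 300 ms. `appShutdownWindowMs(35_000, 35_300)` then returns `1_700`, so the application has less than 2 seconds.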

Why is a 35‑second sleep needed after deregistration? Other services learn about the deregistration late. Nacos change notification is real-time over UDP, but falls back to HTTP polling with up to 10 seconds of delay. Ribbon also refreshes its cached server list only every 30 seconds by default. The sleep has to cover that window so that callers stop sending requests before the application stops.

Is Nacos service change notification truly real‑time? Only when UDP push works. In many environments the client effectively depends on HTTP polling, which adds up to 10 seconds of delay.
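The staleness comes from the consumer caching the server list between refreshes. The following minimal model shows a deregistered address staying routable until the next 30-second refresh. It uses the hypothetical class `PollingServerList` and passes time in explicitly, and it does not reproduce Ribbon's actual implementation.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical model of a client-side server list refreshed on a fixed interval.
final class PollingServerList {
    private final Supplier<List<String>> registry;
    private final long refreshIntervalMs;
    private List<String> cached;
    private long lastRefreshMs;

    PollingServerList(Supplier<List<String>> registry, long refreshIntervalMs, long nowMs) {
        this.registry = registry;
        this.refreshIntervalMs = refreshIntervalMs;
        this.cached = List.copyOf(registry.get());
        this.lastRefreshMs = nowMs;
    }

    /** Returns the servers a caller would route to at time nowMs; refreshes only when the interval elapsed. */
    List<String> servers(long nowMs) {
        if (nowMs - lastRefreshMs >= refreshIntervalMs) {
            cached = List.copyOf(registry.get());
            lastRefreshMs = nowMs;
        }
        return cached;
    }
}
```

The registry view itself may lag by up to 10 seconds under HTTP polling. In that case the worst-case window grows to about 40 seconds.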

Case Optimizations

Potential improvements:

Reduce the 35‑second sleep after Nacos deregistration.

Determine a reasonable value for terminationGracePeriodSeconds.

Optimization 1

The 35‑second pause is meant to cover the Nacos discovery delay plus the Ribbon cache refresh, which in the worst case add up to about 40 seconds (10 + 30). To shorten it:

Enable UDP for Nacos (requires coordination with operations).

Listen to Nacos change notifications and refresh Ribbon cache immediately when the service goes offline.

<code style="padding: 16px; color: #ddd; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menlo, monospace; font-size: 12px">import com.alibaba.nacos.client.naming.event.InstancesChangeEvent;
import com.alibaba.nacos.common.notify.Event;
import com.alibaba.nacos.common.notify.NotifyCenter;
import com.alibaba.nacos.common.notify.listener.Subscriber;
import com.netflix.loadbalancer.DynamicServerListLoadBalancer;
import com.netflix.loadbalancer.ILoadBalancer;
import lombok.extern.slf4j.Slf4j;
import org.springframework.cloud.netflix.ribbon.SpringClientFactory;
import org.springframework.stereotype.Component;

import javax.annotation.PostConstruct;
import javax.annotation.Resource;

/**
 * Subscribe to Nacos instance change notifications
 * and manually refresh the Ribbon service instance cache.
 * Nacos client 1.4.6 (1.4.1 has a critical bug)
 */
@Component
@Slf4j
public class NacosInstancesChangeEventListener extends Subscriber<InstancesChangeEvent> {
    @Resource
    private SpringClientFactory springClientFactory;

    @PostConstruct
    public void registerToNotifyCenter() {
        NotifyCenter.registerSubscriber(this);
    }

    @Override
    public void onEvent(InstancesChangeEvent event) {
        String service = event.getServiceName();
        // service: DEFAULT_GROUP@@demo   ribbonService: demo (names without a group prefix are used as-is)
        int sep = service.indexOf("@@");
        String ribbonService = sep >= 0 ? service.substring(sep + 2) : service;
        log.info("##### Received Nacos instance change event: {} ribbonServiceName: {}", service, ribbonService);
        ILoadBalancer loadBalancer = springClientFactory.getLoadBalancer(ribbonService);
        // ZoneAwareLoadBalancer (Ribbon's default) extends DynamicServerListLoadBalancer;
        // the instanceof check avoids a ClassCastException with other load balancer types
        if (loadBalancer instanceof DynamicServerListLoadBalancer) {
            ((DynamicServerListLoadBalancer<?>) loadBalancer).updateListOfServers();
            log.info("Refreshed Ribbon service instance cache: {}", ribbonService);
        }
    }

    @Override
    public Class<? extends Event> subscribeType() {
        return InstancesChangeEvent.class;
    }

    /**
     * Nacos 1.4.4~1.4.6 requires this method implementation; versions after 2.1.2 fix the issue.
     * When multiple registries exist, change events are not isolated, so decide here whether to handle the event.
     */
    @Override
    public boolean scopeMatches(InstancesChangeEvent event) {
        return true;
    }
}
</code>

Optimization 2

The value of terminationGracePeriodSeconds should be slightly larger than the time spent in PreStop plus the Spring Boot shutdown duration. The latter depends on business logic (MQ messages, scheduled tasks, thread‑pool work, data backup). The common recommendation is to enable Spring Boot's graceful shutdown (server.shutdown=graceful, available since Spring Boot 2.3) and add custom shutdown logic.

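By default, Spring Boot's graceful shutdown waits at most 30 seconds (spring.lifecycle.timeout-per-shutdown-phase). Taking the PreStop phase as about 10 seconds after Optimization 1, a practical setting is terminationGracePeriodSeconds = 10 seconds + 30 seconds = 40 seconds.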
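The sizing rule can be written down as a small check. This is a sketch only. The method names are hypothetical, and the margin parameter is an assumption that reflects "slightly larger" in the text.

```java
// Hypothetical helper for sizing terminationGracePeriodSeconds.
final class GracePeriodBudget {
    /** PreStop time + Spring Boot shutdown-phase timeout + an optional safety margin, in seconds. */
    static long requiredGracePeriodSeconds(long preStopSeconds, long springShutdownTimeoutSeconds, long marginSeconds) {
        if (preStopSeconds < 0 || springShutdownTimeoutSeconds < 0 || marginSeconds < 0) {
            throw new IllegalArgumentException("durations must be non-negative");
        }
        return preStopSeconds + springShutdownTimeoutSeconds + marginSeconds;
    }

    /** True if the configured grace period leaves room for PreStop plus the full Spring Boot shutdown. */
    static boolean covers(long terminationGracePeriodSeconds, long preStopSeconds, long springShutdownTimeoutSeconds) {
        return terminationGracePeriodSeconds
                >= requiredGracePeriodSeconds(preStopSeconds, springShutdownTimeoutSeconds, 0);
    }
}
```

`requiredGracePeriodSeconds(10, 30, 0)` gives the 40 seconds above. The original case, with a 35-second grace period and a 35-second PreStop, fails `covers`.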

After Optimization

Using Actuator Shutdown

Some articles suggest using Spring Boot's actuator shutdown endpoint for graceful termination.

In practice, calling the endpoint only starts Spring Boot's graceful shutdown process. Kubernetes still follows its own timeline: it sends SIGTERM (kill -15) after PreStop and SIGKILL when the grace period expires, so any work that has not finished by then is killed with the container. If thread pools are not configured to wait for their tasks, those tasks may be terminated abruptly.

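<code style="padding: 16px; color: #ddd; display: -webkit-box; font-family: Operator Mono, Consolas, Monaco, Menlo, monospace; font-size: 12px">// Without these settings, ThreadPoolTaskExecutor calls shutdownNow() when the context closes,
// interrupting running tasks and discarding queued ones
ThreadPoolTaskExecutor threadPoolTaskExecutor = new ThreadPoolTaskExecutor(); // org.springframework.scheduling.concurrent
threadPoolTaskExecutor.setWaitForTasksToCompleteOnShutdown(true);
threadPoolTaskExecutor.setAwaitTerminationSeconds(30);
threadPoolTaskExecutor.initialize();
</code>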
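The same behavior can be reproduced with a plain JDK `ExecutorService` to show what the two settings change. This hypothetical demo assumes Spring's documented mapping. `waitForTasksToCompleteOnShutdown=true` leads to `shutdown()`, which lets queued tasks run. The default leads to `shutdownNow()`, which interrupts the running task and drops the queue. `awaitTermination` plays the role of `awaitTerminationSeconds`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical JDK-only demo of "wait for tasks" versus "shut down now".
final class ExecutorShutdownDemo {
    /** Submits short jobs to one worker thread, shuts down mid-work, and returns how many completed. */
    static int completedTasks(int tasks, boolean waitForTasks, long awaitSeconds) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicInteger done = new AtomicInteger();
        CountDownLatch firstStarted = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                firstStarted.countDown();
                try {
                    Thread.sleep(200); // simulated business work
                    done.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // task was cut off mid-way
                }
            });
        }
        try {
            firstStarted.await(); // shut down while the first task is running
            if (waitForTasks) {
                pool.shutdown();    // like waitForTasksToCompleteOnShutdown=true
            } else {
                pool.shutdownNow(); // like the default: interrupt running, drop queued
            }
            pool.awaitTermination(awaitSeconds, TimeUnit.SECONDS); // like awaitTerminationSeconds
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }
}
```

With five tasks, the waiting variant completes all five. The `shutdownNow()` variant normally completes none, because the running task is interrupted and the queued ones never start.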

3 Further Optimizations

MQ and Scheduled Tasks

When an instance deregisters from Nacos, other services listen for the event and refresh their Ribbon caches. The shutting‑down service itself can also listen for its own deregistration event, stop its MQ listeners, and suspend scheduled tasks, achieving a cleaner shutdown.
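Here is a minimal sketch of that idea. It assumes the deregistration event carries an instance ID that the service can compare with its own. `SelfDeregistrationHandler` and `Pausable` are hypothetical. In a real service the event would come from a Nacos subscriber such as the one in Optimization 1, and `pause()` would stop an MQ listener container.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical coordinator: reacts only to this instance's own deregistration.
final class SelfDeregistrationHandler {
    /** Anything that can stop taking new work, e.g. an MQ consumer. */
    interface Pausable {
        void pause();
    }

    private final String selfInstanceId;
    private final List<Pausable> consumers = new CopyOnWriteArrayList<>();
    private final AtomicBoolean offline = new AtomicBoolean();

    SelfDeregistrationHandler(String selfInstanceId) {
        this.selfInstanceId = selfInstanceId;
    }

    void register(Pausable consumer) {
        consumers.add(consumer);
    }

    /** Wraps a scheduled job so that later runs are skipped once this instance is offline. */
    Runnable guard(Runnable job) {
        return () -> {
            if (!offline.get()) {
                job.run();
            }
        };
    }

    /** Called for every deregistration event; ignores events for other instances. */
    void onInstanceDeregistered(String instanceId) {
        if (selfInstanceId.equals(instanceId) && offline.compareAndSet(false, true)) {
            consumers.forEach(Pausable::pause);
        }
    }

    boolean isOffline() {
        return offline.get();
    }
}
```

`guard` only skips future runs. A job that is already executing still finishes, which matches the goal of letting in-flight work complete.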

Traffic Control

If traffic does not reach pods through Kubernetes Services and Spring Cloud Gateway acts as the gateway instead, the gateway should also listen to Nacos deregistration events. It should refresh its Ribbon cache and stop sending traffic to the shutting‑down service.
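On the gateway side, the key point is to drop the instance as soon as the event arrives instead of waiting for a periodic refresh. The route table below is a hypothetical, JDK-only stand-in for the gateway's load-balancer cache. It uses simple round-robin selection.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical gateway route table updated directly from registry events.
final class GatewayRouteTable {
    private final Map<String, List<String>> instances = new ConcurrentHashMap<>();
    private final AtomicInteger counter = new AtomicInteger();

    void register(String service, String address) {
        instances.computeIfAbsent(service, s -> new CopyOnWriteArrayList<>()).add(address);
    }

    /** Event handler: stop routing to the instance immediately. */
    void onDeregistered(String service, String address) {
        List<String> list = instances.get(service);
        if (list != null) {
            list.remove(address);
        }
    }

    /** Round-robin over the instances currently registered for the service. */
    Optional<String> choose(String service) {
        List<String> snapshot = List.copyOf(instances.getOrDefault(service, List.of()));
        if (snapshot.isEmpty()) {
            return Optional.empty();
        }
        int i = Math.floorMod(counter.getAndIncrement(), snapshot.size());
        return Optional.of(snapshot.get(i));
    }
}
```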

4 Summary

After extensive research and practice, the author presents a comprehensive graceful shutdown solution. The biggest challenges lie in the business logic of the service itself, such as long‑running requests, scheduled jobs, thread‑pool tasks, MQ messages, data persistence, and ensuring idempotent APIs.

Identify business logic that exceeds the default 30‑second shutdown window.

Implement custom shutdown hooks to save unfinished work and data.

Ensure API operations are idempotent.
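Idempotency matters because a request cut off by a shutdown is often retried against another instance. Here is a minimal sketch, assuming the client sends an idempotency key with each request. `IdempotentExecutor` is hypothetical and keeps results in memory. A real deployment would need shared storage (for example a database table with a unique key) so that retries landing on other pods see the same record.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical in-memory idempotency guard keyed by a client-supplied idempotency key.
final class IdempotentExecutor<T> {
    private final Map<String, T> results = new ConcurrentHashMap<>();

    /** Runs the operation at most once per key in this JVM; retries get the stored result. */
    T execute(String idempotencyKey, Supplier<T> operation) {
        return results.computeIfAbsent(idempotencyKey, k -> operation.get());
    }
}
```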
