
Graceful Shutdown in Kubernetes: Concepts, Case Studies, and Optimizations

This article explains the concept of graceful shutdown, outlines the standard steps, and presents detailed Kubernetes, Spring Boot, and Nacos case studies, followed by optimization techniques, code examples, and practical recommendations for handling MQ, scheduled tasks, and traffic control during service termination.

Architect

Mar 7, 2024


1. Concept

Graceful shutdown refers to the process of stopping a system, service, or application in a controlled manner to ensure data safety, prevent errors, and maintain overall stability.

Typical steps include:

Back up data: persist any in‑memory modifications or caches to the database or disk.

Stop receiving new requests.

Process unfinished requests.

Notify dependent components.

Wait for all components to exit safely, then shut down the system.
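The steps above can be sketched with plain java.util.concurrent. This is a minimal illustration, not the article's implementation: the class GracefulService and its methods are hypothetical names, and the "requests" are simply Runnable tasks.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical service that follows the shutdown order listed above. */
class GracefulService {
    private final ExecutorService workers = Executors.newFixedThreadPool(4);
    private final AtomicBoolean accepting = new AtomicBoolean(true);
    private final AtomicInteger completed = new AtomicInteger();

    /** Accepts a request unless shutdown has begun; returns false if it was refused. */
    boolean submit(Runnable request) {
        if (!accepting.get()) {
            return false;                       // no longer receiving new requests
        }
        try {
            workers.execute(() -> {
                request.run();
                completed.incrementAndGet();
            });
            return true;
        } catch (RejectedExecutionException e) {
            return false;                       // lost the race with shutdown()
        }
    }

    /** Returns true if every accepted request finished within the timeout. */
    boolean shutdown(long timeout, TimeUnit unit) {
        accepting.set(false);                   // 1. stop receiving new requests
        workers.shutdown();                     // 2. let queued and running requests finish
        boolean drained = false;
        try {
            drained = workers.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (!drained) {
            workers.shutdownNow();              // 3. timeout reached: interrupt what is left
        }
        return drained;
    }

    int completedCount() {
        return completed.get();
    }
}
```

In a real process, shutdown would typically be called from a hook registered with Runtime.getRuntime().addShutdownHook, which the JVM runs when it receives SIGTERM.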

2. Case Studies

2.1 Kubernetes shutdown process

When kubectl delete pod is executed, kube‑apiserver marks the pod as Terminating in etcd, and two processes start in parallel:

Network rule update: the endpoint controller removes the pod IP from the Service endpoints, and kube‑proxy updates iptables so that traffic is no longer routed to the pod.

Container deletion: kubelet invokes the PreStop hook, then sends SIGTERM to the container. If the container has not exited when the grace period (terminationGracePeriodSeconds, 30 s by default) runs out, kubelet sends SIGKILL and then cleans up the pod's storage and network resources. The grace period starts when termination begins, so time spent in the PreStop hook counts against it.

2.2 k8s + Spring Boot + Nacos case

The PreStop hook performs two actions: Nacos deregistration and a 35‑second sleep. The pod’s terminationGracePeriodSeconds is also set to 35 s.

Problem

After SIGTERM, the Spring Boot application has only about 2 s to shut down, which is not enough to finish pending thread tasks, asynchronous messages, or scheduled jobs. The cause is that terminationGracePeriodSeconds is 35 s, while the PreStop hook takes the 35‑second sleep plus the time of the deregistration request, so PreStop alone uses up the whole grace period. Kubelet then grants a one‑off extension of 2 s before issuing SIGKILL.
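The arithmetic behind this problem can be written down as a small model. It is a simplification under two assumptions: the time spent in the PreStop hook is deducted from terminationGracePeriodSeconds, and the extension is the 2 s mentioned above. The class and method names are hypothetical.

```java
/** Simplified model of how much time the application gets after SIGTERM. */
class GracePeriodBudget {
    /** One-off extension granted when the grace period is already used up, in seconds. */
    static final long EXTENSION_SECONDS = 2;

    /**
     * Seconds between SIGTERM and SIGKILL.
     * Assumption: PreStop time is charged against the grace period.
     */
    static long secondsAfterSigterm(long gracePeriodSeconds, long preStopSeconds) {
        long remaining = gracePeriodSeconds - preStopSeconds;
        return Math.max(remaining, EXTENSION_SECONDS);
    }

    public static void main(String[] args) {
        // The configuration in this case: 35 s grace period, 35 s PreStop sleep.
        System.out.println(secondsAfterSigterm(35, 35)); // prints 2
        // A 40 s grace period with a 10 s PreStop leaves 30 s for the application.
        System.out.println(secondsAfterSigterm(40, 10)); // prints 30
    }
}
```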

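Why is a 35 s sleep needed after Nacos deregistration? Nacos service‑change propagation via HTTP can take up to 10 s, and Ribbon's default cache refresh interval is 30 s, so a caller may keep routing to the deregistered instance for up to about 40 s in the worst case. The 35 s sleep was chosen to cover most of that window.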

Code example – Nacos instance change listener

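import com.alibaba.nacos.client.naming.event.InstancesChangeEvent;
import com.alibaba.nacos.common.notify.NotifyCenter;
import com.alibaba.nacos.common.notify.listener.Subscriber;
import com.netflix.loadbalancer.DynamicServerListLoadBalancer;
import com.netflix.loadbalancer.ILoadBalancer;
import lombok.extern.slf4j.Slf4j;
import org.springframework.cloud.netflix.ribbon.SpringClientFactory;
import org.springframework.stereotype.Component;

import javax.annotation.PostConstruct;
import javax.annotation.Resource;

/**
 * Subscribe to Nacos instance change notifications
 * Manually refresh Ribbon service instance cache
 * Nacos client 1.4.6 (1.4.1 has a critical bug)
 */
@Component
@Slf4j
public class NacosInstancesChangeEventListener extends Subscriber<InstancesChangeEvent> {

    @Resource
    private SpringClientFactory springClientFactory;

    @PostConstruct
    public void registerToNotifyCenter() {
        NotifyCenter.registerSubscriber(this);
    }

    @Override
    public void onEvent(InstancesChangeEvent event) {
        String service = event.getServiceName();
        // service: DEFAULT_GROUP@@demo   ribbonService: demo
        // Keep the name unchanged if it carries no group prefix
        int separator = service.indexOf("@@");
        String ribbonService = separator >= 0 ? service.substring(separator + 2) : service;
        log.info("#### Received Nacos instance change event:{} ribbonServiceName: {}", event.getServiceName(), ribbonService);
        ILoadBalancer loadBalancer = springClientFactory.getLoadBalancer(ribbonService);
        // updateListOfServers() is declared on DynamicServerListLoadBalancer,
        // the parent of ZoneAwareLoadBalancer; check the type instead of casting blindly
        if (loadBalancer instanceof DynamicServerListLoadBalancer) {
            ((DynamicServerListLoadBalancer<?>) loadBalancer).updateListOfServers();
            log.info("Refresh ribbon service instance cache: {} success", ribbonService);
        }
    }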

    @Override
    public Class<? extends com.alibaba.nacos.common.notify.Event> subscribeType() {
        return InstancesChangeEvent.class;
    }

    /**
     * Nacos 1.4.4~1.4.6 requires this method; versions >=2.1.2 fixed it.
     * When multiple registries exist, change events are not isolated, so we need to decide whether to handle the event.
     */
    @Override
    public boolean scopeMatches(InstancesChangeEvent event) {
        return true;
    }
}

2.3 Optimization points

Reduce the 35 s sleep after Nacos deregistration if possible.

Determine a reasonable value for terminationGracePeriodSeconds based on PreStop duration and Spring Boot shutdown time.

Optimization 1

The 35 s sleep accounts for Nacos service discovery time plus Ribbon cache refresh (≈40 s in worst case). To shorten it:

Enable UDP for Nacos (requires coordination with operations).

Listen to Nacos change notifications and refresh Ribbon cache immediately when a service goes offline.

Optimization 2 – Adjust terminationGracePeriodSeconds

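The value should be slightly larger than the total time spent in PreStop plus Spring Boot shutdown (which depends on business logic such as MQ messages, scheduled tasks, and thread‑pool tasks). When graceful shutdown is enabled (server.shutdown=graceful, available since Spring Boot 2.3), Spring Boot waits up to 30 s by default for each shutdown phase (spring.lifecycle.timeout-per-shutdown-phase). With about 10 s budgeted for PreStop, a practical setting is 10 + 30 = 40 s.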

Thread‑pool configuration example

// Without these settings, on SIGTERM (kill -15) the pool is shut down immediately and unfinished tasks are interrupted
threadPoolTaskExecutor.setWaitForTasksToCompleteOnShutdown(true);
threadPoolTaskExecutor.setAwaitTerminationSeconds(30);
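The effect of these two settings can be reproduced with java.util.concurrent alone. The sketch below does not use Spring; it assumes the settings correspond to shutdown() followed by awaitTermination() (wait for tasks) versus shutdownNow() (interrupt and drop). ShutdownModes and finishedTasks are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ShutdownModes {
    /** Queues `tasks` short jobs on one thread, shuts the pool down, returns how many finished. */
    static int finishedTasks(int tasks, boolean waitForTasks) {
        if (tasks < 1) {
            throw new IllegalArgumentException("tasks must be >= 1");
        }
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicInteger finished = new AtomicInteger();
        CountDownLatch started = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                started.countDown();
                try {
                    Thread.sleep(100);                  // simulated work
                    finished.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // aborted by shutdownNow()
                }
            });
        }
        try {
            started.await();                            // ensure one task is already running
            if (waitForTasks) {
                pool.shutdown();                        // queued tasks still run to completion
            } else {
                pool.shutdownNow();                     // interrupt the running task, drop the queue
            }
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return finished.get();
    }
}
```

With waitForTasks set to true, all queued jobs complete before the method returns; with false, the job in progress is interrupted and the queued ones never start.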

3. Further Optimizations

MQ and scheduled tasks

When a service deregisters from Nacos, it can also listen for its own deregistration event, then stop consuming MQ messages and stop triggering scheduled jobs, which gives a cleaner shutdown.
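One way to picture this is a shared "draining" flag that the MQ poll loop and the scheduled jobs both check. The sketch is an assumption-laden illustration: a BlockingQueue stands in for the real MQ client, and DrainableConsumer, onDeregistered, and runScheduledJob are hypothetical names, not Nacos or Spring APIs.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class DrainableConsumer {
    private final BlockingQueue<String> broker;   // stand-in for a real MQ client
    private final AtomicBoolean draining = new AtomicBoolean(false);
    private final AtomicInteger handled = new AtomicInteger();

    DrainableConsumer(BlockingQueue<String> broker) {
        this.broker = broker;
    }

    /** Hypothetical callback: call it when this instance's deregistration is observed. */
    void onDeregistered() {
        draining.set(true);
    }

    /** Poll loop: finishes the message in hand, then stops pulling new ones. */
    void runLoop() {
        while (!draining.get()) {
            try {
                String message = broker.poll(50, TimeUnit.MILLISECONDS);
                if (message != null) {
                    handle(message);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    /** Scheduled-job entry point: the job is skipped once draining has begun. */
    boolean runScheduledJob(Runnable job) {
        if (draining.get()) {
            return false;
        }
        job.run();
        return true;
    }

    private void handle(String message) {
        handled.incrementAndGet();                // real business logic would go here
    }

    int handledCount() {
        return handled.get();
    }
}
```

Messages that arrive after the flag is set stay in the queue, so another instance can consume them.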

Traffic control

If a gateway (e.g., Spring Cloud Gateway) is used instead of k8s traffic control, the gateway should also listen to Nacos deregistration events to refresh its Ribbon cache and stop routing traffic to the shutting‑down service.

4. Conclusion

The article presents a comprehensive graceful shutdown solution for microservices running on Kubernetes, covering basic concepts, detailed case studies, and practical optimizations such as handling MQ, scheduled tasks, and traffic control. Success depends on both the mechanical shutdown steps and the business‑specific logic that must be addressed during service termination.
