
Ensuring Zero‑Downtime Rolling Updates in Kubernetes: Causes and Solutions

This article analyzes why Kubernetes rolling updates can still cause service interruptions during pod startup and termination, explains the underlying mechanisms of Kubelet and Endpoint Controller, and provides practical steps such as readiness probes and preStop hooks to achieve smoother, near‑zero‑downtime deployments.

360 Quality & Efficiency

Kubernetes has become the standard platform for rapid application deployment and scaling, yet rolling updates can still interrupt service. The problem in brief: even though the update strategy guarantees that at least one pod is ready at all times, users still see connection refusals while an update is in progress.

Problem Causes

1. Issues during pod startup – If no readiness probe is defined, a pod is considered ready as soon as its containers start, so it may receive traffic before the application has finished initializing, causing connection‑refused errors.

2. Issues during pod termination – When a pod receives a termination signal, the Kubelet begins shutting down its containers while, in parallel, the Endpoint Controller removes the pod from the Service's endpoints and kube-proxy on each node then updates the corresponding iptables rules. Because these paths run concurrently, the routing update can lag behind the shutdown, so traffic may still be forwarded to a pod whose process has already exited, resulting in failed connections.

The key point is that the Kubelet and the Endpoint Controller operate independently, and the endpoint‑removal path is longer (API server → Endpoint Controller → kube-proxy → iptables on every node). This creates a race condition: the application has already exited, but Service routing still points at it.

Solution Steps

1. Prevent connection refusals on pod startup – Configure a proper readiness probe (e.g., an HTTP GET to a simple endpoint) so the pod is marked ready only after the application can handle requests.
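A minimal probe definition might look like the following. The `/healthz` path, port, and timing values are illustrative assumptions, not values from the article; tune them to your application's actual startup behavior:

```yaml
readinessProbe:
  httpGet:
    path: /healthz          # lightweight endpoint that returns 200 only when the app can serve
    port: 8080
  initialDelaySeconds: 5    # give the app time to boot before the first check
  periodSeconds: 5
  failureThreshold: 3
```

Until the probe succeeds, the pod stays out of the Service's endpoints, so no traffic reaches it during initialization.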

2. Prevent connection interruptions on pod termination – Use a preStop hook to delay termination for a few seconds, giving the Endpoint Controller and kube-proxy time to finish removing the pod from the iptables rules before the container stops. The hook can be defined as:

lifecycle:
  preStop:
    exec:
      command:
        - sh
        - -c
        - "sleep 5"

Even a short 5‑10 second delay significantly improves deployment stability.

Graceful shutdown steps

1. Wait a few seconds, then stop accepting new traffic.

2. Wait for all in‑flight requests to complete.

3. Finally terminate the process.
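The three steps above can be sketched in application code. This is a minimal illustration using Python's standard-library HTTP server, not the article's own implementation; the handler and timings are placeholders, and the same pattern (trap SIGTERM, stop accepting connections, drain in-flight requests, then exit) applies to any framework that exposes a shutdown call:

```python
import signal
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    """Trivial handler standing in for the real application."""
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, fmt, *args):
        pass  # keep demo output quiet

def serve_until_sigterm(server: ThreadingHTTPServer) -> bool:
    """Run the server until SIGTERM arrives, then drain and return.

    On SIGTERM: stop accepting new connections (server.shutdown), then
    server_close() joins worker threads so in-flight requests finish.
    Returns True if the stop was triggered by SIGTERM.
    """
    stopped = threading.Event()

    def on_sigterm(signum, frame):
        stopped.set()
        # shutdown() must run in another thread: it blocks until the
        # serve_forever() loop (paused here in the main thread) exits.
        threading.Thread(target=server.shutdown).start()

    signal.signal(signal.SIGTERM, on_sigterm)
    server.serve_forever()   # returns once shutdown() is called
    server.server_close()    # waits for in-flight request threads
    return stopped.is_set()
```

In a pod that also configures the preStop sleep, SIGTERM arrives only after the sleep, so by the time this handler runs the endpoint removal has usually propagated.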

Note that Kubernetes defaults terminationGracePeriodSeconds to 30 seconds, and the preStop hook's duration counts against that budget; under high load or with long‑running requests the value may need to be raised.
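For example (the 60‑second value is an illustrative assumption; size it to cover your preStop delay plus the worst‑case in‑flight request duration):

```yaml
spec:
  terminationGracePeriodSeconds: 60  # must cover the preStop sleep plus request draining
```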

Summary

Kubernetes provides strong support for automated rolling updates, but achieving true zero‑downtime in production requires understanding the pod lifecycle, the behavior of Kubelet and Endpoint Controller, and applying readiness probes and graceful termination hooks to ensure stable service deployments.

Tags: Cloud Native, Kubernetes, zero downtime, rolling update, PreStop Hook, Readiness Probe
Written by 360 Quality & Efficiency

360 Quality & Efficiency focuses on seamlessly integrating quality and efficiency in R&D, sharing 360’s internal best practices with industry peers to foster collaboration among Chinese enterprises and drive greater efficiency value.
