
Ensuring Zero‑Downtime Rolling Updates in Kubernetes: Causes and Solutions

This article analyzes why Kubernetes rolling updates can still cause service interruptions during pod startup and termination, explains the underlying mechanisms of Kubelet and Endpoint Controller, and provides practical steps such as readiness probes and preStop hooks to achieve smoother, near‑zero‑downtime deployments.

360 Quality & Efficiency

Kubernetes has become the standard platform for rapid application deployment and scaling, yet rolling updates can still interrupt service. The problem in brief: even though the update strategy guarantees that at least one pod is ready at all times, users still see connection refusals while an update is in progress.

Problem Causes

1. Issues during pod startup – If no readiness probe is defined, a pod is considered ready as soon as its containers start, so it may receive traffic before the application has finished initializing, causing connection‑refused errors.

2. Issues during pod termination – When a pod receives a termination signal, the Kubelet begins shutting down its containers while, in parallel, the Endpoint Controller removes the pod from the Service's endpoints and kube-proxy on each node then updates the corresponding iptables rules. Because these paths run concurrently, the routing update can lag behind the shutdown, so traffic may still be forwarded to a pod whose process has already exited, resulting in failed connections.

The key point is that the Kubelet and the Endpoint Controller operate independently, and the endpoint‑removal path is longer (API server → Endpoint Controller → kube-proxy → iptables on every node). This creates a race condition: the application has already exited, but Service routing still points at it.

Solution Steps

1. Prevent connection refusals on pod startup – Configure a proper readiness probe (e.g., an HTTP GET to a simple endpoint) so the pod is marked ready only after the application can handle requests.
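A minimal probe definition might look like the following. The `/healthz` path, port, and timing values are illustrative assumptions, not values from the article; tune them to your application's actual startup behavior:

```yaml
readinessProbe:
  httpGet:
    path: /healthz          # lightweight endpoint that returns 200 only when the app can serve
    port: 8080
  initialDelaySeconds: 5    # give the app time to boot before the first check
  periodSeconds: 5
  failureThreshold: 3
```

Until the probe succeeds, the pod stays out of the Service's endpoints, so no traffic reaches it during initialization.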

2. Prevent connection interruptions on pod termination – Use a preStop hook to delay termination for a few seconds, giving the Endpoint Controller and kube-proxy time to finish removing the pod from the iptables rules before the container stops. The hook can be defined as:

lifecycle:
  preStop:
    exec:
      command:
        - sh
        - -c
        - "sleep 5"

Even a short 5‑10 second delay significantly improves deployment stability.

Graceful shutdown steps

1. Wait a few seconds, then stop accepting new traffic.

2. Wait for all in‑flight requests to complete.

3. Finally terminate the process.
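The three steps above can be sketched in application code. This is a minimal illustration using Python's standard-library HTTP server, not the article's own implementation; the handler and timings are placeholders, and the same pattern (trap SIGTERM, stop accepting connections, drain in-flight requests, then exit) applies to any framework that exposes a shutdown call:

```python
import signal
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    """Trivial handler standing in for the real application."""
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, fmt, *args):
        pass  # keep demo output quiet

def serve_until_sigterm(server: ThreadingHTTPServer) -> bool:
    """Run the server until SIGTERM arrives, then drain and return.

    On SIGTERM: stop accepting new connections (server.shutdown), then
    server_close() joins worker threads so in-flight requests finish.
    Returns True if the stop was triggered by SIGTERM.
    """
    stopped = threading.Event()

    def on_sigterm(signum, frame):
        stopped.set()
        # shutdown() must run in another thread: it blocks until the
        # serve_forever() loop (paused here in the main thread) exits.
        threading.Thread(target=server.shutdown).start()

    signal.signal(signal.SIGTERM, on_sigterm)
    server.serve_forever()   # returns once shutdown() is called
    server.server_close()    # waits for in-flight request threads
    return stopped.is_set()
```

In a pod that also configures the preStop sleep, SIGTERM arrives only after the sleep, so by the time this handler runs the endpoint removal has usually propagated.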

Note that Kubernetes defaults terminationGracePeriodSeconds to 30 seconds, and the preStop hook's duration counts against that budget; under high load or with long‑running requests the value may need to be raised.
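For example (the 60‑second value is an illustrative assumption; size it to cover your preStop delay plus the worst‑case in‑flight request duration):

```yaml
spec:
  terminationGracePeriodSeconds: 60  # must cover the preStop sleep plus request draining
```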

Summary

Kubernetes provides strong support for automated rolling updates, but achieving true zero‑downtime in production requires understanding the pod lifecycle, the behavior of Kubelet and Endpoint Controller, and applying readiness probes and graceful termination hooks to ensure stable service deployments.

Tags: Cloud Native, Kubernetes, zero downtime, rolling update, PreStop Hook, Readiness Probe
Written by 360 Quality & Efficiency

360 Quality & Efficiency focuses on seamlessly integrating quality and efficiency in R&D, sharing 360’s internal best practices with industry peers to foster collaboration among Chinese enterprises and drive greater efficiency value.
