Achieve Zero‑Downtime Updates in Your Kubernetes Cluster
This article explains how to perform zero‑downtime updates on a Kubernetes cluster using native tools such as pod lifecycle hooks, graceful termination, and PodDisruptionBudgets. It is the first in a step‑by‑step series covering draining nodes, graceful pod shutdown, and avoiding service interruptions.
During a Kubernetes cluster’s lifecycle you will need to maintain the underlying nodes, which may involve package updates, kernel upgrades, or replacing VM images. In Kubernetes this is considered a “voluntary disruption”.
This post is part of a four‑part series:
Zero‑downtime server updates (this article)
Gracefully shutting down Pods
Delaying shutdown to wait for Pod deletion propagation
Using PodDisruptionBudgets to avoid interruptions
The series will cover all Kubernetes tools needed to achieve zero‑downtime updates for the underlying worker nodes.
Problem Statement
We start with a naïve approach, identify its challenges and risks, and gradually build a solution using lifecycle hooks, readiness probes, and PodDisruptionBudgets to achieve zero‑downtime deployments.
Consider a two‑node Kubernetes cluster running an application with two Pods behind a Service:
Our starting point is two Nginx Pods and a Service running on a two‑node Kubernetes cluster.
We need to upgrade the kernel version of the two worker nodes. The naïve method is to launch new nodes with the updated configuration and then shut down the old nodes. This approach has several problems:
When the old nodes are shut down, the Pods running on them are terminated immediately, potentially before they can clean up resources.
If all old nodes are shut down simultaneously, every Pod must be recreated on the new nodes at once, causing a service outage until they are running again.
We want a method to gracefully migrate Pods off the old nodes so that no workload runs on a node being changed. Using kubectl drain we can evict Pods and prevent new Pods from being scheduled on the node.
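For example, draining a node could look like the following (the node name node-1 is hypothetical; --ignore-daemonsets is usually required because DaemonSet‑managed Pods are not evicted):

```shell
# Cordon the node and evict its Pods (node name is hypothetical).
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

# ...perform the kernel upgrade, then make the node schedulable again
# (or simply terminate it if it is being replaced by a new node):
kubectl uncordon node-1
```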
Rescheduling Pods
The drain operation first cordons the node, marking it unschedulable via the node.kubernetes.io/unschedulable:NoSchedule taint, and then evicts its Pods; the kubelet sends a TERM signal to each container so it can shut down.
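The cordon step can also be performed on its own, which is useful if you want to stop new Pods from landing on a node before evicting anything. A sketch, again with a hypothetical node name:

```shell
# Mark the node unschedulable without evicting any Pods yet.
kubectl cordon node-1

# The node now carries the unschedulable taint:
kubectl describe node node-1 | grep -A1 Taints
# Taints: node.kubernetes.io/unschedulable:NoSchedule
```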
Although kubectl drain handles eviction gracefully, two factors can still cause service disruption:
Your application must handle the TERM signal properly; otherwise containers may be killed abruptly during critical work.
All Pods providing the service are removed before new Pods start on other nodes, which can lead to downtime if the Pods are not recreated quickly.
Avoiding Downtime
Kubernetes offers the following interruption‑handling features to minimize downtime caused by voluntary disruptions:
Graceful termination
Lifecycle hooks
PodDisruptionBudgets
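As a preview of the last of these, a PodDisruptionBudget for the Nginx Pods defined below might look like the following sketch (the name nginx-pdb and the minAvailable value are illustrative; part four of the series covers this in detail):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb          # illustrative name
spec:
  minAvailable: 1          # keep at least one nginx Pod running during evictions
  selector:
    matchLabels:
      app: nginx
```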
In the remaining parts of the series we will use these features to mitigate the impact of eviction. The following resources will be used as a base configuration for the examples:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15
        ports:
        - containerPort: 80
---
kind: Service
apiVersion: v1
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    targetPort: 80
    port: 80

This minimal Deployment manages two Nginx Pods and a Service to access them. Throughout the series we will incrementally enhance this configuration to incorporate graceful termination, lifecycle hooks, and PodDisruptionBudgets, ultimately achieving zero‑downtime maintenance.
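Assuming the manifests above are saved to a file such as nginx.yaml (the filename is illustrative), the base configuration can be applied and checked like this:

```shell
kubectl apply -f nginx.yaml

# Verify that both Pods are Running and the Service has endpoints.
kubectl get pods -l app=nginx
kubectl get service nginx-service
```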
Continue with the rest of the series:
Gracefully shutting down Pods
Delaying shutdown to wait for Pod deletion propagation
Using PodDisruptionBudgets to avoid interruptions
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
