Achieve Zero‑Downtime Updates in Your Kubernetes Cluster
This article explains how to perform zero‑downtime updates on a Kubernetes cluster using native tools such as pod lifecycle hooks, graceful termination, and PodDisruptionBudgets. It is the first in a step‑by‑step series covering draining nodes, graceful pod shutdown, and avoiding service interruptions.
During a Kubernetes cluster’s lifecycle you will need to maintain the underlying nodes, which may involve package updates, kernel upgrades, or replacing VM images. In Kubernetes this is considered a “voluntary disruption”.
This post is part of a four‑part series:
Zero‑downtime server updates (this article)
Gracefully shutting down Pods
Delaying shutdown to wait for Pod deletion propagation
Using PodDisruptionBudgets to avoid interruptions
The series will cover all Kubernetes tools needed to achieve zero‑downtime updates for the underlying worker nodes.
Problem Statement
We start with a naïve approach, identify its challenges and risks, and gradually build a solution using lifecycle hooks, readiness probes, and PodDisruptionBudgets to achieve zero‑downtime deployments.
Consider a two‑node Kubernetes cluster running an application with two Pods behind a Service:
Our starting point is two Nginx Pods and a Service running on a two‑node Kubernetes cluster.
We need to upgrade the kernel version of the two worker nodes. The naïve method is to launch new nodes with the updated configuration and then shut down the old nodes. This approach has several problems:
When the old nodes are shut down, the Pods running on them are terminated immediately, potentially before they can clean up resources.
If all old nodes are shut down simultaneously, every Pod must be recreated on the new nodes at once, causing a service outage until they are running again.
We want a method to gracefully migrate Pods off the old nodes so that no workload runs on a node being changed. Using kubectl drain we can evict Pods and prevent new Pods from being scheduled on the node.
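For example, draining a node could look like the following (the node name node-1 is hypothetical; --ignore-daemonsets is usually required because DaemonSet‑managed Pods are not evicted):

```shell
# Cordon the node and evict its Pods (node name is hypothetical).
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

# ...perform the kernel upgrade, then make the node schedulable again
# (or simply terminate it if it is being replaced by a new node):
kubectl uncordon node-1
```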
Rescheduling Pods
The drain operation first cordons the node, marking it unschedulable via the node.kubernetes.io/unschedulable:NoSchedule taint, and then evicts its Pods; the kubelet sends a TERM signal to each container so it can shut down.
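The cordon step can also be performed on its own, which is useful if you want to stop new Pods from landing on a node before evicting anything. A sketch, again with a hypothetical node name:

```shell
# Mark the node unschedulable without evicting any Pods yet.
kubectl cordon node-1

# The node now carries the unschedulable taint:
kubectl describe node node-1 | grep -A1 Taints
# Taints: node.kubernetes.io/unschedulable:NoSchedule
```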
Although kubectl drain handles eviction gracefully, two factors can still cause service disruption:
Your application must handle the TERM signal properly; otherwise containers may be killed abruptly during critical work.
All Pods providing the service are removed before new Pods start on other nodes, which can lead to downtime if the Pods are not recreated quickly.
Avoiding Downtime
Kubernetes offers the following interruption‑handling features to minimize downtime caused by voluntary disruptions:
Graceful termination
Lifecycle hooks
PodDisruptionBudgets
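As a preview of the last of these, a PodDisruptionBudget for the Nginx Pods defined below might look like the following sketch (the name nginx-pdb and the minAvailable value are illustrative; part four of the series covers this in detail):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb          # illustrative name
spec:
  minAvailable: 1          # keep at least one nginx Pod running during evictions
  selector:
    matchLabels:
      app: nginx
```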
In the remaining parts of the series we will use these features to mitigate the impact of eviction. The following resources will be used as a base configuration for the examples:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15
        ports:
        - containerPort: 80
---
kind: Service
apiVersion: v1
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    targetPort: 80
    port: 80

This minimal Deployment manages two Nginx Pods and a Service to access them. Throughout the series we will incrementally enhance this configuration to incorporate graceful termination, lifecycle hooks, and PodDisruptionBudgets, ultimately achieving zero‑downtime maintenance.
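Assuming the manifests above are saved to a file such as nginx.yaml (the filename is illustrative), the base configuration can be applied and checked like this:

```shell
kubectl apply -f nginx.yaml

# Verify that both Pods are Running and the Service has endpoints.
kubectl get pods -l app=nginx
kubectl get service nginx-service
```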
Continue with the rest of the series:
Gracefully shutting down Pods
Delaying shutdown to wait for Pod deletion propagation
Using PodDisruptionBudgets to avoid interruptions
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
