Kubernetes Cluster Upgrade Guide: Pre‑Checks, Methods & Step‑by‑Step
This article explains why Kubernetes clusters need regular upgrades, outlines the challenges, details essential pre‑upgrade health checks for core components, nodes and cloud resources, compares in‑place and replacement upgrade strategies with their pros and cons, and presents a three‑stage upgrade process covering master, worker and core system components.
Why Upgrade a Kubernetes Cluster?
Kubernetes ships a new minor version several times a year (historically about one per quarter, and roughly three per year since 2021), each adding features, security hardening and bug fixes. Keeping clusters up to date lets users benefit from the community's rapid development and reduces operational risk.
For cluster users: newer versions provide additional features, comprehensive security patches and many bug fixes.
For cluster operators: aligning all clusters to the same version reduces version fragmentation and lowers management and maintenance costs.
Upgrade Difficulties
Long‑running clusters accumulate complex runtime state.
Each cluster may carry its own custom configuration (no two clusters look alike), which makes a one-size-fits-all upgrade logic hard to write.
Cloud‑based clusters depend on many underlying cloud resources, introducing additional uncertainty.
Pre‑Upgrade Checks
Perform three categories of checks before upgrading to mitigate risk.
1. Core Component Health Checks
Network components must be compatible with the target Kubernetes version.
All apiservice objects must be available.
All nodes must report Ready status.
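The node and apiservice checks above can be partly automated. The following is a minimal sketch, assuming the input is JSON already captured from `kubectl get nodes -o json` and `kubectl get apiservices -o json`; the function names are illustrative and not part of any tool. It relies only on the standard `status.conditions` entries (`Ready` for nodes, `Available` for apiservices).

```python
import json


def condition_is_true(obj, cond_type):
    """Return True if status.conditions contains cond_type with status "True"."""
    for cond in obj.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond.get("status") == "True"
    # A missing condition is treated as unhealthy.
    return False


def unhealthy_items(item_list, cond_type):
    """Names of items in a Kubernetes List object whose condition is not True."""
    return [
        item["metadata"]["name"]
        for item in item_list.get("items", [])
        if not condition_is_true(item, cond_type)
    ]


def precheck(nodes_json, apiservices_json):
    """Summarize nodes that are not Ready and apiservices that are not Available."""
    nodes = json.loads(nodes_json)
    apiservices = json.loads(apiservices_json)
    return {
        "not_ready_nodes": unhealthy_items(nodes, "Ready"),
        "unavailable_apiservices": unhealthy_items(apiservices, "Available"),
    }
```

An upgrade would proceed only when both lists in the result are empty. Network component compatibility is not covered here, since it depends on the specific plugin in use.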
2. Node Configuration Checks
Operating‑system services (yum, systemd, ntp) and kernel parameters should be correctly configured.
Kubelet process must be healthy and configured for the target version.
Docker (or container runtime) daemon must be healthy and configured.
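As one way to check that the node-level daemons are running, the sketch below calls `systemctl is-active --quiet <unit>`, which exits with status 0 only when the unit is active. The unit names in the usage note are assumptions that vary by distribution, and the `run` parameter exists only so the check can be exercised without systemd.

```python
import subprocess


def unit_is_active(unit, run=subprocess.run):
    """True if `systemctl is-active --quiet <unit>` exits with status 0."""
    result = run(["systemctl", "is-active", "--quiet", unit])
    return result.returncode == 0


def inactive_units(units, run=subprocess.run):
    """Return the subset of systemd units that are not currently active."""
    return [unit for unit in units if not unit_is_active(unit, run)]
```

On a node this might be called as `inactive_units(["kubelet", "docker", "ntpd"])`; an empty list means all listed services are active. It does not validate kernel parameters or whether the kubelet configuration suits the target version.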
3. Cloud Resource Checks
SLB used by the apiserver: health status and port configuration.
VPC and VSwitch: instance health.
ECS instances: health and network settings.
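Cloud resource status is normally read through the provider's own API, which is not shown here. As a provider-neutral complement, the sketch below only probes whether the apiserver's load-balancer address accepts TCP connections on the expected port; the host and port passed in are assumptions supplied by the operator.

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

A successful connection shows only that the listener port is open; it does not confirm that the backend apiserver instances behind the SLB are healthy.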
Two Common Upgrade Strategies
1. In‑Place Upgrade
This method updates components (e.g., kubelet) directly on each worker node while the node remains running. Example: upgrading from 1.14 to 1.16 involves updating kubelet on ECS A, then on ECS B, and so on.
Pros: Pods on the node are not recreated, which preserves business continuity; the underlying ECS instances are not replaced, which suits customers on subscription (prepaid) instances.
Cons: The process is not atomic; a failure in any step can leave the node partially upgraded. Sufficient resources must be reserved for the upgrade.
2. Replacement (Rolling) Upgrade
Also called “blue‑green” or “node rotation”, this method removes old nodes and adds new ones with the target version. Example: drain and delete each 1.14 node (ECS A, B) and add new 1.16 nodes (ECS C, D) sequentially.
Pros: The upgrade is more atomic; fewer intermediate states reduce the chance of unexpected issues.
Cons: All pods are evicted and recreated, which can affect workloads that tolerate pod recreation poorly, such as StatefulSets or single-replica services; local data on the nodes is lost; node IPs change, and replacing instances is inconvenient for customers on subscription (prepaid) instances.
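The removal half of a replacement upgrade can be sketched as a per-node command plan. This is an illustration under assumptions, not the article's tooling: it only builds `kubectl` argument lists, leaves execution to the caller, and omits creating the new nodes because that step is provider-specific. `kubectl drain` marks the node unschedulable before evicting its pods, so no separate cordon step is listed.

```python
def replacement_plan(old_nodes):
    """Build the ordered kubectl commands that retire each old node in turn."""
    plan = []
    for node in old_nodes:
        # Evict pods; DaemonSet-managed pods are skipped rather than blocking the drain.
        plan.append(["kubectl", "drain", node, "--ignore-daemonsets"])
        # Remove the Node object once it is empty.
        plan.append(["kubectl", "delete", "node", node])
    return plan
```

Handling one node at a time keeps capacity loss bounded. Pods that use emptyDir volumes need an additional drain flag whose name depends on the kubectl version, so it is deliberately left out of this sketch.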
Three‑Stage Upgrade Process
1. Rolling Upgrade of Master
Masters can be deployed as static pods, local processes, or as pods in another cluster (Kubernetes‑on‑Kubernetes). Upgrade the three master components:
kube‑apiserver (run at least two instances so the API stays available while each one is upgraded)
kube‑controller‑manager
kube‑scheduler
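The rolling order for a master component can be sketched as a loop that upgrades one instance at a time and stops at the first failed health check. The `upgrade` and `is_healthy` callables are hypothetical hooks; how they are implemented depends on whether the masters run as static pods, local processes, or pods in another cluster.

```python
def rolling_upgrade(instances, upgrade, is_healthy):
    """Upgrade instances one by one, halting as soon as one is unhealthy."""
    if len(instances) < 2:
        # With a single instance the component is unavailable during its restart.
        raise ValueError("need at least two instances to keep serving during the roll")
    done = []
    for instance in instances:
        upgrade(instance)
        if not is_healthy(instance):
            raise RuntimeError(
                f"{instance} unhealthy after upgrade; already upgraded: {done}"
            )
        done.append(instance)
    return done
```

Stopping at the first unhealthy instance leaves the remaining instances on the old version, so they can keep serving requests while the failure is investigated.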
2. Batch Upgrade of Workers
After the master is upgraded, workers are upgraded in batches to avoid simultaneous kubelet restarts. Upgrade kubelet and its dependencies (e.g., CNI) on each node, respecting the version skew policy: kubelet must not be newer than kube-apiserver but may lag it by a limited number of minor versions (e.g., a 1.14 kubelet can talk to a 1.16 apiserver).
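Batching and the skew rule can be illustrated with two small helpers. This is a sketch under assumptions: the default `max_skew` of 2 matches the 1.14 to 1.16 example in this article, newer Kubernetes releases allow a wider gap, and both versions are assumed to share the same major version.

```python
def minor(version):
    """Minor number from a version string such as 'v1.16.3' or '1.16'."""
    return int(version.lstrip("v").split(".")[1])


def kubelet_skew_ok(kubelet, apiserver, max_skew=2):
    """Kubelet may be older than the apiserver by at most max_skew minors, never newer."""
    diff = minor(apiserver) - minor(kubelet)
    return 0 <= diff <= max_skew


def batches(nodes, size):
    """Split the node list into consecutive batches of at most `size` nodes."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    return [nodes[i:i + size] for i in range(0, len(nodes), size)]
```

A driver would upgrade every node in one batch, wait for those nodes to report Ready again, and only then move on to the next batch.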
3. Core System Component Upgrade
Upgrade core add-ons in step with the new Kubernetes version:
CoreDNS – follow the community version‑compatibility matrix: https://github.com/coredns/deployment/blob/master/kubernetes/CoreDNS-k8s_version.md
Kube‑proxy – upgrade to the same version as the Kubernetes control plane.
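Whether kube-proxy already matches the control plane can be checked from its DaemonSet manifest. The sketch below assumes the manifest was loaded from `kubectl -n kube-system get daemonset kube-proxy -o json` and that the image tag encodes the Kubernetes version, optionally followed by a vendor suffix after a hyphen; both are assumptions about the deployment.

```python
def image_tag(image):
    """Extract the tag from an image reference, or None if it has no tag."""
    last = image.rsplit("/", 1)[-1]  # drop registry (which may contain a port) and path
    last = last.split("@", 1)[0]     # drop a digest if present
    _name, sep, tag = last.partition(":")
    return tag if sep else None


def same_version(tag, target):
    """True if the tag equals the target version, ignoring a leading 'v' and vendor suffix."""
    have, want = tag.lstrip("v"), target.lstrip("v")
    return have == want or have.startswith(want + "-")


def kube_proxy_matches(daemonset, target):
    """True if every container image in the DaemonSet carries the target version."""
    containers = daemonset["spec"]["template"]["spec"]["containers"]
    tags = [image_tag(c["image"]) for c in containers]
    return all(t is not None and same_version(t, target) for t in tags)
```

The CoreDNS version is not derived this way, because it follows its own compatibility matrix rather than the Kubernetes version number.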
Following these checks and steps helps ensure a smooth, reliable Kubernetes cluster upgrade with minimal disruption.
Alibaba Cloud Native