
Master Kubernetes Cluster: Install, Upgrade, Backup, and Restore Step‑by‑Step

This comprehensive guide walks you through installing a Kubernetes cluster with kubeadm, configuring containerd, initializing master and worker nodes, deploying Calico networking and the Dashboard, performing upgrades, renewing certificates, adding or removing nodes, and backing up both etcd data and cluster manifests using scripts and Velero.

Ops Development Stories

Install Kubernetes Cluster

Kubernetes is a container‑orchestration platform that runs as a cluster. As a cluster maintainer you often need to manage the whole lifecycle.

Prerequisites

Cluster nodes: 2

Master IP: 192.168.205.128

Node IP: 192.168.205.130

Kubernetes version: v1.24.2

Container runtime: containerd

OS: CentOS 7.9 (kernel 3.10.0‑1160)

Environment Preparation

(1) Add host entries on each node

<code>cat >> /etc/hosts <<EOF
192.168.205.128 kk-master
192.168.205.130 kk-node01
EOF
</code>

(2) Disable firewall and SELinux

<code>systemctl stop firewalld
systemctl disable firewalld
setenforce 0
# Persist across reboots
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
grep '^SELINUX=' /etc/selinux/config   # should print SELINUX=disabled
</code>

(3) Optimize kernel parameters

<code>cat > /etc/sysctl.d/k8s.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
vm.swappiness=0
EOF
modprobe br_netfilter
sysctl -p /etc/sysctl.d/k8s.conf
</code>

(4) Disable swap

<code>swapoff -a
# comment swap line in /etc/fstab
sed -i '/ swap / s/^/#/' /etc/fstab
</code>

(5) Install IPVS modules

<code>cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF
chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack_ipv4
yum install -y ipset ipvsadm
</code>

(6) Sync server time

<code>yum install -y chrony
systemctl enable chronyd
systemctl start chronyd
chronyc sources
</code>

(7) Install containerd

<code>yum install -y yum-utils \
  device-mapper-persistent-data \
  lvm2
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum list | grep containerd
yum install -y containerd
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml
sed -i "s#k8s.gcr.io#registry.cn-hangzhou.aliyuncs.com/google_containers#g" /etc/containerd/config.toml
sed -i 's#SystemdCgroup = false#SystemdCgroup = true#g' /etc/containerd/config.toml
sed -i "s#https://registry-1.docker.io#https://registry.cn-hangzhou.aliyuncs.com#g" /etc/containerd/config.toml
systemctl daemon-reload
systemctl enable containerd
systemctl restart containerd
</code>
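The three sed edits above switch the sandbox (pause) image to the Aliyun mirror, turn on systemd cgroups, and redirect Docker Hub pulls to a mirror. They can be sanity-checked offline against a stub file containing only the lines they target (a sketch; a real /etc/containerd/config.toml is far larger, so the stub is only illustrative):

```shell
#!/bin/bash
# Stub with just the three lines the sed commands rewrite
cat > /tmp/config.toml <<'EOF'
sandbox_image = "k8s.gcr.io/pause:3.6"
SystemdCgroup = false
endpoint = ["https://registry-1.docker.io"]
EOF
# Same three edits as in the install step, applied to the stub
sed -i "s#k8s.gcr.io#registry.cn-hangzhou.aliyuncs.com/google_containers#g" /tmp/config.toml
sed -i 's#SystemdCgroup = false#SystemdCgroup = true#g' /tmp/config.toml
sed -i "s#https://registry-1.docker.io#https://registry.cn-hangzhou.aliyuncs.com#g" /tmp/config.toml
cat /tmp/config.toml
```

All three lines should now reference the mirror and `SystemdCgroup = true`; if a future containerd changes these default values, the sed patterns need updating to match.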

(8) Install Kubernetes components

<code>cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

yum install -y kubelet-1.24.2 kubeadm-1.24.2 kubectl-1.24.2
crictl config runtime-endpoint unix:///run/containerd/containerd.sock
systemctl daemon-reload
systemctl enable kubelet && systemctl start kubelet
</code>

Initialize the Cluster

Export the default kubeadm config and edit it (set imageRepository, kube-proxy mode to ipvs, and cgroupDriver to systemd).

<code>kubeadm config print init-defaults > kubeadm.yaml
# edit kubeadm.yaml as needed (example snippet shown below)
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.205.128
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  name: kk-master
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
kubernetesVersion: 1.24.2
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
</code>

Run the initialization:

<code>kubeadm init --config=kubeadm.yaml
# After success, run as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Or as root:
export KUBECONFIG=/etc/kubernetes/admin.conf
</code>

Join Worker Node

On each worker execute the join command printed by the init output, e.g.:

<code>kubeadm join 192.168.205.128:6443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:51b5e566d3f95aaf3170916d67958bc16cb1b44934885a857b07ee58f041334a
</code>

Verify nodes:

<code>kubectl get nodes
</code>

Install Network Plugin (Calico)

<code># Pin a release instead of the moving master branch
wget https://raw.githubusercontent.com/projectcalico/calico/v3.23.1/manifests/calico.yaml
kubectl apply -f calico.yaml
kubectl get po -n kube-system | grep calico
</code>

Install Kubernetes Dashboard

<code>kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.5.0/aio/deploy/recommended.yaml
kubectl get po -n kubernetes-dashboard
kubectl -n kubernetes-dashboard edit svc kubernetes-dashboard   # change type to NodePort
# Access via https://<master_ip>:<nodePort>
</code>
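If you prefer to avoid the interactive edit, the same change can be scripted with kubectl patch (a sketch of the identical Service-type switch; no test harness applies here since it needs a live cluster):

```shell
# Non-interactive alternative to `kubectl edit`: switch the Service to NodePort
kubectl -n kubernetes-dashboard patch svc kubernetes-dashboard \
  -p '{"spec":{"type":"NodePort"}}'
# Read back the nodePort the cluster assigned
kubectl -n kubernetes-dashboard get svc kubernetes-dashboard \
  -o jsonpath='{.spec.ports[0].nodePort}'
```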

Generate an admin token (Kubernetes 1.24 no longer auto‑creates ServiceAccount tokens):

<code>cat <<EOF > admin-token.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dashboard-admin
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: dashboard-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin
subjects:
- kind: ServiceAccount
  name: dashboard-admin
  namespace: kube-system
EOF
kubectl apply -f admin-token.yaml
# Since v1.24 no Secret is auto-created for the ServiceAccount; request a token directly:
kubectl -n kube-system create token dashboard-admin
</code>

Update Cluster

Upgrade Kubernetes Version

Check current version and target version (e.g., upgrade from v1.24.0 to v1.24.2).

<code># Backup first (see backup section)
# Drain the control-plane node before upgrading (drain also cordons it)
kubectl drain kk-master --ignore-daemonsets=true
# Upgrade kubeadm on the control plane
yum install -y kubeadm-1.24.2-0 --disableexcludes=kubernetes
kubeadm upgrade plan   # shows target version
kubeadm upgrade apply v1.24.2 --config kubeadm.yaml
# Upgrade kubelet and kubectl
yum install -y kubelet-1.24.2-0 kubectl-1.24.2-0 --disableexcludes=kubernetes
systemctl daemon-reload
systemctl restart kubelet
# Return the node to service
kubectl uncordon kk-master
</code>

Upgrade Worker Nodes

<code># On each worker node
yum install -y kubeadm-1.24.2-0 --disableexcludes=kubernetes
kubeadm upgrade node
# From a machine with kubectl access (e.g., the master); drain also cordons:
kubectl drain kk-node01 --ignore-daemonsets=true
# Back on the worker:
yum install -y kubelet-1.24.2-0 --disableexcludes=kubernetes
systemctl daemon-reload
systemctl restart kubelet
# From the master again:
kubectl uncordon kk-node01
</code>

Renew Certificates

Check expiration:

<code>kubeadm certs check-expiration
</code>

Backup certificates and etcd, then renew:

<code>mkdir -p /etc/kubernetes.bak.$(date +%Y%m%d)
cp -r /etc/kubernetes/* /etc/kubernetes.bak.$(date +%Y%m%d)
# Renew all certs using the same kubeadm config
# ("kubeadm alpha certs" was removed; v1.24 uses "kubeadm certs")
kubeadm certs renew all --config=kubeadm.yaml
# Regenerate kubeconfigs
kubeadm init phase kubeconfig all --config=kubeadm.yaml
mv $HOME/.kube/config $HOME/.kube/config.old
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
# Restart control‑plane static pods: move the manifests out, wait for the
# containers to stop (check with: crictl ps), then move them back
cd /etc/kubernetes/manifests
mv *.yaml ../
sleep 60
mv ../*.yaml .
</code>

Add or Remove Nodes

Add node – repeat the environment‑preparation steps on the new host, then run the join command obtained from the existing cluster (generate a fresh one with kubeadm token create if needed).

<code># On new node
cat >> /etc/hosts <<EOF
192.168.205.128 kk-master
192.168.205.130 kk-node01
192.168.205.133 kk-node02
EOF
# On the master, generate a fresh token and print the full join command:
kubeadm token create --print-join-command
# Then on the new node:
kubeadm join 192.168.205.128:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> --node-name kk-node02
</code>

Remove node:

<code>kubectl cordon kk-node02
kubectl drain kk-node02 --ignore-daemonsets=true --delete-emptydir-data=true
kubectl delete node kk-node02
</code>

Backup Cluster

Backup etcd Database

Install etcdctl and take a snapshot:

<code>export ETCDCTL_API=3
etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /var/backups/etcd/snapshot.db
</code>
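After taking a snapshot, verify that the file is readable before trusting it; etcdctl can print the snapshot's hash, revision, key count, and size (this reads the local file, so no cert flags are needed):

```shell
# Inspect the snapshot's metadata; a corrupt file errors out here
ETCDCTL_API=3 etcdctl snapshot status /var/backups/etcd/snapshot.db -w table
```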

Automate the snapshot with a shell script and schedule it with cron (e.g., every 30 minutes).

<code>#!/bin/bash
ETCDCTL_PATH=/usr/local/bin/etcdctl
ENDPOINTS='192.168.205.128:2379'
BACKUP_DIR="/var/backups/kube_etcd/etcd-$(date +%Y-%m-%d_%H:%M:%S)"
ETCDCTL_CERT="/etc/kubernetes/pki/etcd/server.crt"
ETCDCTL_KEY="/etc/kubernetes/pki/etcd/server.key"
ETCDCTL_CA="/etc/kubernetes/pki/etcd/ca.crt"
mkdir -p "$BACKUP_DIR"
export ETCDCTL_API=3
$ETCDCTL_PATH --endpoints="$ENDPOINTS" \
  --cacert="$ETCDCTL_CA" \
  --cert="$ETCDCTL_CERT" \
  --key="$ETCDCTL_KEY" snapshot save "$BACKUP_DIR/snapshot.db"
# Keep only the latest 5 backups
cd $(dirname "$BACKUP_DIR")
ls -1t | awk 'NR>5{print "rm -rf " $0}' | sh
</code>
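To schedule the script every 30 minutes as suggested, install it and register a root crontab entry (a sketch; /usr/local/bin/etcd-backup.sh is an assumed path for the script above):

```shell
# Assumed location for the backup script shown above (hypothetical path)
install -m 0755 etcd-backup.sh /usr/local/bin/etcd-backup.sh
# Append a crontab entry: run every 30 minutes, log output to a file
(crontab -l 2>/dev/null; \
 echo '*/30 * * * * /usr/local/bin/etcd-backup.sh >> /var/log/etcd-backup.log 2>&1') | crontab -
```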

Restore etcd

<code># Stop static pods
cd /etc/kubernetes/manifests
mv *.yaml ../
# Move old data directory aside
mv /var/lib/etcd /var/lib/etcd.bak
# Restore snapshot
ETCDCTL_API=3 etcdctl snapshot restore /var/backups/etcd/snapshot.db \
  --name kk-master \
  --initial-cluster "kk-master=https://192.168.205.128:2380" \
  --initial-cluster-token etcd-cluster \
  --initial-advertise-peer-urls https://192.168.205.128:2380 \
  --data-dir=/var/lib/etcd
# Restart static pods
mv ../*.yaml .
</code>

Backup Cluster Manifests with Velero

Install MinIO (object storage) via Helm:

<code>helm repo add minio https://helm.min.io/
helm install minio \
  --namespace velero --create-namespace \
  --set accessKey=minio,secretKey=minio123 \
  --set mode=standalone \
  --set service.type=NodePort \
  --set persistence.enabled=false minio/minio
</code>

Create a bucket named velero in the MinIO UI (http://<node_ip>:32000).

Create a credentials file (credentials-velero):

<code>[default]
aws_access_key_id=minio
aws_secret_access_key=minio123
</code>

Install Velero pointing to MinIO:

<code>velero install \
  --provider aws \
  --bucket velero \
  --image velero/velero:v1.6.3 \
  --plugins velero/velero-plugin-for-aws:v1.2.1 \
  --namespace velero \
  --secret-file ./credentials-velero \
  --use-volume-snapshots=false \
  --use-restic \
  --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.velero.svc:9000
</code>

Verify components are running:

<code>kubectl get po -n velero
</code>

Back up the default namespace:

<code>velero backup create default-backup-$(date +%Y%m%d) --include-namespaces default --default-volumes-to-restic
</code>
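Before relying on a backup, confirm it actually completed; the velero CLI can list backups and show per-backup detail (assuming the backup name created above):

```shell
# List backups and their phase (should show Completed)
velero backup get
# Detailed status, including restic per-volume backups
velero backup describe default-backup-$(date +%Y%m%d) --details
```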

Delete a resource (e.g., the nginx deployment) and restore it:

<code>kubectl delete deployment nginx
velero restore create --from-backup default-backup-$(date +%Y%m%d)
</code>

Summary

Kubernetes forms the foundation for cloud‑native applications; reliable backup and upgrade procedures are essential to maintain platform stability. By following the steps above—installing the cluster, configuring networking, managing upgrades, renewing certificates, adding/removing nodes, and backing up both etcd and manifests—you can ensure a resilient and maintainable Kubernetes environment.

Cluster management diagram
Kubernetes Dashboard login page
MinIO web UI
Velero bucket in MinIO
Velero backup files

Tags: Kubernetes, cluster upgrade, Calico, kubeadm, Velero, etcd backup, Kubernetes Dashboard
Written by

Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
