Master Kubernetes Cluster: Install, Upgrade, Backup, and Restore Step‑by‑Step
This comprehensive guide walks you through installing a Kubernetes cluster with kubeadm, configuring containerd, initializing master and worker nodes, deploying Calico networking and the Dashboard, performing upgrades, renewing certificates, adding or removing nodes, and backing up both etcd data and cluster manifests using scripts and Velero.
Install Kubernetes Cluster
Kubernetes is a container‑orchestration platform that runs as a cluster. As a cluster maintainer you often need to manage the whole lifecycle.
Prerequisites
Cluster nodes: 2
Master IP: 192.168.205.128
Node IP: 192.168.205.130
Kubernetes version: v1.24.2
Container runtime: containerd
OS: CentOS 7.9 (kernel 3.10.0‑1160)
Environment Preparation
(1) Add host entries on each node
<code>cat >> /etc/hosts <<EOF
192.168.205.128 kk-master
192.168.205.130 kk-node01
EOF
</code>(2) Disable firewall and SELinux
<code>systemctl stop firewalld
systemctl disable firewalld
setenforce 0
# persist the change across reboots
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
</code>(3) Optimize kernel parameters
<code>cat > /etc/sysctl.d/k8s.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
vm.swappiness=0
EOF
modprobe br_netfilter
sysctl -p /etc/sysctl.d/k8s.conf
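# Sanity check: the settings can be read back from /proc
# (the bridge-nf values appear only after br_netfilter is loaded)
cat /proc/sys/net/ipv4/ip_forward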
</code>(4) Disable swap
<code>swapoff -a
# comment swap line in /etc/fstab
sed -i '/ swap / s/^/#/' /etc/fstab
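# Confirm swap is actually off (SwapTotal should read 0 kB)
grep SwapTotal /proc/meminfo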
</code>(5) Install IPVS modules
<code>cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4   # on kernels >= 4.19 use nf_conntrack instead
EOF
chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack_ipv4
yum install -y ipset ipvsadm
</code>(6) Sync server time
<code>yum install -y chrony
systemctl enable chronyd
systemctl start chronyd
chronyc sources
</code>(7) Install containerd
<code>yum install -y yum-utils \
device-mapper-persistent-data \
lvm2
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum list | grep containerd
yum install -y containerd
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml
sed -i "s#k8s.gcr.io#registry.cn-hangzhou.aliyuncs.com/google_containers#g" /etc/containerd/config.toml
sed -i 's#SystemdCgroup = false#SystemdCgroup = true#g' /etc/containerd/config.toml
sed -i "s#https://registry-1.docker.io#https://registry.cn-hangzhou.aliyuncs.com#g" /etc/containerd/config.toml
systemctl daemon-reload
systemctl enable containerd
systemctl restart containerd
</code>(8) Install Kubernetes components
<code>cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
yum install -y kubelet-1.24.2 kubeadm-1.24.2 kubectl-1.24.2
crictl config runtime-endpoint /run/containerd/containerd.sock
systemctl daemon-reload
systemctl enable kubelet && systemctl start kubelet
</code>Initialize the Cluster
Export the default kubeadm config and edit it (set imageRepository, kube-proxy mode to ipvs, and cgroupDriver to systemd).
<code>kubeadm config print init-defaults > kubeadm.yaml
# edit kubeadm.yaml as needed (example snippet shown below)
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.205.128
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  name: master
  taints: null
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  timeoutForControlPlane: 4m0s
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
kubernetesVersion: 1.24.2
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
</code>Run the initialization:
<code>kubeadm init --config=kubeadm.yaml
# After success, run as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Or as root:
export KUBECONFIG=/etc/kubernetes/admin.conf
</code>Join Worker Node
On each worker execute the join command printed by the init output, e.g.:
<code>kubeadm join 192.168.205.128:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:51b5e566d3f95aaf3170916d67958bc16cb1b44934885a857b07ee58f041334a
</code>Verify nodes:
<code>kubectl get nodes
</code>Install Network Plugin (Calico)
<code>wget https://raw.githubusercontent.com/projectcalico/calico/master/manifests/calico.yaml
# for production, pin a released Calico tag instead of master
kubectl apply -f calico.yaml
kubectl get po -n kube-system | grep calico
</code>Install Kubernetes Dashboard
<code>kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.5.0/aio/deploy/recommended.yaml
kubectl get po -n kubernetes-dashboard
kubectl -n kubernetes-dashboard edit svc kubernetes-dashboard # change type to NodePort
# Access via https://<master_ip>:<nodePort>
</code>Generate an admin token (Kubernetes 1.24 no longer auto‑creates ServiceAccount tokens):
<code>cat <<EOF > admin-token.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dashboard-admin
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: dashboard-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin
subjects:
- kind: ServiceAccount
  name: dashboard-admin
  namespace: kube-system
EOF
kubectl apply -f admin-token.yaml
# Kubernetes 1.24+ no longer auto-creates ServiceAccount token secrets; request a token directly:
kubectl -n kube-system create token dashboard-admin
</code>Update Cluster
Upgrade Kubernetes Version
Check current version and target version (e.g., upgrade from v1.24.0 to v1.24.2).
<code># Backup first (see backup section)
# Upgrade kubeadm on the control plane
yum install -y kubeadm-1.24.2-0 --disableexcludes=kubernetes
kubeadm upgrade plan   # verify the target version
# Drain the master before applying the upgrade (drain implies cordon)
kubectl drain kk-master --ignore-daemonsets=true
kubeadm upgrade apply v1.24.2
# Upgrade kubelet and kubectl
yum install -y kubelet-1.24.2-0 kubectl-1.24.2-0 --disableexcludes=kubernetes
systemctl daemon-reload
systemctl restart kubelet
# Return the master to service
kubectl uncordon kk-master
</code>Upgrade Worker Nodes
<code># On the worker: upgrade kubeadm first
yum install -y kubeadm-1.24.2-0 --disableexcludes=kubernetes
# From a machine with kubectl access (e.g. the master): drain the worker
kubectl drain kk-node01 --ignore-daemonsets=true
# Back on the worker: upgrade the node config, then kubelet
kubeadm upgrade node
yum install -y kubelet-1.24.2-0 --disableexcludes=kubernetes
systemctl daemon-reload
systemctl restart kubelet
# From the master: return the node to service
kubectl uncordon kk-node01
</code>Renew Certificates
Check expiration:
<code>kubeadm certs check-expiration
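# Individual certificates can also be inspected with openssl;
# the path below is the kubeadm default
openssl x509 -enddate -noout -in /etc/kubernetes/pki/apiserver.crt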
</code>Backup certificates and etcd, then renew:
<code>mkdir -p /etc/kubernetes.bak.$(date +%Y%m%d)
cp -r /etc/kubernetes/* /etc/kubernetes.bak.$(date +%Y%m%d)
# Renew all certs using the same kubeadm config
# (the "kubeadm alpha certs" subcommand was promoted to "kubeadm certs" in 1.20+)
kubeadm certs renew all --config=kubeadm.yaml
# Regenerate kubeconfigs
kubeadm init phase kubeconfig all --config=kubeadm.yaml
mv $HOME/.kube/config $HOME/.kube/config.old
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
# Restart control‑plane static pods
cd /etc/kubernetes/manifests
mv *.yaml ../
# give kubelet time to tear the pods down before restoring the manifests
sleep 20
mv ../*.yaml .
</code>Add or Remove Nodes
Add node – repeat the environment‑preparation steps on the new host and run the join command obtained from the existing cluster (use kubeadm token create --print-join-command if needed).
<code># On new node
cat >> /etc/hosts <<EOF
192.168.205.128 kk-master
192.168.205.130 kk-node01
192.168.205.133 kk-node02
EOF
kubeadm token create --print-join-command
kubeadm join 192.168.205.128:6443 --token <token> \
--discovery-token-ca-cert-hash sha256:<hash> --node-name kk-node02
</code>Remove node:
<code>kubectl cordon kk-node02
kubectl drain kk-node02 --ignore-daemonsets=true --delete-emptydir-data=true
kubectl delete node kk-node02
</code>Backup Cluster
Backup etcd Database
Install etcdctl and take a snapshot:
<code>export ETCDCTL_API=3
mkdir -p /var/backups/etcd
etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
snapshot save /var/backups/etcd/snapshot.db
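# Tip: verify the snapshot afterwards with
#   etcdctl --write-out=table snapshot status /var/backups/etcd/snapshot.db
# (prints hash, revision, total keys, and size)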
</code>Automate with a shell script (attributes stripped) and add to cron (e.g., every 30 minutes).
<code>#!/bin/bash
ETCDCTL_PATH=/usr/local/bin/etcdctl
ENDPOINTS='https://192.168.205.128:2379'
BACKUP_DIR="/var/backups/kube_etcd/etcd-$(date +%Y-%m-%d_%H:%M:%S)"
ETCDCTL_CERT="/etc/kubernetes/pki/etcd/server.crt"
ETCDCTL_KEY="/etc/kubernetes/pki/etcd/server.key"
ETCDCTL_CA="/etc/kubernetes/pki/etcd/ca.crt"
mkdir -p "$BACKUP_DIR"
export ETCDCTL_API=3
$ETCDCTL_PATH --endpoints="$ENDPOINTS" \
--cacert="$ETCDCTL_CA" \
--cert="$ETCDCTL_CERT" \
--key="$ETCDCTL_KEY" snapshot save "$BACKUP_DIR/snapshot.db"
# Keep only the latest 5 backups
cd $(dirname "$BACKUP_DIR")
ls -1t | awk 'NR>5{print "rm -rf " $0}' | sh
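# Example cron entry, assuming the script is saved as /usr/local/bin/etcd-backup.sh
# (path is illustrative; adjust to wherever you install it):
# */30 * * * * /usr/local/bin/etcd-backup.sh >> /var/log/etcd-backup.log 2>&1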
</code>Restore etcd
<code># Stop static pods
cd /etc/kubernetes/manifests
mv *.yaml ../
# Move old data directory aside
mv /var/lib/etcd /var/lib/etcd.bak
# Restore snapshot
ETCDCTL_API=3 etcdctl snapshot restore /var/backups/etcd/snapshot.db \
--name kk-master \
--initial-cluster "kk-master=https://192.168.205.128:2380" \
--initial-cluster-token etcd-cluster \
--initial-advertise-peer-urls https://192.168.205.128:2380 \
--data-dir=/var/lib/etcd
# Restart static pods
mv ../*.yaml .
</code>Backup Cluster Manifests with Velero
Install MinIO (object storage) via Helm:
<code>helm repo add minio https://helm.min.io/
helm install minio \
--namespace velero --create-namespace \
--set accessKey=minio,secretKey=minio123 \
--set mode=standalone \
--set service.type=NodePort \
--set persistence.enabled=false minio/minio
</code>Create a bucket named
veleroin the MinIO UI (http:// :32000).
Create a credentials file (credentials-velero):
<code>[default]
aws_access_key_id=minio
aws_secret_access_key=minio123
</code>Install Velero pointing to MinIO:
<code>velero install \
--provider aws \
--bucket velero \
--image velero/velero:v1.6.3 \
--plugins velero/velero-plugin-for-aws:v1.2.1 \
--namespace velero \
--secret-file ./credentials-velero \
--use-volume-snapshots=false \
--use-restic \
--backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.velero.svc:9000
</code>Verify components are running:
<code>kubectl get po -n velero
</code>Backup the default namespace:
<code>velero backup create default-backup-$(date +%Y%m%d) --include-namespaces default --default-volumes-to-restic
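# Tip: the same backup can run on a recurring schedule, e.g. daily at 02:00
# (schedule name "default-daily" is illustrative):
#   velero schedule create default-daily --schedule="0 2 * * *" \
#     --include-namespaces default --default-volumes-to-restic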
</code>Delete a resource (e.g., the nginx deployment) and restore it:
<code>kubectl delete deployment nginx
velero restore create --from-backup default-backup-$(date +%Y%m%d)
</code>Summary
Kubernetes forms the foundation for cloud‑native applications; reliable backup and upgrade procedures are essential to maintain platform stability. By following the steps above—installing the cluster, configuring networking, managing upgrades, renewing certificates, adding/removing nodes, and backing up both etcd and manifests—you can ensure a resilient and maintainable Kubernetes environment.
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.