Mastering etcd in Kubernetes: Backup, Scaling, and Secure Deployment
This guide explains what etcd is, how to prepare a Kubernetes environment, start single‑node or multi‑node clusters, secure communication with TLS, replace failed members, perform built‑in and volume backups, scale the cluster, recover from snapshots, upgrade to etcd 3, and avoid known client‑balancer bugs.
What is etcd?
etcd is a consistent, highly‑available key‑value store that backs all Kubernetes cluster data. Backing up the etcd datastore is essential for disaster recovery.
Prepare a Kubernetes environment
You need a running Kubernetes cluster with the kubectl CLI. If you do not have one, follow the official installation guides for a 1.18 or 1.17 HA cluster.
Prerequisites
Run an odd number of etcd members.
etcd is a leader‑based distributed system; the leader must send heartbeats to all followers on time to keep the cluster stable.
Ensure sufficient CPU, memory, network and disk I/O; resource starvation can cause heartbeat timeouts and cluster instability.
Run etcd on dedicated machines or isolated environments.
Production clusters should use etcd version 3.2.10 or newer.
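The advice to run an odd number of members follows from majority quorum arithmetic. The sketch below is illustrative only (the helper names quorum and fault_tolerance are made up for this example); it shows that adding a fourth member to a three-member cluster raises the quorum size without raising the number of failures the cluster can survive.

```shell
#!/bin/sh
# Quorum is a strict majority of members: floor(n/2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of members that can fail while the cluster still has quorum.
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Print the figures for cluster sizes 1 through 7.
for n in 1 2 3 4 5 6 7; do
  printf 'members=%d quorum=%d tolerated_failures=%d\n' \
    "$n" "$(quorum "$n")" "$(fault_tolerance "$n")"
done
```

An even-sized cluster tolerates the same number of failures as the odd-sized cluster one member smaller, so the extra member adds cost without adding resilience.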
Resource requirements
Testing can use minimal resources, but production deployments require robust hardware.
Start an etcd cluster
Single‑node (testing only)
./etcd --listen-client-urls=http://$PRIVATE_IP:2379 \
--advertise-client-urls=http://$PRIVATE_IP:2379
Then start the Kubernetes API server with --etcd-servers=$PRIVATE_IP:2379.
Multi‑node (high availability)
Deploy a five‑member cluster for production. Example client URLs:
http://$IP1:2379, http://$IP2:2379, http://$IP3:2379, http://$IP4:2379, http://$IP5:2379
Start the API server with the same list in --etcd-servers.
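The endpoint list for --etcd-servers is a comma-separated string, which is easy to mistype by hand. A minimal sketch for building it follows; the helper name etcd_servers and the 10.0.0.x addresses are placeholders invented for this example, and the plain HTTP scheme simply mirrors the URLs in this section.

```shell
#!/bin/sh
# Join member IPs into a comma-separated list of client URLs.
# Port 2379 and plain HTTP mirror the article's example; switch to https once TLS is on.
etcd_servers() {
  list=""
  for ip in "$@"; do
    # ${list:+$list,} adds a separating comma only when list is non-empty.
    list="${list:+$list,}http://$ip:2379"
  done
  echo "$list"
}

# Placeholder addresses, not taken from the article.
echo "--etcd-servers=$(etcd_servers 10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.4 10.0.0.5)"
```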
Load‑balanced multi‑node
Place a load balancer in front of the members and configure the API server with --etcd-servers=$LB:2379.
Secure the etcd cluster
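Use TLS with peer and client key pairs. The peer.key and peer.cert files secure traffic between etcd members and are set with --peer-key-file and --peer-cert-file; client.key and client.cert secure traffic between etcd and its clients and are set with --key-file and --cert-file. To restrict etcd access to the API server only, start etcd with --client-cert-auth and --trusted-ca-file so that it accepts only client certificates signed by the trusted CA, then give the API server such a certificate through --etcd-certfile, --etcd-keyfile and --etcd-cafile.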
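As a rough sketch, the etcd-side TLS flags can be collected in one place so that every member starts with the same set. The helper name etcd_tls_flags, the /etc/etcd/pki directory, and the ca.cert file name are assumptions made for this example; the key and certificate file names follow the ones mentioned in this section. The peer CA flags (--peer-client-cert-auth, --peer-trusted-ca-file) are added here as the peer-side counterparts of the client flags.

```shell
#!/bin/sh
# Print the etcd TLS flags for a certificate directory, one flag per line.
# ca.cert is an assumed name for the trusted CA bundle.
etcd_tls_flags() {
  dir=$1
  cat <<EOF
--cert-file=$dir/client.cert
--key-file=$dir/client.key
--client-cert-auth
--trusted-ca-file=$dir/ca.cert
--peer-cert-file=$dir/peer.cert
--peer-key-file=$dir/peer.key
--peer-client-cert-auth
--peer-trusted-ca-file=$dir/ca.cert
EOF
}

# Assumed certificate location; adjust to your own layout.
etcd_tls_flags /etc/etcd/pki
```

The printed flags would be appended to the etcd command line; client and peer URLs then need the https scheme instead of http.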
Replace a failed member
Identify the failed member ID with etcdctl member list.
Remove it: etcdctl member remove <ID>.
Add a new member:
./etcdctl member add member4 --peer-urls=http://10.0.0.4:2380
Start the new member with appropriate environment variables (ETCD_NAME, ETCD_INITIAL_CLUSTER, ETCD_INITIAL_CLUSTER_STATE=existing).
Update the API server’s --etcd-servers flag or the load balancer configuration.
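The remove and add steps above can be sketched as a small script. It runs in dry-run mode by default and only prints the commands; the run wrapper, the EXECUTE switch, and the member ID shown are hypothetical and exist only for illustration.

```shell
#!/bin/sh
export ETCDCTL_API=3

# Dry-run by default: commands are printed, not executed.
# Set EXECUTE=1 to run them against a live cluster.
run() {
  if [ "${EXECUTE:-0}" = "1" ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

# Usage: replace_member FAILED_ID NEW_NAME NEW_PEER_URL
replace_member() {
  run etcdctl member remove "$1"
  run etcdctl member add "$2" --peer-urls="$3"
}

# Hypothetical member ID; take the real one from `etcdctl member list`.
replace_member 8211f1d0f64f3269 member4 http://10.0.0.4:2380
```

Printing first makes it easy to review the member ID before anything is removed, since removing the wrong member can cost the cluster its quorum.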
Backup etcd
Built‑in snapshot
ETCDCTL_API=3 etcdctl --endpoints $ENDPOINT snapshot save snapshotdb
ETCDCTL_API=3 etcdctl --write-out=table snapshot status snapshotdb
Volume snapshot
If etcd resides on a cloud block volume (e.g., AWS EBS), create a volume snapshot to capture the data.
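For recurring built-in snapshots, a timestamped file name avoids overwriting the previous backup. The sketch below wraps the two etcdctl commands from this section; the function names, the /var/backups/etcd default directory, and the naming scheme are assumptions for illustration, and $ENDPOINT is expected to be set as in the commands above.

```shell
#!/bin/sh
# Build a timestamped snapshot path under the given directory.
# The default directory and the etcd-YYYYMMDD-HHMMSS.db scheme are assumptions.
snapshot_path() {
  echo "${1:-/var/backups/etcd}/etcd-$(date -u +%Y%m%d-%H%M%S).db"
}

# Save a snapshot, then print its status table.
# Requires etcdctl on PATH and a reachable $ENDPOINT.
take_snapshot() {
  file=$(snapshot_path "$1")
  ETCDCTL_API=3 etcdctl --endpoints "$ENDPOINT" snapshot save "$file" &&
    ETCDCTL_API=3 etcdctl --write-out=table snapshot status "$file"
}

# Only the path helper runs here; call take_snapshot against a live cluster.
snapshot_path /tmp/etcd-backups
```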
Scale the cluster
Increasing the member count improves availability but not raw performance. Production clusters typically run a static five‑member configuration; scaling should be performed via the official reconfiguration procedure.
Recover from a snapshot
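Use etcdctl snapshot restore with the saved file to create a new data directory for each member, start etcd from the restored data, then restart the API server pointing to the new endpoint list (--etcd-servers=$NEW_ETCD_CLUSTER) if the access URLs have changed.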
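A restore invocation for one member might look like the command printed by the sketch below. The restore_cmd helper, the member name, the addresses, and the /var/lib/etcd-restored directory are all placeholders invented for this example; the helper only prints the command so it can be reviewed before running.

```shell
#!/bin/sh
# Print (do not run) a restore command for a single member.
# Usage: restore_cmd SNAPSHOT NAME PEER_URL INITIAL_CLUSTER DATA_DIR
restore_cmd() {
  echo "ETCDCTL_API=3 etcdctl snapshot restore $1" \
    "--name $2" \
    "--initial-advertise-peer-urls $3" \
    "--initial-cluster $4" \
    "--data-dir $5"
}

# Placeholder values; substitute your own member name, URLs and paths.
restore_cmd snapshotdb member1 http://10.0.0.1:2380 \
  member1=http://10.0.0.1:2380 /var/lib/etcd-restored
```

Each member of the restored cluster would be restored from the same snapshot file with its own name, peer URL and data directory.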
Upgrade and rollback
Kubernetes v1.13 removed etcd 2 support. When upgrading from v1.12 to v1.13, migrate data to etcd 3 and change the API server flag to --storage-backend=etcd3. Follow the vendor‑specific upgrade guide for your cluster.
Known issue
The etcd v3 client released in etcd v3.3.13 and earlier has a bug in which client balancer failover does not work properly against secure endpoints. As a result, the API server can be briefly disconnected from etcd.
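A quick way to flag affected versions is a version comparison against 3.3.13. This is a sketch under stated assumptions: the helper names are made up, the version string must be supplied by you, and sort -V is a GNU coreutils option that may be missing on some systems.

```shell
#!/bin/sh
# True (exit 0) when version $1 <= version $2, compared with sort -V.
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Affected when the version is 3.3.13 or earlier, per the text above.
is_affected() {
  version_le "$1" 3.3.13
}

# Sample versions chosen for illustration.
for v in 3.2.10 3.3.13 3.3.14 3.4.0; do
  if is_affected "$v"; then
    echo "$v: affected"
  else
    echo "$v: not affected"
  fi
done
```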
