
Mastering etcd in Kubernetes: Backup, Scaling, and Secure Deployment

This guide explains what etcd is, how to prepare a Kubernetes environment, start single‑node or multi‑node clusters, secure communication with TLS, replace failed members, perform built‑in and volume backups, scale the cluster, recover from snapshots, upgrade to etcd 3, and avoid known client‑balancer bugs.

Full-Stack DevOps & Kubernetes

What is etcd?

etcd is a consistent, highly‑available key‑value store that backs all Kubernetes cluster data. Backing up the etcd datastore is essential for disaster recovery.

Prepare a Kubernetes environment

You need a running Kubernetes cluster with the kubectl CLI. If you do not have one, follow the official installation guides for a 1.18 or 1.17 HA cluster.

Prerequisites

Run an odd number of etcd members.

etcd is a leader‑based distributed system; the leader must send heartbeats to all followers on time to keep the cluster stable.

Ensure sufficient CPU, memory, network and disk I/O; resource starvation can cause heartbeat timeouts and cluster instability.

Run etcd on dedicated machines or isolated environments.

Production clusters should use etcd version 3.2.10 or newer.
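The reason for an odd member count can be checked with a little arithmetic. The sketch below is illustrative only (the function names are made up for this example): it computes the quorum, meaning the smallest majority, and the number of member failures a cluster of a given size can survive.

```shell
# Quorum is the smallest majority of members: floor(n/2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Fault tolerance is how many members can fail while a quorum survives.
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Print one line per cluster size from 1 to 6 members.
print_quorum_table() {
  n=1
  while [ "$n" -le 6 ]; do
    echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
    n=$((n + 1))
  done
}

print_quorum_table
```

A four‑member cluster needs a quorum of three and so tolerates one failure, the same as a three‑member cluster. The extra even member adds replication cost without adding fault tolerance.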

Resource requirements

Testing can use minimal resources, but production deployments require robust hardware.

Start an etcd cluster

Single‑node (testing only)

./etcd --listen-client-urls=http://$PRIVATE_IP:2379 \
       --advertise-client-urls=http://$PRIVATE_IP:2379

Then start the Kubernetes API server with --etcd-servers=http://$PRIVATE_IP:2379 (the flag expects each entry in scheme://ip:port form).

Multi‑node (high availability)

Deploy a five‑member cluster for production. Example client URLs:

http://$IP1:2379, http://$IP2:2379, http://$IP3:2379, http://$IP4:2379, http://$IP5:2379

Start the API server with the same list in --etcd-servers.
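The --etcd-servers flag takes the member URLs as one comma‑separated value. As a minimal sketch, the helper below builds that value from a list of addresses; etcd_servers is a hypothetical helper name and the 10.0.0.x addresses are placeholders, not values from a real cluster.

```shell
# Join member IPs into a comma-separated list of client URLs.
# Usage: etcd_servers SCHEME PORT IP...
etcd_servers() {
  scheme=$1
  port=$2
  shift 2
  list=""
  for ip in "$@"; do
    # Prepend a comma only when the list already has an entry.
    list="${list:+$list,}${scheme}://${ip}:${port}"
  done
  echo "$list"
}

# Placeholder member addresses, for illustration only.
SERVERS=$(etcd_servers http 2379 10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.4 10.0.0.5)
echo "--etcd-servers=$SERVERS"
```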

Load‑balanced multi‑node

Place a load balancer in front of the members and configure the API server with --etcd-servers=$LB:2379.

Secure the etcd cluster

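Use TLS for both peer and client traffic. The key pair peer.key and peer.cert secures communication between members and is set with --peer-key-file and --peer-cert-file; the key pair client.key and client.cert secures communication between etcd and its clients and is set with --key-file and --cert-file. Once TLS is enabled, use https as the URL scheme. To limit access to clients that hold a trusted certificate, such as the API server, start etcd with --client-cert-auth=true and --trusted-ca-file, and pass the API server its credentials with --etcd-certfile, --etcd-keyfile, and --etcd-cafile.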
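To make the flag set concrete, here is a sketch that only prints the TLS‑related flags rather than starting anything. The directory /etc/etcd/pki, the file name ca.cert, and the apiserver-etcd-client file names are assumptions made for this example; substitute the paths from your own PKI.

```shell
# Assumed certificate directory; adjust to your own layout.
CERT_DIR=${CERT_DIR:-/etc/etcd/pki}

# Print the TLS-related etcd flags, one per line.
etcd_tls_flags() {
  cat <<EOF
--cert-file=$CERT_DIR/client.cert
--key-file=$CERT_DIR/client.key
--client-cert-auth=true
--trusted-ca-file=$CERT_DIR/ca.cert
--peer-cert-file=$CERT_DIR/peer.cert
--peer-key-file=$CERT_DIR/peer.key
--peer-client-cert-auth=true
--peer-trusted-ca-file=$CERT_DIR/ca.cert
EOF
}

# Print the matching kube-apiserver flags, one per line.
apiserver_etcd_tls_flags() {
  cat <<EOF
--etcd-certfile=$CERT_DIR/apiserver-etcd-client.cert
--etcd-keyfile=$CERT_DIR/apiserver-etcd-client.key
--etcd-cafile=$CERT_DIR/ca.cert
EOF
}

etcd_tls_flags
apiserver_etcd_tls_flags
```

With these flags in place, the listen, advertise, and --etcd-servers URLs need the https scheme.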

Replace a failed member

Identify the failed member ID with etcdctl member list.

Remove it: etcdctl member remove <ID>.

Add a new member:

./etcdctl member add member4 --peer-urls=http://10.0.0.4:2380


Start the new member with the appropriate environment variables (ETCD_NAME, ETCD_INITIAL_CLUSTER, and ETCD_INITIAL_CLUSTER_STATE=existing).

Update the API server’s --etcd-servers flag or the load balancer configuration.
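The replacement steps above can be sketched as two small shell functions. This is one possible arrangement, not a prescribed procedure: replace_member and new_member_env are hypothetical helper names, and the member ID and 10.0.0.x addresses in the usage comment are placeholders.

```shell
export ETCDCTL_API=3

# Remove a failed member, then register its replacement.
# Usage: replace_member FAILED_ID NEW_NAME NEW_PEER_URL
replace_member() {
  failed_id=$1
  new_name=$2
  new_peer_url=$3
  # Stop if removal fails, so the replacement is not added on top of a stale member.
  etcdctl member remove "$failed_id" || return 1
  etcdctl member add "$new_name" --peer-urls="$new_peer_url"
}

# Print the environment the new member needs before it is started.
# Usage: new_member_env NEW_NAME INITIAL_CLUSTER
new_member_env() {
  echo "export ETCD_NAME=$1"
  echo "export ETCD_INITIAL_CLUSTER=$2"
  echo "export ETCD_INITIAL_CLUSTER_STATE=existing"
}

# Example usage (placeholders, not run here):
#   etcdctl member list
#   replace_member <FAILED_ID> member4 http://10.0.0.4:2380
#   new_member_env member4 "member2=http://10.0.0.2:2380,member3=http://10.0.0.3:2380,member4=http://10.0.0.4:2380"
```

Removing the failed member before adding the new one keeps the configured cluster size, and therefore the quorum, unchanged during the swap.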

Backup etcd

Built‑in snapshot

ETCDCTL_API=3 etcdctl --endpoints $ENDPOINT snapshot save snapshotdb
ETCDCTL_API=3 etcdctl --write-out=table snapshot status snapshotdb
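For routine backups it can help to wrap the two commands in a function that writes a timestamped file. The following is a sketch under assumptions: backup_etcd is a hypothetical helper name, and the file naming scheme is only one reasonable choice.

```shell
export ETCDCTL_API=3

# Save a timestamped snapshot, print its status table to stderr,
# and print the snapshot path to stdout.
# Usage: backup_etcd ENDPOINT BACKUP_DIR
backup_etcd() {
  endpoint=$1
  backup_dir=$2
  snapshot="$backup_dir/snapshot-$(date +%Y%m%dT%H%M%S).db"
  mkdir -p "$backup_dir" || return 1
  etcdctl --endpoints "$endpoint" snapshot save "$snapshot" >&2 || return 1
  # A readable status table is a quick sanity check on the saved file.
  etcdctl --write-out=table snapshot status "$snapshot" >&2 || return 1
  echo "$snapshot"
}

# Example usage (not run here):
#   backup_etcd "$ENDPOINT" /var/backups/etcd
```

Against a TLS‑secured endpoint, etcdctl also needs its --cacert, --cert, and --key options.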

Volume snapshot

If etcd resides on a cloud block volume (e.g., AWS EBS), create a volume snapshot to capture the data.

Scale the cluster

Adding members increases availability at the cost of performance; scaling up does not increase the cluster's performance or capacity. Production clusters typically run a static five‑member configuration, and any change in membership should follow the official etcd runtime reconfiguration procedure.

Recover from a snapshot

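Run etcdctl snapshot restore against the saved snapshot file to create a new data directory, and restore every member from the same snapshot. If the restored cluster's access URLs differ from the previous ones, reconfigure and restart the API server with --etcd-servers=$NEW_ETCD_CLUSTER.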
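As a rough sketch of the restore step for one member, the function below passes the snapshot and the member's identity to etcdctl snapshot restore. The helper name restore_member and the cluster token etcd-restored are assumptions for this example; newer etcd releases also offer the same operation through etcdutl.

```shell
export ETCDCTL_API=3

# Restore a snapshot into a fresh data directory for one member.
# Usage: restore_member SNAPSHOT NAME PEER_URL INITIAL_CLUSTER DATA_DIR
restore_member() {
  etcdctl snapshot restore "$1" \
    --name "$2" \
    --initial-advertise-peer-urls "$3" \
    --initial-cluster "$4" \
    --initial-cluster-token etcd-restored \
    --data-dir "$5"
}

# Example usage (placeholders, not run here); repeat on each member
# with the same snapshot file and that member's own name and peer URL:
#   restore_member snapshotdb member1 http://10.0.0.1:2380 \
#     "member1=http://10.0.0.1:2380,member2=http://10.0.0.2:2380,member3=http://10.0.0.3:2380" \
#     /var/lib/etcd-restored
```

Each member is then started with its data directory pointing at the restored location.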

Upgrade and rollback

Kubernetes v1.13 removed etcd 2 support. When upgrading from v1.12 to v1.13, migrate data to etcd 3 and change the API server flag to --storage-backend=etcd3. Follow the vendor‑specific upgrade guide for your cluster.

Known issue

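The etcd v3 client shipped in etcd v3.3.13 and earlier has a bug in its balancer: failover does not work properly against secure (TLS) endpoints, so etcd servers may fail or briefly disconnect from the kube-apiserver in HA deployments. The fix landed in etcd v3.4 and was backported to v3.3.14 and later.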
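A quick way to reason about which versions fall in the affected range is a version comparison. The sketch below is a plain shell comparison of MAJOR.MINOR.PATCH strings; the function names are made up for this example, and it assumes three numeric components with no pre‑release suffix.

```shell
# Return 0 when version $1 is less than or equal to version $2.
# Both must look like MAJOR.MINOR.PATCH; a leading "v" is ignored.
version_le() {
  old_ifs=$IFS
  IFS=.
  # Split both versions on dots into six positional parameters.
  set -- ${1#v} ${2#v}
  IFS=$old_ifs
  [ "$1" -lt "$4" ] && return 0
  [ "$1" -gt "$4" ] && return 1
  [ "$2" -lt "$5" ] && return 0
  [ "$2" -gt "$5" ] && return 1
  [ "$3" -le "$6" ]
}

# Return 0 when the given etcd version is v3.3.13 or earlier.
client_balancer_affected() {
  version_le "$1" 3.3.13
}

for v in 3.3.9 3.3.13 3.3.14 3.4.0; do
  if client_balancer_affected "$v"; then
    echo "$v: affected"
  else
    echo "$v: not affected"
  fi
done
```

Note that the version that matters is that of the etcd client library used by the API server, which is not necessarily the version of the etcd server binary.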
