How to Safely Back Up and Restore Etcd in Kubernetes: A Step‑by‑Step Guide
This article explains why regular Etcd snapshots are essential for Kubernetes disaster recovery and provides detailed, command‑line procedures for restoring Etcd data on both single‑node and high‑availability clusters, including necessary configuration adjustments and verification steps.
1. Overview
In a Kubernetes cluster, all operational resource data is stored in the Etcd database. To ensure rapid recovery after node failures, cluster migrations, or other anomalies, regular disaster‑recovery backups of Etcd data are required.
Backing up Etcd is straightforward: a snapshot taken on a single node captures the entire cluster state. With that snapshot, the cluster can be restored quickly even if all control‑plane nodes are lost.
Note: Even in a highly available Etcd cluster, a backup on one node is sufficient, but it is strongly recommended to back up on all Etcd nodes and regularly copy snapshots to a dedicated storage server.
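The snapshot itself is taken with etcdctl snapshot save against a single endpoint. The following is a minimal sketch rather than a definitive backup script. It assumes the certificate paths, backup directory, and endpoint that appear in the restore examples of this guide, and snapshot_path and take_snapshot are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: take a dated Etcd snapshot from one endpoint.
# All defaults below are assumptions copied from this guide's examples.
ETCD_SSL_DIR="${ETCD_SSL_DIR:-/opt/etcd/ssl}"
BACKUP_DIR="${BACKUP_DIR:-/var/backups/kube_etcd}"
ETCD_ENDPOINT="${ETCD_ENDPOINT:-https://10.20.30.31:2379}"

# Print the snapshot path for a date stamp in YYYY-MMDD form
# (defaults to today), e.g. .../etcd-2024-0206-snapshot.db.
snapshot_path() {
  local stamp="${1:-$(date +%Y-%m%d)}"
  printf '%s/etcd-%s-snapshot.db\n' "$BACKUP_DIR" "$stamp"
}

# Save a snapshot and print the path of the file that was written.
take_snapshot() {
  local target
  target="$(snapshot_path)"
  mkdir -p "$BACKUP_DIR" || return 1
  etcdctl --cacert="$ETCD_SSL_DIR/ca.pem" \
    --cert="$ETCD_SSL_DIR/server.pem" \
    --key="$ETCD_SSL_DIR/server-key.pem" \
    --endpoints="$ETCD_ENDPOINT" \
    snapshot save "$target" || return 1
  printf '%s\n' "$target"
}
```

Calling take_snapshot from a cron job or systemd timer, and then copying the resulting file to a separate storage server, is one way to follow the recommendation above.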
2. Practical Etcd Snapshot Restoration
2.1 Single‑Node Recovery
Description: When a single node’s resource data is lost, the following steps restore the data quickly.
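Before touching the node, a few checks can catch common mistakes. The sketch below assumes the default data directory used in this guide, and preflight_restore is a hypothetical helper name. It only inspects the filesystem and does not call etcdctl.

```shell
#!/usr/bin/env bash
# Sketch: pre-flight checks before a restore.
# Usage: preflight_restore <snapshot-file> [data-dir]
preflight_restore() {
  local snapshot="$1"
  local data_dir="${2:-/var/lib/etcd}"   # assumed default from this guide
  local backup_dir="${data_dir}.bak"

  # The snapshot must exist and be non-empty.
  if [ ! -s "$snapshot" ]; then
    echo "snapshot missing or empty: $snapshot" >&2
    return 1
  fi
  # If the .bak directory already exists, mv would move the data
  # directory inside it instead of renaming it.
  if [ -e "$backup_dir" ]; then
    echo "backup target already exists: $backup_dir" >&2
    return 1
  fi
  echo "ok: $snapshot -> $data_dir"
}
```

For example, preflight_restore /var/backups/kube_etcd/etcd-2024-0206-snapshot.db could be run before stopping the service.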
Procedure:
(1) Stop the Etcd service on the node:
systemctl stop etcd
(2) Back up the Etcd data directory:
mv /var/lib/etcd /var/lib/etcd.bak
(3) Restore Etcd data using the snapshot file:
etcdctl --cacert=/opt/etcd/ssl/ca.pem \
--cert=/opt/etcd/ssl/server.pem \
--key=/opt/etcd/ssl/server-key.pem \
--endpoints 10.20.30.31:2379 \
snapshot restore /var/backups/kube_etcd/etcd-2024-0206-snapshot.db \
--name=etcd01 \
--initial-cluster=etcd01=https://10.20.30.31:2380 \
--initial-advertise-peer-urls=https://10.20.30.31:2380 \
--data-dir=/var/lib/etcd
Note 1: The etcdctl client uses the v3 API by default (since etcd 3.4; on older releases, set ETCDCTL_API=3).
Note 2: Replace IP addresses, certificates, keys, and snapshot file paths with those of your actual cluster.
(4) Start the Etcd service:
systemctl start etcd
(5) Verify Etcd node status:
etcdctl --cacert=/opt/etcd/ssl/ca.pem \
--cert=/opt/etcd/ssl/server.pem \
--key=/opt/etcd/ssl/server-key.pem \
--endpoints "https://10.20.30.31:2379" endpoint status --write-out=table
2.2 High‑Availability Cluster Recovery
Restoring an HA Etcd cluster requires restoring each node individually from the same snapshot file. The example below uses a three‑node cluster.
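Because only --name and --initial-advertise-peer-urls differ between members, the per-node restore command can be generated from one member list. The following is a minimal sketch, assuming the three-node addresses, cluster token, and data directory used in this section. MEMBERS, initial_cluster, and restore_command are hypothetical names, and the function only prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Sketch: derive the per-member restore command from one member list.
# name=peer-IP pairs, space separated (assumed values from this section).
MEMBERS="etcd01=10.20.31.103 etcd02=10.20.31.104 etcd03=10.20.31.105"

# Join all members into the value expected by --initial-cluster.
initial_cluster() {
  local out="" m
  for m in $MEMBERS; do
    out="${out:+$out,}${m%%=*}=https://${m#*=}:2380"
  done
  printf '%s\n' "$out"
}

# Print (not run) the restore command for the named member.
restore_command() {
  local name="$1" ip="" m
  for m in $MEMBERS; do
    if [ "${m%%=*}" = "$name" ]; then
      ip="${m#*=}"
    fi
  done
  if [ -z "$ip" ]; then
    echo "unknown member: $name" >&2
    return 1
  fi
  printf '%s\n' "etcdctl snapshot restore snapshot.db --name=$name --initial-cluster=$(initial_cluster) --initial-cluster-token=etcd-cluster --initial-advertise-peer-urls=https://$ip:2380 --data-dir=/var/lib/etcd"
}
```

Running restore_command etcd02 on node 104 prints the command to review before executing it there. Printing first is a deliberate choice, since a restore with the wrong member name produces a data directory that will not join the cluster.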
First, gather node information for the HA cluster.
(1) Install a fresh Etcd service on each node (example service file shown).
# /usr/lib/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=network.target
Wants=network-online.target
[Service]
Type=notify
EnvironmentFile=/opt/etcd/cfg/etcd.conf
ExecStart=/opt/etcd/bin/etcd
Restart=on-failure
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
Example member configuration (node 103 shown):
# [Member]
ETCD_NAME="etcd01"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_PEER_URLS="https://10.20.31.103:2380"
ETCD_LISTEN_CLIENT_URLS="https://10.20.31.103:2379,http://127.0.0.1:2379"
# [Clustering]
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.20.31.103:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://10.20.31.103:2379"
ETCD_INITIAL_CLUSTER="etcd01=https://10.20.31.103:2380,etcd02=https://10.20.31.104:2380,etcd03=https://10.20.31.105:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_ENABLE_V2="true"
# [Security]
ETCD_CERT_FILE="/opt/etcd/ssl/server.pem"
ETCD_KEY_FILE="/opt/etcd/ssl/server-key.pem"
ETCD_TRUSTED_CA_FILE="/opt/etcd/ssl/ca.pem"
ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CERT_FILE="/opt/etcd/ssl/server.pem"
ETCD_PEER_KEY_FILE="/opt/etcd/ssl/server-key.pem"
ETCD_PEER_TRUSTED_CA_FILE="/opt/etcd/ssl/ca.pem"
ETCD_PEER_CLIENT_CERT_AUTH="true"
Verify the new cluster with etcdctl:
/opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem \
--cert=/opt/etcd/ssl/server.pem \
--key=/opt/etcd/ssl/server-key.pem \
--endpoints "https://10.20.31.103:2379,https://10.20.31.104:2379,https://10.20.31.105:2379" \
endpoint status --write-out=table
(2) Stop the Etcd service on all nodes:
systemctl stop etcd
(3) Back up the Etcd data directory on each node:
mv /var/lib/etcd /var/lib/etcd.bak
(4) Restore each node using the snapshot file (example for node 103):
/opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem \
--cert=/opt/etcd/ssl/server.pem \
--key=/opt/etcd/ssl/server-key.pem \
snapshot restore snapshot.db \
--name etcd01 \
--initial-cluster=etcd01=https://10.20.31.103:2380,etcd02=https://10.20.31.104:2380,etcd03=https://10.20.31.105:2380 \
--initial-cluster-token=etcd-cluster \
--initial-advertise-peer-urls=https://10.20.31.103:2380 \
--data-dir=/var/lib/etcd
Repeat the above command for nodes 104 and 105, adjusting --name and --initial-advertise-peer-urls accordingly.
(5) Start the Etcd service on all nodes:
systemctl start etcd
(6) Verify cluster status:
/opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem \
--cert=/opt/etcd/ssl/server.pem \
--key=/opt/etcd/ssl/server-key.pem \
--endpoints "https://10.20.31.103:2379,https://10.20.31.104:2379,https://10.20.31.105:2379" \
endpoint status --write-out=table
After these steps, the HA Etcd cluster is restored and ready to serve.
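Members may need a few seconds after startup before they report healthy, so the final check can be retried. The following is a minimal sketch, assuming the endpoints and certificate paths from this section. wait_healthy and check_cluster are hypothetical helper names, and the sketch relies on etcdctl endpoint health returning a non-zero exit status while any endpoint is unhealthy.

```shell
#!/usr/bin/env bash
# Sketch: retry a health check until it succeeds or attempts run out.
# Client endpoints from the three-node example (assumed values).
ENDPOINTS="https://10.20.31.103:2379,https://10.20.31.104:2379,https://10.20.31.105:2379"

# Usage: wait_healthy <attempts> <delay-seconds> <command> [args...]
wait_healthy() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "still unhealthy after $attempts attempt(s)" >&2
  return 1
}

# Health check against all three members.
check_cluster() {
  /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem \
    --cert=/opt/etcd/ssl/server.pem \
    --key=/opt/etcd/ssl/server-key.pem \
    --endpoints "$ENDPOINTS" \
    endpoint health
}
```

For example, wait_healthy 10 3 check_cluster tries up to ten times with three seconds between attempts. Passing the check as a command keeps the retry loop independent of etcdctl, so it can be exercised without a running cluster.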
3. Summary
With a single snapshot file, you can restore an entire Etcd cluster: running etcdctl snapshot restore on each node creates a new data directory for that member. The restore overwrites certain metadata (member ID, cluster ID), so the restored members lose their former identity and cannot accidentally join another existing cluster.
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.
