Mastering etcd: Architecture, Monitoring & Performance Tuning
This article provides a comprehensive overview of etcd—including its origins, role in Kubernetes, version evolution, layered architecture, key terminology, operational commands, monitoring metrics, benchmarking procedures, disk‑performance testing, and tuning recommendations—for building reliable cloud‑native clusters.
Etcd Overview
Etcd is an open‑source, highly available distributed key‑value store launched by the CoreOS team in June 2013. It implements the Raft consensus algorithm and is written in Go.
Why Kubernetes Uses Etcd
Kubernetes adopted etcd early because its Go implementation, high availability, watch mechanism, CAS, and TTL features match the control plane’s needs. The early 0.4 release used etcd v0.2, and later versions rely on etcd v2 and v3 features such as watch, leader election, and automatic key expiration.
Etcd Version Evolution
Key features of etcd v1 and v2 are summarized in a timeline (image). As Kubernetes grew, performance and stability limits of v2 became apparent, leading to the release of etcd 3.0 in June 2016 and its default use in Kubernetes 1.6, enabling clusters of up to 5 000 nodes.
Architecture
Etcd is layered into Client, API network, Raft algorithm, logical, and storage layers.
Client layer: v2 and v3 API client libraries with load balancing and automatic failover.
API network layer: client-to-server communication (v2 uses HTTP/1.x, v3 uses gRPC) and inter-node Raft communication.
Raft algorithm layer: leader election, log replication, and ReadIndex for strongly consistent reads.
Logical layer: core modules such as KVServer, MVCC, authentication, lease, and compaction.
Storage layer: WAL, snapshot, and BoltDB modules that persist cluster metadata and user data.
Key Terminology
Raft – consensus algorithm.
Node – a Raft state‑machine instance.
Member – an etcd instance that serves client requests.
Cluster – a set of members working together.
Peer – another member in the same cluster.
WAL – write‑ahead log.
Snapshot – point‑in‑time copy of the data.
Leader, Follower, Candidate, Term, Index – Raft concepts.
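Several of these terms surface directly in etcdctl output. As a sketch (the endpoint addresses below are placeholders; the certificate paths match the kubeadm-style examples later in this article), endpoint status reports each member's leader status, Raft term, and Raft index:

```shell
# Show leader, Raft term, Raft index, and DB size for every member.
# The IS LEADER, RAFT TERM, and RAFT INDEX columns map onto the Raft
# terminology above. Endpoint IPs here are placeholders.
ETCDCTL_API=3 etcdctl \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  --endpoints=https://10.0.0.1:2379,https://10.0.0.2:2379,https://10.0.0.3:2379 \
  endpoint status --write-out=table
```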
Operational Practices
Common etcdctl Commands
<code>ETCD_CA_CERT="/etc/kubernetes/pki/etcd/ca.crt"
ETCD_CERT="/etc/kubernetes/pki/etcd/server.crt"
ETCD_KEY="/etc/kubernetes/pki/etcd/server.key"
HOST_1=https://xxx.xxx.xxx.xxx:2379
ETCDCTL_API=3 etcdctl --cacert="${ETCD_CA_CERT}" --cert="${ETCD_CERT}" --key="${ETCD_KEY}" --endpoints="${HOST_1}" endpoint status --write-out=table
# key-value operations
etcdctl put foo bar
etcdctl get foo
etcdctl del foo
# transaction example: the compare block, success ops, and failure ops
# are separated by blank lines when fed on stdin
etcdctl txn <<'EOF'
mod("key1") > "0"

put key1 "overwrote-key1"
put key2 "some extra key"

EOF
# cluster maintenance
etcdctl member list
etcdctl endpoint health
etcdctl alarm list
etcdctl defrag
etcdctl snapshot save snapshot.db
etcdctl snapshot restore snapshot.db
</code>
Etcd Monitoring
Important metrics fall into three groups: health status, system resources (USE: utilization, saturation, errors), and application behavior (RED: rate, errors, duration). Together they cover CPU, memory, disk usage, request rate, error rate, latency, WAL/DB fsync duration, leader changes, and watcher count.
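As a quick sanity check outside of Prometheus, a member's /metrics endpoint can be scraped directly. The command below is a sketch assuming client certificates and a locally reachable endpoint; it greps for a few of the metrics listed above (exact metric names can vary slightly across etcd versions):

```shell
# Scrape one member's /metrics endpoint over TLS and pick out leader
# changes, WAL fsync latency histogram buckets, and watcher count.
curl -s \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key \
  https://127.0.0.1:2379/metrics | \
  grep -E 'etcd_server_leader_changes_seen_total|etcd_disk_wal_fsync_duration_seconds_bucket|etcd_debugging_mvcc_watcher_total'
```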
Metrics can be collected with kube-prometheus in either HTTP or HTTPS mode. Example Helm command for HTTP mode:
<code>helm install monitoring -n cattle-prometheus \
--set kubeEtcd.service.port=2381 \
--set kubeEtcd.service.targetPort=2381 \
--set prometheusOperator.admissionWebhooks.patch.image.sha=null .
</code>
For HTTPS mode, create a secret containing the etcd client certificates and add the corresponding Helm values.
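A minimal sketch of creating that secret, assuming the kubeadm certificate paths used earlier and a hypothetical secret name etcd-certs (the Helm values that mount and reference the secret differ between kube-prometheus chart versions):

```shell
# Package the etcd CA and client cert/key into a secret in the
# monitoring namespace so Prometheus can scrape etcd over TLS.
# Secret name and namespace are illustrative.
kubectl create secret generic etcd-certs -n cattle-prometheus \
  --from-file=/etc/kubernetes/pki/etcd/ca.crt \
  --from-file=/etc/kubernetes/pki/etcd/server.crt \
  --from-file=/etc/kubernetes/pki/etcd/server.key
```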
Grafana dashboards are available (links to JSON files) and can be imported directly.
Benchmarking Etcd
SLI/SLO definitions focus on throughput (e.g., 40k reads/s and 20k writes/s) and latency (99% of requests under 100 ms). The official etcd benchmark tool is built from the etcd source.
<code># Install benchmark
git clone https://github.com/etcd-io/etcd.git --depth 1
cd etcd
go install -v ./tools/benchmark
# Example write benchmark against the leader
# (go install places the binary in $GOPATH/bin; make sure it is on PATH)
benchmark --endpoints=${HOST_2} --cacert="${ETCD_CA_CERT}" --cert="${ETCD_CERT}" --key="${ETCD_KEY}" \
  --target-leader --conns=100 --clients=1000 \
  put --key-size=8 --sequential-keys --total=100000 --val-size=256
</code>
Sample results show average write QPS of ~14 k with 134 ms 99 % latency for leader‑only writes, and higher QPS when all members are targeted.
Read benchmarks demonstrate linearizable reads achieving ~500 QPS with 7 ms latency, while serializable reads reach >1 k QPS with sub‑2 ms latency.
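The two read modes can be compared with the same benchmark tool. A sketch, reusing the TLS variables and the HOST_2 endpoint from the write benchmark above (the range subcommand and its --consistency flag are part of the upstream benchmark tool):

```shell
# Linearizable reads go through the Raft ReadIndex path and require
# quorum confirmation, so they are slower but strongly consistent.
benchmark --endpoints=${HOST_2} --cacert="${ETCD_CA_CERT}" --cert="${ETCD_CERT}" --key="${ETCD_KEY}" \
  --conns=1 --clients=1 range foo --consistency=l --total=10000

# Serializable reads are answered from the local member's store,
# skipping quorum, so they are faster but may return stale data.
benchmark --endpoints=${HOST_2} --cacert="${ETCD_CA_CERT}" --cert="${ETCD_CERT}" --key="${ETCD_KEY}" \
  --conns=1 --clients=1 range foo --consistency=s --total=10000
```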
Disk Performance Testing
Etcd performance is bounded by network round‑trip time and disk persistence latency. Use ping or tcpdump to measure network latency between members, and fio to simulate etcd's write pattern. The metric etcd_disk_wal_fsync_duration_seconds should have 99 % of samples below 10 ms.
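The fio invocation below approximates etcd's WAL write pattern: small sequential writes, each followed by an fdatasync. The sizes follow the commonly cited etcd fio example; point --directory at the disk that hosts (or will host) the etcd data directory:

```shell
# Simulate etcd WAL writes: sequential sync writes of ~2.3 KB,
# fdatasync after every write. fio's reported fsync latency
# percentiles should stay well under the 10 ms target above.
fio --rw=write --ioengine=sync --fdatasync=1 \
    --directory=/var/lib/etcd-fio-test --size=22m --bs=2300 \
    --name=etcd-wal-test
```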
Tuning Recommendations
Use SSDs and give the etcd process the highest I/O priority:

sudo ionice -c2 -n0 -p $(pgrep etcd)

Set the CPU governor to performance for all cores.
Enable automatic compaction, increase Raft message size, and adjust max storage capacity.
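A sketch of what those settings can look like on the command line. The governor loop assumes a Linux host with cpufreq and root access; the etcd flag values (1 h retention, 8 GiB backend quota, 10 MiB max request size) are illustrative starting points, not universal recommendations:

```shell
# Pin every core to the performance governor (standard cpufreq sysfs
# path on Linux; requires root).
for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
  echo performance > "$gov"
done

# Illustrative etcd flags: hourly auto-compaction, an 8 GiB backend
# quota, and a larger maximum request size.
etcd --auto-compaction-mode=periodic \
     --auto-compaction-retention=1h \
     --quota-backend-bytes=8589934592 \
     --max-request-bytes=10485760
```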
References include the etcd official documentation, GeekTime course, Datadog integration, and community blogs.
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.