
Mastering etcd: Architecture, Monitoring & Performance Tuning

This article provides a comprehensive overview of etcd—including its origins, role in Kubernetes, version evolution, layered architecture, key terminology, operational commands, monitoring metrics, benchmarking procedures, disk‑performance testing, and tuning recommendations—for building reliable cloud‑native clusters.

Ops Development Stories

Etcd Overview

Etcd is an open‑source, highly available distributed key‑value store launched by the CoreOS team in June 2013. It implements the Raft consensus algorithm and is written in Go.

Why Kubernetes Uses Etcd

Kubernetes adopted etcd early because its Go implementation, high availability, watch mechanism, CAS, and TTL features match the control plane's needs. The early Kubernetes 0.4 release used etcd v0.2, and later versions rely on etcd v2 and v3 features such as watch, leader election, and automatic key expiration.
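The watch and TTL mechanisms mentioned above can be seen directly from the command line. A minimal sketch, assuming a reachable etcd v3 cluster and any TLS flags your deployment requires (the `/registry/pods` prefix and `node-1` value are illustrative):

```shell
# watch: stream changes under a key prefix (the pattern kube-apiserver relies on)
etcdctl watch /registry/pods --prefix &

# TTL via lease: the key disappears automatically when the 10 s lease expires
# (lease grant prints "lease <ID> granted with TTL(10s)", so $2 is the lease ID)
LEASE_ID=$(etcdctl lease grant 10 | awk '{print $2}')
etcdctl put /leader node-1 --lease="${LEASE_ID}"
```

Kubernetes builds leader election and resource watching on exactly these primitives.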

Etcd Version Evolution

Key features of etcd v1 and v2 are summarized in a timeline figure. As Kubernetes grew, the performance and stability limits of v2 became apparent, leading to the release of etcd 3.0 in June 2016 and its adoption as the default in Kubernetes 1.6, enabling clusters of up to 5,000 nodes.

Architecture

Etcd is layered into Client, API network, Raft algorithm, logical, and storage layers.

Client layer: v2 and v3 API client libraries with load balancing and automatic failover.

API network layer: client-to-server communication (v2 uses HTTP/1.x, v3 uses gRPC) and inter-node Raft communication.

Raft algorithm layer: leader election, log replication, and ReadIndex for strong consistency.

Logical layer: core modules such as KVServer, MVCC, authentication, lease, and compaction.

Storage layer: WAL, snapshot, and BoltDB modules that persist cluster metadata and user data.

Key Terminology

Raft – consensus algorithm.

Node – a Raft state‑machine instance.

Member – an etcd instance that serves client requests.

Cluster – a set of members working together.

Peer – another member in the same cluster.

WAL – write‑ahead log.

Snapshot – point‑in‑time copy of the data.

Leader, Follower, Candidate, Term, Index – Raft concepts.

Operational Practices

Common etcdctl Commands

<code>ETCD_CA_CERT="/etc/kubernetes/pki/etcd/ca.crt"
ETCD_CERT="/etc/kubernetes/pki/etcd/server.crt"
ETCD_KEY="/etc/kubernetes/pki/etcd/server.key"
HOST_1=https://xxx.xxx.xxx.xxx:2379
ETCDCTL_API=3 etcdctl --cacert="${ETCD_CA_CERT}" --cert="${ETCD_CERT}" --key="${ETCD_KEY}" --endpoints="${HOST_1}" endpoint status --write-out=table

# key-value operations
etcdctl put foo bar
etcdctl get foo
etcdctl del foo

# transaction example: when fed via stdin, the compare, success-request,
# and failure-request sections are separated by blank lines
etcdctl txn <<'EOF'
mod("key1") > "0"

put key1 "overwrote-key1"
put key2 "some extra key"

EOF

# cluster maintenance
etcdctl member list
etcdctl endpoint health
etcdctl alarm list
etcdctl defrag
etcdctl snapshot save snapshot.db
etcdctl snapshot restore snapshot.db
</code>

Etcd Monitoring

Important metrics are grouped by health status, system (USE) and application (RED) dimensions, covering CPU, memory, disk usage, request rate, error rate, latency, WAL/DB fsync duration, leader changes, and watcher count.
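As an illustration of the metrics above, the following PromQL queries (a sketch; the metric names come from etcd's standard Prometheus exposition, but adjust label filters to your setup) cover fsync latency, leader stability, and the RED error rate:

```
# p99 WAL fsync latency (should stay below 10 ms)
histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))

# p99 backend commit latency
histogram_quantile(0.99, rate(etcd_disk_backend_commit_duration_seconds_bucket[5m]))

# leader changes over the last hour (frequent changes signal instability)
increase(etcd_server_leader_changes_seen_total[1h])

# gRPC request error rate (the "E" in RED)
sum(rate(grpc_server_handled_total{grpc_code!="OK"}[5m]))
  / sum(rate(grpc_server_handled_total[5m]))
```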

Metrics can be collected with kube-prometheus in either HTTP or HTTPS mode. Example Helm command for HTTP mode:

<code>helm install monitoring -n cattle-prometheus \
  --set kubeEtcd.service.port=2381 \
  --set kubeEtcd.service.targetPort=2381 \
  --set prometheusOperator.admissionWebhooks.patch.image.sha=null .
</code>

For HTTPS mode, create a secret with etcd certificates and add the corresponding Helm values.
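A sketch of the HTTPS setup, assuming the kube-prometheus-stack chart's `kubeEtcd.serviceMonitor` values and the standard kubeadm certificate paths (the secret name `etcd-certs` and the values file name are illustrative):

```shell
# create a secret holding the etcd client certificates
kubectl -n cattle-prometheus create secret generic etcd-certs \
  --from-file=/etc/kubernetes/pki/etcd/ca.crt \
  --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key

# Helm values enabling HTTPS scrapes; the secret is mounted under
# /etc/prometheus/secrets/<name>/ inside the Prometheus pod
cat > etcd-https-values.yaml <<'EOF'
kubeEtcd:
  serviceMonitor:
    scheme: https
    caFile: /etc/prometheus/secrets/etcd-certs/ca.crt
    certFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.crt
    keyFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.key
prometheus:
  prometheusSpec:
    secrets:
      - etcd-certs
EOF
helm upgrade --install monitoring -n cattle-prometheus -f etcd-https-values.yaml .
```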

Ready-made Grafana dashboards (JSON files) are available and can be imported directly.

Benchmarking Etcd

SLI/SLO definitions focus on throughput (e.g., 40k reads/s, 20k writes/s) and latency (99% of requests under 100 ms). The official etcd benchmark tool is built from the etcd source.

<code># Install benchmark (go install places the binary in $GOPATH/bin; ensure it is on PATH)
git clone https://github.com/etcd-io/etcd.git --depth 1
cd etcd
go install -v ./tools/benchmark

# Example write benchmark against the leader (HOST_2 is the leader's client endpoint)
benchmark --endpoints=${HOST_2} --cacert="${ETCD_CA_CERT}" --cert="${ETCD_CERT}" --key="${ETCD_KEY}" \
  --target-leader --conns=100 --clients=1000 put --key-size=8 --sequential-keys --total=100000 --val-size=256
</code>

Sample results show an average write QPS of ~14k with 134 ms 99th-percentile latency for leader-only writes, and higher QPS when all members are targeted.

Read benchmarks demonstrate linearizable reads achieving ~500 QPS with 7 ms latency, while serializable reads reach over 1k QPS with sub-2 ms latency.
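The two read modes can be compared with the same benchmark tool; a sketch reusing the variables from the write benchmark (the key `foo` is illustrative):

```shell
# linearizable reads: go through the Raft ReadIndex path for strong consistency
benchmark --endpoints=${HOST_2} --cacert="${ETCD_CA_CERT}" --cert="${ETCD_CERT}" --key="${ETCD_KEY}" \
  --conns=100 --clients=1000 range foo --consistency=l --total=10000

# serializable reads: served from any member's local store, faster but possibly stale
benchmark --endpoints=${HOST_2} --cacert="${ETCD_CA_CERT}" --cert="${ETCD_CERT}" --key="${ETCD_KEY}" \
  --conns=100 --clients=1000 range foo --consistency=s --total=10000
```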

Disk Performance Testing

Etcd performance is bounded by network round-trip time and disk persistence latency. Use ping or tcpdump to measure network latency, and fio to simulate etcd write patterns. The metric etcd_disk_wal_fsync_duration_seconds should have 99% of samples below 10 ms.
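A common fio recipe for this check mimics etcd's WAL pattern: small sequential writes, each followed by an fdatasync (the directory path is illustrative; it must live on the disk that will host the etcd data dir):

```shell
# sequential ~2.3 KB writes with fdatasync after each, like etcd's WAL
fio --rw=write --ioengine=sync --fdatasync=1 \
    --directory=/var/lib/etcd-disk-test --size=22m --bs=2300 --name=etcd-wal-test
```

In the output, look at the fdatasync latency percentiles: the 99th percentile should be under 10 ms for the disk to keep etcd healthy.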

Tuning Recommendations

Use SSDs and give the etcd process the highest I/O priority:

sudo ionice -c2 -n0 -p $(pgrep etcd)

Set the CPU governor to performance for all cores.

Enable automatic compaction, increase the maximum request (Raft message) size, and adjust the maximum storage capacity.
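The compaction and capacity recommendations map onto etcd server flags; a sketch with illustrative values that should be sized for your own cluster:

```shell
etcd \
  --auto-compaction-mode=periodic \
  --auto-compaction-retention=1h \
  --quota-backend-bytes=8589934592 \
  --max-request-bytes=10485760
# auto-compaction-*: compact historical MVCC revisions every hour
# quota-backend-bytes: raise the backend DB quota to 8 GiB (default is 2 GiB)
# max-request-bytes: raise the maximum client request size to 10 MiB
```

Remember that a larger backend quota also means longer defragmentation and snapshot times, so do not raise it beyond what the workload needs.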

References include the etcd official documentation, GeekTime course, Datadog integration, and community blogs.

Written by

Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
