Why Upgrading EBS Volumes Boosted etcd Write Performance—and What Still Limits It
This article details how upgrading AWS EBS volumes from gp2 to gp3 and raising their provisioned IOPS improved etcd cluster write throughput, analyzes IOPS bottlenecks using iostat and fio, and explains why further IOPS gains remain constrained by storage limits and synchronous flushing.
The final solution was simple: upgrade the existing volumes to support higher IOPS, but the troubleshooting process is worth sharing.
Test Environment
Our team manages about 30 self‑built Kubernetes clusters and needed to analyze the performance of their etcd clusters. Each etcd cluster has five members running on m6i.xlarge instances (max 6000 IOPS). Every member uses three volumes:
root volume
write‑ahead‑log (WAL) volume
database volume
All volumes are gp2, 300 GB each, with a baseline of 900 IOPS (gp2 provides 3 IOPS per GB).
Testing Write Performance
First, on a separate instance, run etcdctl check perf to simulate etcd load. The --load flag accepts s (small), m (medium), l (large), and xl (extra‑large).
With the s load the test passes. With the l load it fails: the cluster sustains about 6.6K writes/s, which places it between a medium and a large cluster.
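The load sweep described above can be scripted. The following is a minimal sketch, not the team's actual tooling: the function names and the ENDPOINTS variable are hypothetical, and the default endpoint assumes etcd's standard client port. The benchmark generates real write load, so point it at a cluster where that is acceptable.

```shell
# Succeed only for the load sizes named in the text: s, m, l, xl.
is_valid_load() {
  case "$1" in
    s|m|l|xl) return 0 ;;
    *) return 1 ;;
  esac
}

# Run etcd's built-in benchmark for one load size.
# ENDPOINTS is a hypothetical variable; it defaults to a local member.
run_check_perf() {
  load="$1"
  if ! is_valid_load "$load"; then
    echo "unknown load size: $load" >&2
    return 2
  fi
  if ! command -v etcdctl >/dev/null 2>&1; then
    echo "etcdctl not found; skipping load=$load" >&2
    return 3
  fi
  etcdctl --endpoints="${ENDPOINTS:-127.0.0.1:2379}" check perf --load="$load"
}

# Example sweep, stopping at the first load size that fails:
#   for size in s m l; do run_check_perf "$size" || break; done
```

Stopping at the first failure keeps the sweep short and shows directly which load class the cluster falls into.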
Using iostat, we see the WAL volume (nvme1n1) at 100 % I/O utilization, which leaves etcd threads waiting on I/O.
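The utilization check can be automated by filtering iostat's extended output. This is a rough sketch: device_util is a hypothetical helper, and it assumes the report has a header row beginning with "Device" and a column named %util, which it looks up by name because the column position differs between sysstat versions.

```shell
# Read `iostat -x` output on stdin and print "<device> <%util>" for one device.
# The %util column is found by header name, not by fixed position.
device_util() {
  awk -v dev="$1" '
    $1 ~ /^Device/ { for (i = 1; i <= NF; i++) if ($i == "%util") col = i; next }
    col && $1 == dev { print $1, $col }
  '
}

# Example (three one-second samples of the WAL volume):
#   iostat -dx nvme1n1 1 3 | device_util nvme1n1
```

Values that stay close to 100 while the benchmark runs indicate that the device, rather than etcd itself, is the limiting factor.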
Running fio to measure fdatasync latency yields a 99th‑percentile of 2671 µs (2.7 ms), well below the etcd recommendation of 10 ms.
fio --rw=write --ioengine=sync --fdatasync=1 --directory=benchmark \
    --size=22m --bs=2300 --name=sandbox
The output shows:
sync (usec): min=476, max=10320, avg=1422.54, stdev=727.83
...
nvme1n1: ios=0/21315, merge=0/11364, ticks=0/13865, in_queue=13865, util=99.40%
Upgrading to gp3
We upgraded the volumes to gp3, which provides a baseline of 3000 IOPS.
Jobs: 1 (f=1): [W(1)][100.0%][w=2482KiB/s][w=1105 IOPS][eta 00m:00s]
...
sync (usec): min=327, max=5087, avg=700.24, stdev=240.46
IOPS rose to 1105, but the bottleneck remained the EBS volume.
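We then raised the provisioned IOPS from 3000 to the maximum the instance type supports (~6000). Doubling the provisioned IOPS yielded only about a 30 % gain, and the fio synchronous-write rate barely moved (1105 to 1129 IOPS), showing diminishing returns.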
Jobs: 1 (f=1): [W(1)][100.0%][w=2535KiB/s][w=1129 IOPS][eta 00m:00s]
...
sync (usec): min=370, max=3924, avg=611.54, stdev=126.78
Where Did the IOPS Go?
Operating systems cache writes; data stays in cache until flushed to disk. For databases, durability requires explicit fdatasync after each write, which etcd does.
Transaction-sensitive applications need low I/O latency and are well suited to SSD-backed volumes; driving more IOPS than a volume can deliver raises latency. Throughput-sensitive applications tolerate higher latency and are well suited to HDD-backed volumes.
etcd calls fdatasync after each write to guarantee persistence, which adds noticeable latency.
fsync() flushes all modified data and metadata of a file to the storage device and blocks until the device acknowledges the write. fdatasync() flushes the data but skips modified metadata unless that metadata is needed to read the data back correctly.
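A back-of-the-envelope calculation shows why provisioning more IOPS barely helped. The fio runs above use one job with the synchronous engine, so only one write is in flight at a time and the write rate is roughly the inverse of the time each write-plus-fdatasync takes. The sketch below applies that relationship to the figures reported above; the function names are illustrative, and it assumes the job was busy for the whole run.

```shell
# Average time per operation (in µs) implied by an IOPS figure
# when only one write is in flight at a time.
per_op_usec() {
  awk -v iops="$1" 'BEGIN { printf "%.0f\n", 1000000 / iops }'
}

# Share of each operation spent in fdatasync, as a whole percentage,
# given the measured IOPS and the average sync latency in µs.
sync_share_pct() {
  awk -v iops="$1" -v sync_usec="$2" \
    'BEGIN { printf "%.0f\n", 100 * sync_usec * iops / 1000000 }'
}

per_op_usec 1105             # gp3 at 3000 provisioned IOPS: prints 905
sync_share_pct 1105 700.24   # prints 77
per_op_usec 1129             # ~6000 provisioned IOPS: prints 886
sync_share_pct 1129 611.54   # prints 69
```

Each operation takes close to 0.9 ms in both runs, most of it spent waiting for fdatasync to return. Under this model a serialized writer cannot go much beyond about 1100 writes per second until per-operation latency drops, regardless of how many IOPS the volume is provisioned for.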
Maximum Synchronous Write Performance by Volume Type
The IOPS etcd can achieve depend on both its implementation and the limits of the underlying storage.
Appendix
Using fio to Test Etcd Storage Performance
etcd exposes the wal_fsync_duration_seconds metric; 99 % of values should be under 10 ms. The following fio command mimics etcd's write pattern:
fio --rw=write --ioengine=sync --fdatasync=1 \
    --directory=test-data --size=22m --bs=2300 --name=mytest
The 99th percentile of the sync durations should be below 10 ms (e.g., 2180 µs) for the storage to be considered fast enough. Adjust --size and --bs for your scenario. Writes other than WAL writes may also affect the metric. Use fio 3.5 or later; older versions do not report fdatasync duration percentiles.
Etcd WALs
etcd writes operations to a write-ahead log (WAL) before applying them. To guarantee durability, it persists WAL entries by calling fdatasync after each write call.
Using fio to Assess Storage
To emulate etcd's WAL writes, fio must use sequential writes (--rw=write), the synchronous I/O engine (--ioengine=sync), and an fdatasync after each write (--fdatasync=1). The --size and --bs parameters should match the real workload.
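The 10 ms check can be applied to fio's text report with a small filter. This is a sketch under assumptions: the helper names are hypothetical, and the parser expects fio's usual layout, in which a "sync percentiles (unit):" header is followed by lines containing entries such as 99.00th=[ 2671]. The runs quoted in this article show only the summary line, so verify the layout against your own fio output.

```shell
# Read fio's text report on stdin and print the 99th-percentile
# fdatasync latency in µs. Only the "sync percentiles" section is used,
# so the write completion-latency percentiles earlier in the report are skipped.
p99_sync_usec() {
  awk '
    /sync percentiles \(/ {
      unit = $0
      sub(/.*\(/, "", unit)      # drop everything up to "("
      sub(/\).*/, "", unit)      # drop ")" and what follows, leaving the unit
      in_sync = 1
      next
    }
    in_sync && match($0, /99\.00th=\[ *[0-9]+\]/) {
      v = substr($0, RSTART, RLENGTH)
      sub(/^99\.00th=\[ */, "", v)
      sub(/\]$/, "", v)
      scale = 1
      if (unit == "nsec") scale = 0.001
      if (unit == "msec") scale = 1000
      printf "%.0f\n", v * scale
      exit
    }
  '
}

# Succeed when a p99 given in µs is under the 10 ms guideline.
wal_disk_ok() {
  awk -v p99="$1" 'BEGIN { exit !(p99 < 10000) }'
}

# Example:
#   p99=$(fio ... | p99_sync_usec) && wal_disk_ok "$p99" && echo "fast enough"
```

Converting to a single unit before comparing matters because fio picks the unit of the percentile table based on the magnitude of the values.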
MaGe Linux Operations
