Why Upgrading EBS Volumes Boosted etcd Write Performance—and What Still Limits It

This article details how upgrading AWS EBS volumes from gp2 to gp3 and raising their provisioned IOPS improved etcd cluster write throughput, analyzes IOPS bottlenecks using iostat and fio, and explains why further IOPS gains remain constrained by storage and OS caching.

MaGe Linux Operations

The final fix was simple: upgrade the existing volumes to support higher IOPS. The troubleshooting process that led there is worth sharing.

Test Environment

Our team manages about 30 self‑built Kubernetes clusters and needed to analyze the performance of their etcd clusters. Each etcd cluster has five members running on m6i.xlarge instances (max 6000 IOPS). Every member uses three volumes:

root volume

write‑ahead‑log (WAL) volume

database volume

All volumes are gp2, 300 GB each, with a baseline of 900 IOPS (gp2 provides 3 IOPS per GiB).
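The 900 IOPS figure follows from how gp2 scales with volume size: 3 IOPS per GiB, with a floor of 100 and a cap of 16,000. A minimal sketch of that arithmetic (the function name is made up for illustration, and the 300 GB volume is treated as 300 GiB):

```shell
# Baseline IOPS of a gp2 volume: 3 IOPS per GiB, floor 100, cap 16000.
# gp2_baseline_iops is an illustrative helper, not an AWS tool.
gp2_baseline_iops() {
    size_gib=$1
    iops=$((size_gib * 3))
    if [ "$iops" -lt 100 ]; then
        iops=100
    fi
    if [ "$iops" -gt 16000 ]; then
        iops=16000
    fi
    echo "$iops"
}

gp2_baseline_iops 300   # prints 900
```

By this rule a gp2 volume would have to be 1000 GiB before its baseline matched the 3000 IOPS that gp3 provides at any size.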

Testing Write Performance

First, on a separate instance, run etcdctl check perf to simulate etcd load. The --load flag accepts s (small), m (medium), l (large), and xl (extra‑large).
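A small wrapper around the check might look like the sketch below. The endpoint is a placeholder, the function names are illustrative, and a TLS-enabled cluster would also need the usual certificate flags, which are omitted here. The benchmark writes test keys into the cluster, so it is best pointed at a cluster where that load is acceptable.

```shell
# Placeholder endpoint; replace with a member of the cluster under test.
ENDPOINTS="${ENDPOINTS:-http://127.0.0.1:2379}"

# Accept only the load sizes that --load understands.
is_valid_load() {
    case "$1" in
        s|m|l|xl) return 0 ;;
        *) return 1 ;;
    esac
}

# Run the built-in benchmark for one load size.
run_check_perf() {
    is_valid_load "$1" || { echo "unknown load: $1" >&2; return 2; }
    etcdctl --endpoints="$ENDPOINTS" check perf --load="$1"
}

# Example: step through the sizes and stop at the first one that fails.
# for load in s m l xl; do run_check_perf "$load" || break; done
```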

With s load the test passes. With l load it fails, showing the cluster can sustain about 6.6 K writes/s, placing it between a medium and a large cluster.

Using iostat, we see the WAL volume (nvme1n1) at 100 % I/O utilization, causing etcd threads to wait.

Running fio to measure fdatasync latency yields a 99th‑percentile of 2671 µs (2.7 ms), well below the etcd recommendation of 10 ms.

fio --rw=write --ioengine=sync --fdatasync=1 --directory=benchmark \
    --size=22m --bs=2300 --name=sandbox

The output shows:

sync (usec): min=476, max=10320, avg=1422.54, stdev=727.83
... 
nvme1n1: ios=0/21315, merge=0/11364, ticks=0/13865, in_queue=13865, util=99.40%

Upgrading to GP3

We upgraded the volumes to gp3, which provides a baseline of 3000 IOPS regardless of volume size.

Jobs: 1 (f=1): [W(1)][100.0%][w=2482KiB/s][w=1105 IOPS][eta 00m:00s]
... 
sync (usec): min=327, max=5087, avg=700.24, stdev=240.46

Measured write IOPS rose to 1105, still well short of the 3000 provisioned, and the EBS volume remained the bottleneck.

We then raised the volumes' provisioned IOPS to the instance type's limit (from 3000 to 6000). Doubling the provisioned IOPS improved write performance by only about 30 %, a clear case of diminishing returns.

Jobs: 1 (f=1): [W(1)][100.0%][w=2535KiB/s][w=1129 IOPS][eta 00m:00s]
... 
sync (usec): min=370, max=3924, avg=611.54, stdev=126.78

Where Did the IOPS Go?

Operating systems cache writes; data stays in cache until flushed to disk. For databases, durability requires explicit fdatasync after each write, which etcd does.

Transaction-intensive applications are sensitive to I/O latency and are well suited to SSD-backed volumes; driving more IOPS at a volume than it can deliver raises latency. Throughput-intensive applications tolerate higher latency and are well suited to HDD-backed volumes.

That per-write fdatasync is what adds the noticeable latency: each write must wait for the device to acknowledge it before the next one can be issued.
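One way to see where the IOPS went: a single writer that waits for fdatasync after every write cannot issue more than one write per sync latency. The sketch below turns the average sync latencies from the three fio runs above into that ceiling. It is a rough model that assumes one thread with one outstanding I/O (which is what --ioengine=sync produces), and the function name is illustrative.

```shell
# Upper bound on writes per second for one thread that syncs after every
# write, given the average sync latency in microseconds.
sync_iops_ceiling() {
    awk -v us="$1" 'BEGIN { printf "%d\n", int(1000000 / us) }'
}

sync_iops_ceiling 1422.54   # gp2 run: prints 702
sync_iops_ceiling 700.24    # gp3 at 3000 IOPS: prints 1428
sync_iops_ceiling 611.54    # gp3 at 6000 IOPS: prints 1635
```

fio measured 1105 and 1129 IOPS on gp3, below these ceilings because each iteration also pays for the write() call itself. Under this model, provisioning 3000 or 6000 IOPS does little for a workload whose rate is set by per-operation latency rather than by the volume's IOPS limit.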

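fsync() flushes all modified data and metadata of a file to the storage device and blocks until the device reports completion. fdatasync() does the same but skips metadata that is not needed to read the data back correctly (for example, access and modification times), while still flushing metadata such as a changed file size.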
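To see the cost of synchronized writes directly, GNU dd can issue the same writes with and without O_DSYNC, which behaves as though each write() were followed by fdatasync(). A minimal sketch, assuming GNU coreutils; the 2300-byte block size is borrowed from the fio command, and the count of 200 is arbitrary.

```shell
# Scratch directory; set TMPDIR to a path on the volume you want to exercise.
workdir=$(mktemp -d)

# Buffered writes: data lands in the page cache and dd returns quickly.
# The last line of dd's report shows bytes copied and elapsed time.
dd if=/dev/zero of="$workdir/buffered" bs=2300 count=200 2>&1 | tail -n 1

# Synchronized writes: oflag=dsync opens the file with O_DSYNC, so every
# write() waits until the data has reached the device.
dd if=/dev/zero of="$workdir/dsync" bs=2300 count=200 oflag=dsync 2>&1 | tail -n 1

# Both files hold the same number of bytes; only the elapsed time differs.
buffered_bytes=$(($(wc -c < "$workdir/buffered")))
dsync_bytes=$(($(wc -c < "$workdir/dsync")))
echo "buffered=$buffered_bytes dsync=$dsync_bytes"

rm -rf "$workdir"
```

On a network-attached volume the second dd typically takes far longer than the first, even though both write the same 460000 bytes. On a tmpfs-backed scratch directory the two will look alike, because there is no device to wait for.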

Maximum Synchronous Write Performance by Volume Type

The write IOPS etcd can achieve depend both on its implementation, which syncs after every write, and on the limits of the underlying storage.

Appendix

Using fio to Test Etcd Storage Performance

etcd exposes the wal_fsync_duration_seconds metric; 99 % of values should be under 10 ms. The following fio command mimics etcd's write pattern:

fio --rw=write --ioengine=sync --fdatasync=1 \
    --directory=test-data --size=22m --bs=2300 --name=mytest

The 99th percentile of sync should be below 10 ms (e.g., 2180 µs) for the storage to be considered fast enough. Adjust --size and --bs for your scenario. Writes other than WAL writes may affect the metric. fio must be version 3.5 or later, because older versions do not report fdatasync latency percentiles.

Etcd WALs

etcd writes operations to a write-ahead log (WAL) before applying them. To guarantee durability, etcd must persist WAL entries using fdatasync after each write call.

Using fio to Assess Storage

To emulate etcd's WAL writes, fio must use sequential writes (--rw=write), the synchronous I/O engine (--ioengine=sync), and enforce fdatasync after each write (--fdatasync=1). The --size and --bs parameters should match the real workload.
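The fio version requirement in this appendix can be checked before running the benchmark. The sketch below compares major and minor numbers separately, so that 3.28 is correctly treated as newer than 3.5. The helper name is illustrative, and the parsing assumes that fio --version prints a string of the form fio-3.28.

```shell
# Succeeds when version $1 (major.minor) is at least version $2.
version_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        split(have, h, "[.]"); split(want, w, "[.]")
        ok = (h[1] + 0 > w[1] + 0) || (h[1] + 0 == w[1] + 0 && h[2] + 0 >= w[2] + 0)
        exit !ok
    }'
}

# Only query fio when it is installed.
if command -v fio >/dev/null 2>&1; then
    installed=$(fio --version)
    installed=${installed#fio-}      # strip the assumed "fio-" prefix
    if version_at_least "$installed" 3.5; then
        echo "fio $installed is new enough"
    else
        echo "fio $installed is older than 3.5" >&2
    fi
fi
```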