How Upgrading EBS Volumes Boosted etcd Write Performance by 30%
A technical deep-dive into how a team managing dozens of Kubernetes clusters diagnosed a write-ahead-log bottleneck in etcd, measured IOPS and latency with etcdctl and fio, upgraded gp2 volumes to gp3, and found diminishing returns beyond 3000 IOPS, with an explanation of the role fdatasync plays in storage performance.
The team operates about 30 self-built Kubernetes clusters, each with a five-member etcd cluster running on m6i.xlarge instances (an EBS baseline of 6000 IOPS per instance). Every member uses three 300 GB gp2 volumes (root, write-ahead log, and data). Because gp2 baseline performance scales at 3 IOPS per GiB, each volume is limited to 900 IOPS.
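The 900 IOPS figure follows from gp2's size-based performance model. A minimal sketch, assuming the gp2 rules AWS documents at the time of writing (3 IOPS per GiB with a floor of 100 and a ceiling of 16,000, plus bursting to 3,000 IOPS for volumes under 1,000 GiB); the function names are hypothetical:

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS of a gp2 volume: 3 IOPS per GiB, clamped to [100, 16000].

    Assumes AWS's published gp2 rules; check the current EBS docs before relying on it.
    """
    return max(100, min(3 * size_gib, 16_000))


def gp2_can_burst(size_gib: int) -> bool:
    """gp2 volumes smaller than 1,000 GiB can burst to 3,000 IOPS while I/O credits last."""
    return size_gib < 1_000


if __name__ == "__main__":
    for size in (100, 300, 1_000, 6_000):
        print(f"{size:>5} GiB -> baseline {gp2_baseline_iops(size):>5} IOPS, "
              f"burstable: {gp2_can_burst(size)}")
```

A 300 GiB volume gets 900 IOPS at baseline; burst credits can cover short spikes, but a sustained write load above baseline drains them.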
Initial write-performance testing used etcdctl check perf with its --load flag, which selects a workload profile (s = small, m = medium, l = large, xl = extra large). The small profile passed, but the large profile failed, reporting a write throughput of roughly 6.6K ops/s, which places the cluster between the medium and large profiles.
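The pass/fail result can be reasoned about by comparing measured throughput with each profile's target rate. A minimal sketch under loudly stated assumptions: the target rates (150, 1000, 8000 and 15000 writes/s) and the "above 90% of target" pass rule are our reading of the etcdctl source, not something this article confirms, and the real command also checks request latency, which is ignored here. The function names are hypothetical:

```python
from typing import Optional

# ASSUMED target write rates (writes/s) for each `etcdctl check perf --load` profile,
# as we understand them from the etcdctl source; verify against your etcd version.
PROFILE_TARGETS = {"s": 150, "m": 1000, "l": 8000, "xl": 15000}

# ASSUMED pass rule: measured throughput must exceed 90% of the profile's target.
# The real command also checks request latency, which this sketch ignores.
PASS_RATIO = 0.9


def throughput_passes(profile: str, measured_wps: float) -> bool:
    """Would this measured write rate pass the (assumed) throughput check?"""
    return measured_wps > PASS_RATIO * PROFILE_TARGETS[profile]


def largest_passing_profile(measured_wps: float) -> Optional[str]:
    """Largest profile whose throughput check would pass, or None."""
    passing = [p for p in ("s", "m", "l", "xl") if throughput_passes(p, measured_wps)]
    return passing[-1] if passing else None


if __name__ == "__main__":
    print(largest_passing_profile(6600))  # the ~6.6K writes/s measured above
```

With roughly 6.6K writes/s this returns "m": under these assumptions the large profile needs more than 7200 writes/s.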
Running iostat showed that the write-ahead-log volume (nvme1n1) was at 100% I/O utilization, causing etcd threads to wait for I/O completion.
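iostat derives %util from the kernel's per-device "time spent doing I/O" counter in /proc/diskstats. A minimal sketch of the same calculation, assuming a Linux host; the helper names are hypothetical, and nvme1n1 is simply the device named above:

```python
import time


def io_ticks_ms(diskstats_text: str, device: str) -> int:
    """Return the 'milliseconds spent doing I/O' counter for `device`.

    Each /proc/diskstats line is: major minor name, followed by counters;
    the 10th counter (index 12 of the split line) is time spent doing I/O.
    """
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 14 and fields[2] == device:
            return int(fields[12])
    raise KeyError(f"device {device!r} not found")


def utilization_pct(before: str, after: str, device: str, interval_ms: float) -> float:
    """Percentage of the interval during which the device had I/O in flight."""
    busy_ms = io_ticks_ms(after, device) - io_ticks_ms(before, device)
    return min(100.0, 100.0 * busy_ms / interval_ms)


def sample_utilization(device: str, interval_s: float = 1.0) -> float:
    """Take two /proc/diskstats snapshots `interval_s` apart (Linux only)."""
    with open("/proc/diskstats") as f:
        before = f.read()
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        after = f.read()
    return utilization_pct(before, after, device, interval_s * 1000)
```

On an etcd node, sample_utilization("nvme1n1") returns the same kind of figure iostat reports in its %util column.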
Using fio with --fdatasync=1 to emulate etcd’s WAL sync behavior, the 99th‑percentile fdatasync latency was measured at 2671 µs (≈2.7 ms), well under the etcd‑recommended 10 ms threshold. The reported IOPS was 709, lower than the gp2‑claimed 900 IOPS but still acceptable.
Upgrading to gp3
The volumes were upgraded to gp3, which provides a baseline of 3000 IOPS regardless of volume size. After the upgrade, fio reported 1105 IOPS: an improvement, but still far below the provisioned figure. Raising the volume's provisioned IOPS to 6000 produced only a marginal rise to 1129 IOPS, indicating that the bottleneck lay in the storage stack's per-write latency rather than in the volume's IOPS limit.
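The diminishing returns are clearer when the fio results reported in this article are put side by side. A small sketch of that arithmetic; the run labels and function name are ours:

```python
def pct_gain(before: float, after: float) -> float:
    """Relative improvement from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# (configuration, provisioned IOPS, IOPS measured by fio) as reported in this article.
RUNS = [
    ("gp2, 300 GB", 900, 709),
    ("gp3, 3000 IOPS", 3000, 1105),
    ("gp3, 6000 IOPS", 6000, 1129),
]

if __name__ == "__main__":
    prev = None
    for name, provisioned, measured in RUNS:
        line = (f"{name:<15} {measured:>5} IOPS = "
                f"{100 * measured / provisioned:5.1f}% of provisioned")
        if prev is not None:
            line += f", {pct_gain(prev, measured):+.1f}% vs previous run"
        print(line)
        prev = measured
```

Moving from gp2 to gp3 raised measured IOPS by about 56%, but doubling provisioned IOPS from 3000 to 6000 added only about 2%, while the share of provisioned IOPS actually used fell from about 37% to about 19%.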
Where Did the IOPS Go?
Operating-system write caching can mask true disk performance: a plain write() returns once the data is in the page cache, and the data stays there until it is flushed to persistent storage. To guarantee durability, etcd calls fdatasync on its write-ahead log after writing entries, so each acknowledged write waits for the device. AWS documentation on EBS I/O characteristics describes how latency-sensitive and throughput-sensitive workloads differ:
Transaction-intensive applications are sensitive to increased I/O latency and are well suited to SSD-backed volumes; they keep latency low by maintaining a low queue length and enough IOPS available to the volume. Consistently driving more IOPS to a volume than it has available increases I/O latency. Throughput-intensive applications are less sensitive to latency and are well suited to HDD-backed volumes; they sustain high throughput by maintaining a high queue length during large, sequential I/O.
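The queue-length guidance follows from Little's Law: IOPS = queue depth / average time per operation. fio's sync engine with a single job issues one write (plus its fdatasync) at a time, so its queue depth is 1, and its IOPS is capped by the duration of each write-and-sync cycle no matter how many IOPS are provisioned. A minimal sketch of that arithmetic using this article's numbers; the function names are hypothetical:

```python
def iops_from_latency(latency_us: float, queue_depth: int = 1) -> float:
    """Little's Law: operations completed per second = concurrency / time per operation."""
    return queue_depth * 1_000_000 / latency_us


def latency_budget_us(target_iops: float, queue_depth: int = 1) -> float:
    """Average time per operation needed to reach `target_iops` at this queue depth."""
    return queue_depth * 1_000_000 / target_iops


if __name__ == "__main__":
    # 709 IOPS at queue depth 1 implies roughly 1.4 ms per write+fdatasync cycle.
    print(f"gp2 cycle time: {latency_budget_us(709):.0f} us")
    # Using 3000 IOPS from a single synchronous writer would need ~333 us cycles.
    print(f"budget for 3000 IOPS at QD1: {latency_budget_us(3000):.0f} us")
```

In other words, a single synchronous writer only benefits from more provisioned IOPS if each cycle also gets faster.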
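The fdatasync system call forces the kernel to flush a file's modified data (plus only the metadata needed to read it back, such as the file size) to the underlying device, and it blocks until the device reports completion. Every WAL write therefore pays a full device round trip, which explains the latency impact observed in the tests and why provisioning more IOPS did little for a single synchronous writer.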
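The cost of that round trip is easy to observe directly. A minimal sketch, assuming a Unix-like system, that times 2300-byte appends with and without an fdatasync after each one. It uses Python's os.fdatasync and falls back to os.fsync where fdatasync is unavailable (such as macOS). The function name is hypothetical, and `directory` should point at the disk under test, because the default temporary directory may sit on tmpfs or a different device:

```python
import os
import tempfile
import time
from typing import Optional


def avg_write_seconds(n: int, block: bytes, sync: bool,
                      directory: Optional[str] = None) -> float:
    """Average seconds per append of `block`, optionally followed by fdatasync."""
    # os.fdatasync is missing on some platforms (e.g. macOS); fall back to fsync there.
    flush = getattr(os, "fdatasync", os.fsync)
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        path = os.path.join(tmp, "wal-test.bin")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        try:
            start = time.perf_counter()
            for _ in range(n):
                os.write(fd, block)
                if sync:
                    flush(fd)  # wait until the device reports the data written
            elapsed = time.perf_counter() - start
        finally:
            os.close(fd)
    return elapsed / n


if __name__ == "__main__":
    block = b"x" * 2300  # same block size as the fio job in the appendix
    buffered = avg_write_seconds(500, block, sync=False)
    synced = avg_write_seconds(500, block, sync=True)
    print(f"buffered write: {buffered * 1e6:.1f} us, "
          f"write+fdatasync: {synced * 1e6:.1f} us")
```

On a real block device the synced figure is typically far larger than the buffered one, since only the buffered writes can return straight from the page cache.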
A figure in the original article (omitted here) compares the maximum performance of different EBS volume types, highlighting that etcd's maximum synchronous write speed is constrained by the underlying storage.
Appendix: Using fio to Test etcd Storage Performance
etcd exposes the etcd_disk_wal_fsync_duration_seconds Prometheus metric; its 99th percentile should stay below 10 ms for acceptable storage performance. The following fio command approximates the write pattern of etcd's WAL (point --directory at a directory on the disk under test):
fio --rw=write --ioengine=sync --fdatasync=1 --directory=test-data --size=22m --bs=2300 --name=mytest
Key points when running the test:
Adjust --size and --bs to match the workload.
Be aware that other write operations besides WAL writes may affect the 99th‑percentile measurement.
Use fio 3.5 or newer; older versions do not report fdatasync latency percentiles.
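In production, the etcd_disk_wal_fsync_duration_seconds histogram is usually summarized in PromQL with a query such as histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])). To show what that number means, here is a minimal Python sketch of the same linear interpolation over cumulative buckets. The function mirrors PromQL's behavior as we understand it, and the bucket counts in the example are made up for illustration:

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile (0 < q < 1) from cumulative (upper_bound, count) buckets.

    Mirrors the linear interpolation of PromQL's histogram_quantile(); the last
    bucket must have upper bound +Inf and hold the total observation count.
    """
    buckets = sorted(buckets)
    if not math.isinf(buckets[-1][0]):
        raise ValueError("the last bucket must be +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket <= 0:
                return bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    # Hypothetical cumulative counts: (upper bound in seconds, observations <= bound).
    example = [(0.001, 200), (0.002, 900), (0.004, 990), (0.008, 999),
               (0.016, 1000), (math.inf, 1000)]
    p99 = histogram_quantile(0.99, example)
    print(f"p99 WAL fsync: {p99 * 1000:.1f} ms, under 10 ms: {p99 < 0.010}")
```

Because the estimate interpolates within a bucket, its precision depends on the bucket boundaries etcd exports.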
Sample fio output (truncated) follows. The 99.00th entry in the percentile table is the 99th-percentile sync latency (963 µs here, about 1 ms), which can be compared against the 10 ms guideline; the status line shows the IOPS.
Jobs: 1 (f=1): [W(1)][100.0%][w=22.5MiB/s][w=1129 IOPS]
sync (usec): min=370, max=3924, avg=611.54, stdev=126.78
| 1.00th=[ 420] 5.00th=[ 453] 10.00th=[ 474] 20.00th=[ 506]
| 30.00th=[ 537] 40.00th=[ 562] 50.00th=[ 594] 60.00th=[ 635]
| 70.00th=[ 676] 80.00th=[ 717] 90.00th=[ 734] 95.00th=[ 807]
| 99.00th=[ 963] 99.50th=[ 1057] 99.90th=[ 1254] 99.95th=[ 1336]
| 99.99th=[ 2900]
disk stats (read/write): nvme2n1: ios=5628/10328, merge=0/29, ticks=2535/7153, in_queue=9688, util=99.09%
These results confirm that while upgrading to higher-IOPS gp3 volumes can improve performance, the actual gains are limited by the storage stack and the need for synchronous fsync/fdatasync calls in etcd.
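For repeated runs, the percentile table in output like the sample above can be checked automatically. A minimal sketch that scrapes the text format shown here; the regular expression and function names are ours, and fio may print the table in different units (usec, nsec or msec) depending on the latencies observed:

```python
import re
from typing import Dict

# Matches entries such as "99.00th=[  963]" in fio's percentile table.
PERCENTILE_RE = re.compile(r"(\d+\.\d+)th=\[\s*(\d+)\]")


def parse_percentiles(fio_text: str) -> Dict[float, int]:
    """Map each percentile (e.g. 99.0) to its value, in the units fio printed."""
    return {float(pct): int(value) for pct, value in PERCENTILE_RE.findall(fio_text)}


def within_etcd_guideline(p99_usec: int, limit_ms: float = 10.0) -> bool:
    """True if a 99th-percentile sync latency in usec is under the 10 ms guideline."""
    return p99_usec < limit_ms * 1000


if __name__ == "__main__":
    sample = """
    | 99.00th=[ 963] 99.50th=[ 1057] 99.90th=[ 1254] 99.95th=[ 1336]
    | 99.99th=[ 2900]
    """
    pcts = parse_percentiles(sample)
    print(pcts[99.0], within_etcd_guideline(pcts[99.0]))  # 963 True
```

For anything beyond a quick check, fio's --output-format=json produces structured results that are more robust to consume than scraped text.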
Liangxu Linux
Liangxu, a self-taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications, and tools, plus Git, databases, Raspberry Pi, and more.