
How etcd’s Fully Concurrent Read Boosts Kubernetes Performance

This article reviews the evolution of etcd’s read‑write mechanisms, explains the “Fully Concurrent Read” feature introduced in etcd 3.4, and presents experimental results showing how it dramatically reduces expensive read latency and improves overall throughput in Kubernetes clusters.

Alibaba Cloud Native

Sep 19, 2019


Background

etcd stores Kubernetes metadata and provides strong consistency via the Raft algorithm. Read‑write latency directly influences cluster response time. In addition to normal traffic, scenarios such as API server restarts or cache‑bypassing queries generate a large number of “expensive” reads (range requests over many keys or large values). When many clients retry, these reads can overload etcd and even cause crashes.

etcd Read‑Write Evolution

Early versions (etcd v3.0 and earlier)

All reads and writes passed through the Raft consensus path to guarantee linearizable reads. Because reads dominate request volume, this design caused poor read performance.

etcd v3.1 – ReadIndex optimization

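ReadIndex records the leader's current commit index when a read arrives. The read is served once the local state machine's applied index has caught up to that commit index. Reads therefore no longer have to be appended to the Raft log and persisted.

The leader answers ReadIndex requests directly. A follower asks the leader for the current commit index, then serves the read locally once its own state machine has applied up to that index.

Before answering, the leader confirms that it is still the leader by exchanging heartbeats with a quorum of members.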
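The waiting step of ReadIndex can be sketched in Go as follows. This is a minimal single-process sketch that assumes the leader has already confirmed its leadership and returned readIndex. The names stateMachine, apply and linearizableGet are hypothetical and are not etcd's API. The read blocks until the applied index reaches readIndex, then serves the value from local state without writing anything to the Raft log.

```go
package main

import (
	"fmt"
	"sync"
)

// stateMachine is a hypothetical stand-in for a member's applied state.
type stateMachine struct {
	mu           sync.Mutex
	cond         *sync.Cond
	appliedIndex uint64
	data         map[string]string
}

func newStateMachine() *stateMachine {
	sm := &stateMachine{data: make(map[string]string)}
	sm.cond = sync.NewCond(&sm.mu)
	return sm
}

// apply simulates applying the committed log entry at index idx.
func (sm *stateMachine) apply(idx uint64, key, value string) {
	sm.mu.Lock()
	sm.data[key] = value
	sm.appliedIndex = idx
	sm.mu.Unlock()
	sm.cond.Broadcast() // wake readers waiting for this index
}

// linearizableGet waits until the applied index reaches readIndex (the
// commit index reported by the leader), then reads local state.
func (sm *stateMachine) linearizableGet(readIndex uint64, key string) string {
	sm.mu.Lock()
	defer sm.mu.Unlock()
	for sm.appliedIndex < readIndex {
		sm.cond.Wait()
	}
	return sm.data[key]
}

func main() {
	sm := newStateMachine()
	sm.apply(1, "foo", "v1")

	// Assume the leader reported commit index 2 for this read.
	done := make(chan string)
	go func() { done <- sm.linearizableGet(2, "foo") }()

	sm.apply(2, "foo", "v2") // the read cannot return before this apply
	fmt.Println(<-done)      // v2
}
```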

etcd v3.2 – BoltDB lock refinement and buffering

Even after v3.1, etcd's storage backend on top of BoltDB guarded transactions with a single coarse‑grained lock, so reads and writes were serialized. The lock was acquired at the start of every read or write transaction:

func (s *store) TxnBegin() int64 {
    ...
    s.tx = s.b.BatchTx()
    s.tx.Lock() // coarse-grained backend lock, taken by every read and write transaction
    ...
}

v3.2 introduced two key improvements:

Replaced the single lock with a read‑write lock, enabling “N reads or 1 write” parallelism.

Added two buffers: a readTx buffer and a batchTx buffer. Writes go into the batchTx buffer, and its contents are written back into the readTx buffer so that later reads can see them. Reads consult the readTx buffer first and fall back to BoltDB for keys not found there.
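The two v3.2 changes can be modeled in Go roughly as follows. This is a heavily simplified sketch, not etcd's code. The backend type, its fields, Get, Put and Commit are hypothetical, and a plain map named db stands in for BoltDB. The point it illustrates is that every read holds the read lock for its whole duration, so a write-back into the read buffer must wait for all in-flight reads to finish.

```go
package main

import (
	"fmt"
	"sync"
)

// backend is a hypothetical, simplified model of the v3.2 design.
// db stands in for data already committed to BoltDB.
type backend struct {
	readMu  sync.RWMutex      // "N reads or 1 write" on the read side
	readBuf map[string]string // recent writes that readers must see
	db      map[string]string

	batchMu  sync.Mutex        // serializes write transactions
	batchBuf map[string]string // writes of the current write transaction
}

func newBackend() *backend {
	return &backend{
		readBuf:  map[string]string{},
		db:       map[string]string{},
		batchBuf: map[string]string{},
	}
}

// Get holds the read lock for the entire read. Many readers can hold it at
// once, but a write-back must wait until all of them have released it.
func (b *backend) Get(key string) (string, bool) {
	b.readMu.RLock()
	defer b.readMu.RUnlock()
	if v, ok := b.readBuf[key]; ok { // consult the buffer first
		return v, true
	}
	v, ok := b.db[key] // then fall back to the simulated database
	return v, ok
}

// Put buffers a write, then writes the batch buffer back into the read
// buffer under the exclusive lock so that later reads can see it.
func (b *backend) Put(key, value string) {
	b.batchMu.Lock()
	defer b.batchMu.Unlock()
	b.batchBuf[key] = value
	b.readMu.Lock() // waits for every in-progress Get to finish
	for k, v := range b.batchBuf {
		b.readBuf[k] = v
	}
	b.readMu.Unlock()
	b.batchBuf = map[string]string{}
}

// Commit simulates a batch commit: buffered data is persisted to the
// simulated database and the read buffer is cleared.
func (b *backend) Commit() {
	b.batchMu.Lock()
	defer b.batchMu.Unlock()
	b.readMu.Lock()
	defer b.readMu.Unlock()
	for k, v := range b.readBuf {
		b.db[k] = v
	}
	b.readBuf = map[string]string{}
}

func main() {
	b := newBackend()
	b.Put("foo", "bar")
	v, _ := b.Get("foo") // served from the read buffer
	fmt.Println(v)       // bar
	b.Commit()
	v, _ = b.Get("foo") // now served from the simulated database
	fmt.Println(v)      // bar
}
```

An expensive Get in this model keeps readMu read-locked for as long as it runs. That is the blocking the 3.4 change targets.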


Fully Concurrent Read in etcd 3.4

etcd 3.4 no longer holds the read‑write lock for the duration of a read. Instead, each read request creates its own concurrentReadTx and takes the read lock only briefly while it copies the shared read buffer. The underlying BoltDB read transaction stays open until every concurrentReadTx that uses it has finished. As a result, a long expensive read no longer blocks writes from updating the read buffer, and other requests are no longer queued behind it.

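At the end of each batch interval, the backend commits and opens a new BoltDB read transaction for subsequent reads. Any concurrentReadTx instances still in flight keep using the old transaction, which is rolled back only after they have all finished.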

Copying the buffer adds a small per‑read overhead. This overhead is negligible in practice because the buffer holds only writes from the current batch interval.

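type concurrentReadTx struct {
    buf txReadBuffer
    ...
}

func (b *backend) ConcurrentReadTx() ReadTx {
    b.readTx.RLock() // held only while the buffer is copied
    defer b.readTx.RUnlock()
    ...
}

func (rt *concurrentReadTx) RLock() {} // no-op: the copied buffer is private to this transaction

func (s *store) Read() TxnRead {
    tx := s.b.ConcurrentReadTx()
    tx.RLock() // no-op for concurrentReadTx
    ...
}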
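The read path described above can be modeled in Go as follows. This is a simplified sketch under stated assumptions, not etcd's implementation. A snapshot with an immutable map stands in for a BoltDB read transaction, and backend, readTx, Put, Commit and End are hypothetical names. The sketch shows three things: a reader takes the read lock only to copy the buffer, its Get runs with no lock at all, and Commit swaps in a new snapshot while a WaitGroup keeps the old one open until its readers call End.

```go
package main

import (
	"fmt"
	"sync"
)

// snapshot stands in for one BoltDB read transaction (hypothetical model).
type snapshot struct {
	data   map[string]string // never mutated after creation
	closed chan struct{}     // closed once no reader can still use it
}

// backend is a hypothetical, simplified model of the 3.4 read path.
type backend struct {
	mu   sync.RWMutex      // guards buf, snap and wg
	buf  map[string]string // shared read buffer (writes not yet committed)
	snap *snapshot
	wg   *sync.WaitGroup // readers still using snap
}

// readTx is a per-request read transaction with a private buffer copy.
type readTx struct {
	buf  map[string]string
	snap *snapshot
	wg   *sync.WaitGroup
}

func newBackend(initial map[string]string) *backend {
	return &backend{
		buf:  map[string]string{},
		snap: &snapshot{data: initial, closed: make(chan struct{})},
		wg:   &sync.WaitGroup{},
	}
}

// ConcurrentReadTx holds the read lock only while copying the buffer.
func (b *backend) ConcurrentReadTx() *readTx {
	b.mu.RLock()
	defer b.mu.RUnlock()
	b.wg.Add(1) // keep the current snapshot open until this reader ends
	cp := make(map[string]string, len(b.buf))
	for k, v := range b.buf {
		cp[k] = v
	}
	return &readTx{buf: cp, snap: b.snap, wg: b.wg}
}

// Get takes no lock: buf is private and snap.data is immutable.
func (t *readTx) Get(key string) (string, bool) {
	if v, ok := t.buf[key]; ok {
		return v, true
	}
	v, ok := t.snap.data[key]
	return v, ok
}

// End releases this reader's hold on its snapshot.
func (t *readTx) End() { t.wg.Done() }

// Put writes back into the shared buffer; it never waits for a slow Get.
func (b *backend) Put(key, value string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.buf[key] = value
}

// Commit simulates the end of a batch interval: it installs a new snapshot
// and closes the old one in the background after its readers have ended.
func (b *backend) Commit() {
	b.mu.Lock()
	defer b.mu.Unlock()
	next := make(map[string]string, len(b.snap.data)+len(b.buf))
	for k, v := range b.snap.data {
		next[k] = v
	}
	for k, v := range b.buf {
		next[k] = v
	}
	old, oldWg := b.snap, b.wg
	go func() { oldWg.Wait(); close(old.closed) }()
	b.snap = &snapshot{data: next, closed: make(chan struct{})}
	b.wg = &sync.WaitGroup{}
	b.buf = map[string]string{}
}

func main() {
	b := newBackend(map[string]string{"a": "1"})
	r := b.ConcurrentReadTx() // a long-running read begins
	b.Put("c", "3")           // does not wait for r
	b.Commit()                // r keeps its old snapshot
	v, ok := r.Get("c")
	fmt.Printf("%q %v\n", v, ok) // "" false
	old := r.snap
	r.End()
	<-old.closed // the old snapshot closes only after r has ended
	fmt.Println("old snapshot closed")
}
```

In this model, the only lock a reader takes is held for the buffer copy, so writers never queue behind a slow Get. The cost is one map copy per read, matching the trade-off the text describes.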

Experimental Validation

Single‑node etcd tests compared version 3.3 (without Fully Concurrent Read) and version 3.4 (with the feature). The workload pre‑loaded 100,000 key‑value pairs (128 B keys; values of 1 to 32 KB, 16 KB on average) and then executed:

Expensive read: range 20 k keys, 1 op/s.

Cheap read: range 10 keys, 100 op/s.

Write: put 1 key, 20 op/s.

Two scenarios were measured: (1) cheap‑read + write, and (2) cheap‑read + expensive‑read + write. Results are reported as 99th‑percentile (p99) latency:

Version 3.3: p99 cheap read 14.1 ms, p99 write 15.1 ms.

Version 3.4 (Fully Concurrent Read): p99 cheap read 16.1 ms, p99 write 14.2 ms.

In the heavy‑read scenario, etcd 3.4 reduced expensive‑read response time by roughly 85% and write latency by about 80%. Large‑scale tests on a 5,000‑node cluster showed a 97.4% reduction in p99 write latency under high read pressure.

Conclusion

The Fully Concurrent Read feature in etcd 3.4 eliminates the read‑write lock bottleneck, dramatically lowering latency for both expensive reads and writes while increasing overall throughput. The change is stable in production and represents a significant step forward for etcd performance under heavy read workloads.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
