
How etcd Manages Key Revisions and Watches: A Deep Dive into Its Storage Engine

This article explains how etcd stores and retrieves multiple revisions of a key using a B‑tree index, details the read and write paths, describes index restoration on startup, and shows how the watch subsystem delivers change events to clients, all illustrated with Go code snippets.

Architecture Talk

Index

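etcd assigns a revision to every key‑value change. The main revision is a store-wide counter incremented once per write transaction, and a sub revision numbers the individual changes inside that transaction. Each change is stored in BoltDB keyed by its revision rather than by the user key, so to read a value etcd must first find the relevant revision in the in‑memory B‑tree index (treeIndex).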

The B‑tree stores revision information for all keys. The Get method of the index retrieves the revision for a given key:

func (ti *treeIndex) Get(key []byte, atRev int64) (modified, created revision, ver int64, err error) { ... }

Internally it looks up the key's keyIndex in the B‑tree and asks it for the revision that was current at atRev.

keyIndex

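A keyIndex holds the key itself (not its value), the revision of its latest modification, and a list of generations that together record the key's full lifecycle. Each generation covers one life of the key, from creation to deletion, and stores its creation revision, a version counter, and every revision written during that life. When a key is deleted, a tombstone revision is appended to the current generation, closing it, and a new empty generation is started in case the key is created again.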

func (ki *keyIndex) get(lg *zap.Logger, atRev int64) (modified, created revision, ver int64, err error) { ... }
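The lookup through generations is easiest to see on a small model. This is a simplified sketch of the idea behind keyIndex.get, not etcd's code: the logger and compaction are omitted, and the example history (put at 2 and 4, deleted at 6, re-created at 8) plus the `exampleIndex` helper are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

type revision struct{ main, sub int64 }

// generation is one life of a key: from creation up to and including
// the tombstone that deletes it (if any).
type generation struct {
	ver     int64      // number of revisions in this generation
	created revision   // first revision of the generation
	revs    []revision // all revisions, oldest first
}

// keyIndex stores the key, its latest modification, and its generations.
type keyIndex struct {
	key         []byte
	modified    revision
	generations []generation
}

var errRevisionNotFound = errors.New("revision not found")

// findGeneration returns the generation live at atRev, or nil if the key
// did not exist then (not yet created, or already deleted).
func (ki *keyIndex) findGeneration(atRev int64) *generation {
	last := len(ki.generations) - 1
	for i := last; i >= 0; i-- {
		g := &ki.generations[i]
		if len(g.revs) == 0 {
			continue // empty generation opened after a tombstone
		}
		// Every generation except the last ends with a tombstone.
		if i != last && g.revs[len(g.revs)-1].main <= atRev {
			return nil
		}
		if g.revs[0].main <= atRev {
			return g
		}
	}
	return nil
}

// get returns the newest revision not after atRev, the generation's
// creation revision, and the key's version at that point.
func (ki *keyIndex) get(atRev int64) (modified, created revision, ver int64, err error) {
	g := ki.findGeneration(atRev)
	if g == nil {
		return revision{}, revision{}, 0, errRevisionNotFound
	}
	for n := len(g.revs) - 1; n >= 0; n-- {
		if g.revs[n].main <= atRev {
			return g.revs[n], g.created, g.ver - int64(len(g.revs)-n-1), nil
		}
	}
	return revision{}, revision{}, 0, errRevisionNotFound
}

// exampleIndex builds "foo": put at 2 and 4, deleted at 6, re-created at 8.
func exampleIndex() *keyIndex {
	return &keyIndex{
		key:      []byte("foo"),
		modified: revision{8, 0},
		generations: []generation{
			{ver: 3, created: revision{2, 0}, revs: []revision{{2, 0}, {4, 0}, {6, 0}}},
			{ver: 1, created: revision{8, 0}, revs: []revision{{8, 0}}},
		},
	}
}

func main() {
	ki := exampleIndex()
	for _, at := range []int64{1, 5, 7, 9} {
		mod, created, ver, err := ki.get(at)
		fmt.Println(at, mod, created, ver, err)
	}
}
```

Reading at revision 5 returns modified {4 0}, created {2 0}, version 2; reading at 7 falls between the tombstone and the re-creation and reports "revision not found"; reading at 9 lands in the second generation with version 1.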

Read Operations

All range (read) requests eventually call rangeKeys, which first asks the B‑tree index for the revisions visible at the requested revision and then fetches the corresponding values from BoltDB.

func (tr *storeTxnRead) rangeKeys(key, end []byte, curRev int64, ro RangeOptions) (*RangeResult, error) { ... }

The Revisions method walks the B‑tree over the key range [key, end) and, for each key, collects the revision that was current at atRev:

func (ti *treeIndex) Revisions(key, end []byte, atRev int64) (revs []revision) { ... }

Write Operations

Inserting data creates or updates a keyIndex and adds a new revision:

func (ti *treeIndex) Put(key []byte, rev revision) { ... }

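The keyIndex.put method appends the new revision to the current generation (recording it as the creation revision if the generation was empty), increments the generation's version counter, and updates the key's latest modification revision: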

func (ki *keyIndex) put(lg *zap.Logger, main, sub int64) { ... }
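A minimal sketch of that bookkeeping, modeled on keyIndex.put but with the logger and metrics dropped; the `greaterThan` helper is a hypothetical name, and the panic on a non-increasing revision reflects the invariant that revisions only move forward.

```go
package main

import "fmt"

type revision struct{ main, sub int64 }

// greaterThan orders revisions by main, then sub.
func (a revision) greaterThan(b revision) bool {
	if a.main != b.main {
		return a.main > b.main
	}
	return a.sub > b.sub
}

type generation struct {
	ver     int64
	created revision
	revs    []revision
}

type keyIndex struct {
	key         []byte
	modified    revision
	generations []generation
}

// put records a new revision for the key in its current generation.
func (ki *keyIndex) put(main, sub int64) {
	rev := revision{main, sub}
	// Revisions only move forward; anything else indicates a bug.
	if !rev.greaterThan(ki.modified) {
		panic(fmt.Sprintf("non-increasing revision %v after %v", rev, ki.modified))
	}
	if len(ki.generations) == 0 {
		ki.generations = append(ki.generations, generation{})
	}
	g := &ki.generations[len(ki.generations)-1]
	if len(g.revs) == 0 {
		g.created = rev // first revision of a new generation
	}
	g.revs = append(g.revs, rev)
	g.ver++
	ki.modified = rev
}

func main() {
	ki := &keyIndex{key: []byte("foo")}
	ki.put(2, 0)
	ki.put(3, 0)
	ki.put(3, 1) // second change inside the transaction with main revision 3
	g := ki.generations[0]
	fmt.Println(g.created, g.ver, ki.modified) // {2 0} 3 {3 1}
}
```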

Deletion writes a tombstone entry to BoltDB at a new revision and then tombstones the key in the index, which closes the key's current generation:

func (tw *storeTxnWrite) delete(key []byte) { ... }
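The effect of a delete on both sides can be sketched as follows. This is a hypothetical model, not storeTxnWrite.delete: `record` and `deleteKey` are invented names, the backend is a slice of rows with a plain boolean tombstone flag, and the sketch checks the index first so that deleting a key that is not live writes nothing.

```go
package main

import (
	"errors"
	"fmt"
)

type revision struct{ main, sub int64 }

type generation struct {
	ver     int64
	created revision
	revs    []revision
}

type keyIndex struct {
	key         []byte
	modified    revision
	generations []generation
}

var errRevisionNotFound = errors.New("revision not found")

func (ki *keyIndex) put(main, sub int64) {
	rev := revision{main, sub}
	if len(ki.generations) == 0 {
		ki.generations = append(ki.generations, generation{})
	}
	g := &ki.generations[len(ki.generations)-1]
	if len(g.revs) == 0 {
		g.created = rev
	}
	g.revs = append(g.revs, rev)
	g.ver++
	ki.modified = rev
}

// tombstone appends the deletion revision to the current generation, which
// closes it, then opens an empty generation for a possible re-creation.
func (ki *keyIndex) tombstone(main, sub int64) error {
	if len(ki.generations) == 0 || len(ki.generations[len(ki.generations)-1].revs) == 0 {
		return errRevisionNotFound // nothing live to delete
	}
	ki.put(main, sub)
	ki.generations = append(ki.generations, generation{})
	return nil
}

// record is a hypothetical backend row keyed by revision.
type record struct {
	rev       revision
	key       string
	tombstone bool
}

// deleteKey tombstones the index and appends a tombstone row to the backend.
func deleteKey(ki *keyIndex, backend *[]record, main, sub int64) error {
	if err := ki.tombstone(main, sub); err != nil {
		return err
	}
	*backend = append(*backend, record{rev: revision{main, sub}, key: string(ki.key), tombstone: true})
	return nil
}

func main() {
	ki := &keyIndex{key: []byte("foo")}
	var backend []record
	ki.put(2, 0)
	fmt.Println(deleteKey(ki, &backend, 3, 0))                             // <nil>
	fmt.Println(deleteKey(ki, &backend, 4, 0))                             // revision not found
	fmt.Println(len(ki.generations), ki.generations[0].revs, len(backend)) // 2 [{2 0} {3 0}] 1
}
```

Keeping the tombstone as an ordinary revision, rather than erasing history, is what lets reads at older revisions still see the key until compaction removes it.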

Index Restoration

On startup, etcd rebuilds the in‑memory B‑tree from BoltDB by scanning every stored revision in chunks and feeding the decoded entries through a channel to a goroutine that reconstructs the index:

func (s *store) restore() error { ... }

The restoration pipeline consists of restoreChunk (produces key‑value pairs) and restoreIntoIndex (consumes them and inserts them into the B‑tree):

func restoreChunk(lg *zap.Logger, kvc chan<- revKeyValue, keys, vals [][]byte, keyToLease map[string]lease.LeaseID) { ... }
func restoreIntoIndex(lg *zap.Logger, idx index) (chan<- revKeyValue, <-chan int64) { ... }

The channel decouples the BoltDB scan from the index reconstruction: the next chunk can be read from BoltDB while the consumer goroutine is still inserting earlier entries into the B‑tree.
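The producer/consumer shape can be sketched in a few lines. The return types of `restoreIntoIndex` mirror the signature quoted above (a send-only input channel and a receive-only channel that reports the highest revision once input ends), but everything else is an assumption: a map stands in for the B‑tree, `revKeyValue` is reduced to a key and a revision, and the tombstone and lease handling done by the real restoreChunk is omitted.

```go
package main

import "fmt"

type revision struct{ main, sub int64 }

// revKeyValue is a simplified decoded backend row: key plus revision.
type revKeyValue struct {
	key string
	rev revision
}

// restoreIntoIndex starts a consumer goroutine that inserts every row into
// idx and, once the input channel is closed, reports the highest main
// revision it saw on the returned int64 channel.
func restoreIntoIndex(idx map[string][]revision) (chan<- revKeyValue, <-chan int64) {
	rkvc := make(chan revKeyValue, 16) // buffer lets the producer run ahead
	revc := make(chan int64, 1)
	go func() {
		var currentRev int64
		for rkv := range rkvc {
			idx[rkv.key] = append(idx[rkv.key], rkv.rev)
			if rkv.rev.main > currentRev {
				currentRev = rkv.rev.main
			}
		}
		revc <- currentRev
	}()
	return rkvc, revc
}

// restoreChunk is the producer: it forwards one chunk of rows downstream.
func restoreChunk(kvc chan<- revKeyValue, chunk []revKeyValue) {
	for _, kv := range chunk {
		kvc <- kv
	}
}

// restore feeds all chunks through the pipeline and waits for the consumer.
func restore(chunks [][]revKeyValue) (map[string][]revision, int64) {
	idx := make(map[string][]revision)
	rkvc, revc := restoreIntoIndex(idx)
	for _, chunk := range chunks {
		restoreChunk(rkvc, chunk)
	}
	close(rkvc)
	rev := <-revc // receiving here orders all map writes before our reads
	return idx, rev
}

func main() {
	chunks := [][]revKeyValue{
		{{"a", revision{1, 0}}, {"b", revision{2, 0}}},
		{{"a", revision{3, 0}}},
	}
	idx, rev := restore(chunks)
	fmt.Println(idx["a"], idx["b"], rev) // [{1 0} {3 0}] [{2 0}] 3
}
```

Because rows arrive in revision order, appending per key preserves each key's history order without any sorting.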

Watch Functionality

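etcd wraps the store in a watchableStore that registers, manages, and triggers watchers. It keeps watchers in two main groups: unsynced watchers, whose requested start revision is behind the store's current revision and which must first catch up on historical changes, and synced watchers, which are already up to date and are notified directly as new writes are committed.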

type watchableStore struct { *store; mu sync.RWMutex; unsynced watcherGroup; synced watcherGroup; ... }

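The syncWatchers method, called periodically from a background loop, selects a batch of unsynced watchers, reads the corresponding changes from BoltDB, packages them into events, and sends them to each watcher's channel. Watchers that have caught up are then moved into the synced group: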

func (s *watchableStore) syncWatchers() int { ... }
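To make the two groups concrete, here is a minimal in-memory sketch. It assumes an event history slice in place of BoltDB, a hypothetical `watch` registration method, a buffered channel per watcher, and an arbitrary batch size; etcd's real code also tracks blocked ("victim") watchers and handles compaction, both omitted here.

```go
package main

import "fmt"

// event is a hypothetical change record: key and main revision of the write.
type event struct {
	key string
	rev int64
}

// watcher wants every event on key whose revision is >= minRev.
type watcher struct {
	key    string
	minRev int64
	ch     chan []event
}

// watchableStore keeps an in-memory history in place of BoltDB and splits
// watchers into synced (caught up) and unsynced (behind) groups.
type watchableStore struct {
	history  []event
	curRev   int64
	synced   map[*watcher]bool
	unsynced map[*watcher]bool
}

func newWatchableStore() *watchableStore {
	return &watchableStore{synced: map[*watcher]bool{}, unsynced: map[*watcher]bool{}}
}

// watch registers a watcher; startRev 0 means "only future changes".
func (s *watchableStore) watch(key string, startRev int64) *watcher {
	w := &watcher{key: key, minRev: startRev, ch: make(chan []event, 8)}
	if startRev == 0 {
		w.minRev = s.curRev + 1
	}
	if w.minRev > s.curRev {
		s.synced[w] = true // nothing to replay
	} else {
		s.unsynced[w] = true // must catch up on history first
	}
	return w
}

// syncWatchers replays history for up to maxBatch unsynced watchers and
// moves them to the synced group. It returns how many remain unsynced.
func (s *watchableStore) syncWatchers(maxBatch int) int {
	n := 0
	for w := range s.unsynced {
		if n == maxBatch {
			break
		}
		n++
		var evs []event
		for _, ev := range s.history {
			if ev.key == w.key && ev.rev >= w.minRev {
				evs = append(evs, ev)
			}
		}
		if len(evs) > 0 {
			w.ch <- evs // assumes a buffered channel that never fills up
		}
		w.minRev = s.curRev + 1
		delete(s.unsynced, w) // deleting during range is safe in Go
		s.synced[w] = true
	}
	return len(s.unsynced)
}

// put commits a write and notifies matching synced watchers directly.
func (s *watchableStore) put(key string) {
	s.curRev++
	ev := event{key: key, rev: s.curRev}
	s.history = append(s.history, ev)
	for w := range s.synced {
		if w.key == key {
			w.ch <- []event{ev}
			w.minRev = s.curRev + 1
		}
	}
}

func main() {
	s := newWatchableStore()
	s.put("foo") // revision 1
	s.put("bar") // revision 2
	s.put("foo") // revision 3
	w := s.watch("foo", 1)          // starts in the past: unsynced
	fmt.Println(s.syncWatchers(16)) // 0
	fmt.Println(<-w.ch)             // [{foo 1} {foo 3}]
	s.put("foo")                    // revision 4, sent straight to the synced watcher
	fmt.Println(<-w.ch)             // [{foo 4}]
}
```

The split keeps the common path cheap: a synced watcher costs one channel send per matching write, and only watchers that start in the past pay for a history scan.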

Clients interact through the gRPC watchServer. For each client watch stream, two goroutines are spawned: recvLoop reads incoming watch requests and registers (or cancels) watchers, while sendLoop sends the resulting events back to the client.

func (ws *watchServer) Watch(stream pb.Watch_WatchServer) error { ... }

The send loop converts internal WatchResponse objects into protobuf messages and streams them to the client.
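The two-goroutine pattern can be sketched without gRPC. In this hypothetical model, plain channels stand in for the bidirectional stream, `watchRequest`/`watchResponse` stand in for the protobuf messages, and a mutex-guarded registry replaces etcd's watch stream; only the division of labor between recvLoop and sendLoop is meant to carry over.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical stand-ins for the protobuf WatchRequest / WatchResponse.
type watchRequest struct{ key string }

type watchResponse struct {
	watchID int64
	created bool   // true for the confirmation of a new watcher
	event   string // changed key; empty for control responses
}

// registry maps keys to watch IDs. recvLoop writes it, sendLoop reads it.
type registry struct {
	mu     sync.Mutex
	nextID int64
	byKey  map[string][]int64
}

func (r *registry) add(key string) int64 {
	r.mu.Lock()
	defer r.mu.Unlock()
	id := r.nextID
	r.nextID++
	r.byKey[key] = append(r.byKey[key], id)
	return id
}

func (r *registry) ids(key string) []int64 {
	r.mu.Lock()
	defer r.mu.Unlock()
	return append([]int64(nil), r.byKey[key]...)
}

// recvLoop reads requests, registers watchers, and queues confirmations.
func recvLoop(in <-chan watchRequest, reg *registry, ctrl chan<- watchResponse) {
	defer close(ctrl)
	for req := range in {
		ctrl <- watchResponse{watchID: reg.add(req.key), created: true}
	}
}

// sendLoop is the only writer to out; it forwards control responses and
// fans each changed key out to the watchers registered for it.
func sendLoop(ctrl <-chan watchResponse, changes <-chan string, reg *registry, out chan<- watchResponse) {
	defer close(out)
	for ctrl != nil || changes != nil {
		select {
		case c, ok := <-ctrl:
			if !ok {
				ctrl = nil // a nil channel blocks forever, disabling this case
				continue
			}
			out <- c
		case key, ok := <-changes:
			if !ok {
				changes = nil
				continue
			}
			for _, id := range reg.ids(key) {
				out <- watchResponse{watchID: id, event: key}
			}
		}
	}
}

// serve starts both loops for one client stream.
func serve(in <-chan watchRequest, changes <-chan string, out chan<- watchResponse) {
	reg := &registry{byKey: make(map[string][]int64)}
	ctrl := make(chan watchResponse, 16)
	go recvLoop(in, reg, ctrl)
	go sendLoop(ctrl, changes, reg, out)
}

func main() {
	in := make(chan watchRequest)
	changes := make(chan string)
	out := make(chan watchResponse)
	serve(in, changes, out)

	in <- watchRequest{key: "foo"}
	fmt.Printf("%+v\n", <-out) // the created confirmation for watch 0
	changes <- "foo"
	fmt.Printf("%+v\n", <-out) // the change event delivered to watch 0

	close(in)
	close(changes)
	for range out { // drain until sendLoop exits and closes out
	}
}
```

Funneling every outgoing message through a single sendLoop matters for the real server because grpc-go does not allow concurrent Send calls on the same stream from multiple goroutines.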

Applications

Because etcd provides strong consistency and a simple deployment model, it is widely used for service discovery, configuration management, distributed locks, and other coordination tasks in micro‑service architectures.

Summary

etcd’s implementation offers valuable lessons in Go programming, concurrency patterns, and distributed system design, making it a worthwhile reference for engineers building reliable infrastructure.

Written by

Architecture Talk