Inside Dragonboat’s LogDB: Unified Storage Architecture and Go Optimizations
This article provides a detailed walkthrough of Dragonboat's LogDB storage layer, covering its overall architecture, unified key‑value design, memory‑reuse mechanisms with IContext, the IKVStore abstraction, Pebble initialization, event‑driven busy detection, and the sharded DB implementation, all illustrated with code snippets and diagrams.
Project Overview
Dragonboat is a pure Go implementation of a multi‑group Raft library that hides Raft complexity behind a simple NodeHost and state‑machine interface. The article walks through the stable V3 code base, focusing on the LogDB persistence layer.
Overall Architecture
The storage subsystem is built around a unified LogDB module that manages all Raft‑related data. [Diagram: high‑level components and their interactions]
LogDB Unified Storage
LogDB is the core persistence layer of Dragonboat. Although the name contains “Log”, it stores every piece of data required by the Raft protocol, including state, log entries, snapshots, messages, and bootstrap configuration.
Index Keys
All keys are stored in a KV engine (Pebble or RocksDB). To keep the key space tidy, a 2‑byte header distinguishes the different categories of data. The following prefixes are defined:
entryKeyHeader     = []byte{0x1, 0x1} // normal log entry
persistentStateKey = []byte{0x2, 0x2} // Raft state
maxIndexKey        = []byte{0x3, 0x3} // maximum index record
nodeInfoKey        = []byte{0x4, 0x4} // node metadata
bootstrapKey       = []byte{0x5, 0x5} // bootstrap config
snapshotKey        = []byte{0x6, 0x6} // snapshot index
entryBatchKey      = []byte{0x7, 0x7} // batched log entries
Key generation reuses a single byte slice (type Key) to avoid allocations:
type Key struct {
data []byte // pooled byte array
key []byte // slice pointing to the active region
pool *sync.Pool
}
func (k *Key) useAsEntryKey() { k.key = k.data }
func (k *Key) SetEntryKey(clusterID uint64, nodeID uint64, index uint64) {
k.useAsEntryKey()
// 2-byte type header; bytes 2-3 are left unused
k.key[0] = entryKeyHeader[0]
k.key[1] = entryKeyHeader[1]
// big-endian encoding keeps keys sorted in numeric order
binary.BigEndian.PutUint64(k.key[4:], clusterID)
binary.BigEndian.PutUint64(k.key[12:], nodeID)
binary.BigEndian.PutUint64(k.key[20:], index)
}
Variable Reuse with IContext
IContext provides a per‑thread context that recycles keys, byte buffers, and batch objects, dramatically reducing GC pressure in high‑concurrency scenarios.
Key objects are obtained via GetKey() and returned to the pool after use.
Byte buffers are fetched with GetValueBuffer(sz).
Write and entry batches are reused through GetWriteBatch() and GetEntryBatch().
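The pooling pattern behind GetKey can be sketched with sync.Pool; the type and function names below are illustrative stand-ins, not Dragonboat's actual identifiers:

```go
package main

import (
	"fmt"
	"sync"
)

// reusableKey is an illustrative stand-in for the pooled Key type.
type reusableKey struct {
	data []byte
	pool *sync.Pool
}

// Release returns the key to its pool instead of leaving it to the GC.
func (k *reusableKey) Release() { k.pool.Put(k) }

// keyPool hands out fixed-size key buffers, amortizing allocations.
var keyPool = sync.Pool{
	New: func() interface{} {
		return &reusableKey{data: make([]byte, 28)} // header + 3 uint64 fields
	},
}

// getKey mimics IContext.GetKey: fetch a pooled key, remembering its pool.
func getKey() *reusableKey {
	k := keyPool.Get().(*reusableKey)
	k.pool = &keyPool
	return k
}

func main() {
	k := getKey()
	fmt.Println(len(k.data)) // 28
	k.Release() // back to the pool for the next getKey call
}
```

In steady state a hot write path cycles a handful of key objects through the pool, so no per-request allocation reaches the garbage collector.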
type IContext interface {
Destroy()
Reset()
GetKey() IReusableKey
GetValueBuffer(sz uint64) []byte
GetWriteBatch() interface{}
SetWriteBatch(wb interface{})
GetEntryBatch() pb.EntryBatch
GetLastEntryBatch() pb.EntryBatch
}
Storage Engine Wrapper IKVStore
IKVStore abstracts the underlying KV engine, allowing Dragonboat to plug in Pebble, RocksDB, or any compatible store.
type IKVStore interface {
Name() string
Close() error
IterateValue(fk []byte, lk []byte, inc bool, op func(key []byte, data []byte) (bool, error)) error
GetValue(key []byte, op func([]byte) error) error
SaveValue(key []byte, value []byte) error
DeleteValue(key []byte) error
GetWriteBatch() IWriteBatch
CommitWriteBatch(wb IWriteBatch) error
BulkRemoveEntries(firstKey []byte, lastKey []byte) error
CompactEntries(firstKey []byte, lastKey []byte) error
FullCompaction() error
}
type IWriteBatch interface {
Destroy()
Put(key, value []byte)
Delete(key []byte)
Clear()
Count() int
}
openPebbleDB Initialization
The function openPebbleDB translates a LogDBConfig into Pebble options, creates caches, configures LSM‑tree levels, and finally opens the database.
func openPebbleDB(cfg config.LogDBConfig, cb kv.LogDBCallback, dir string, wal string, fs vfs.IFS) (kv.IKVStore, error) {
blockSz := int(cfg.KVBlockSize)
writeBufSz := int(cfg.KVWriteBufferSize)
// … read other tuning parameters …
levelOpts := []pebble.LevelOptions{}
sz := cfg.KVTargetFileSizeBase
for lvl := 0; lvl < int(cfg.KVNumOfLevels); lvl++ {
levelOpts = append(levelOpts, pebble.LevelOptions{Compression: pebble.NoCompression, BlockSize: blockSz, TargetFileSize: sz})
sz *= cfg.KVTargetFileSizeMultiplier
}
cache := pebble.NewCache(int64(cfg.KVLRUCacheSize))
ro := &pebble.IterOptions{}
wo := &pebble.WriteOptions{Sync: true}
opts := &pebble.Options{Levels: levelOpts, Cache: cache, MemTableSize: writeBufSz, FS: vfs.NewPebbleFS(fs)}
// WAL directory handling
if wal != "" {
if err := fs.MkdirAll(wal, 0755); err != nil { return nil, err }
opts.WALDir = wal
}
if err := fs.MkdirAll(dir, 0755); err != nil { return nil, err }
pdb, err := pebble.Open(dir, opts)
if err != nil { return nil, err }
kv := &KV{db: pdb, callback: cb, config: cfg, opts: opts, ro: ro, wo: wo}
// set event listener for busy detection
kv.eventListener = pebble.EventListener{WALCreated: kv.onWALCreated, FlushEnd: kv.onFlushEnd, CompactionEnd: kv.onCompactionEnd}
cache.Unref()
return kv, nil
}
Event Listener and Busy Detection
The eventListener notifies the upper layer when either the mem‑table size exceeds 95 % of the configured buffer or the number of L0 files exceeds the stop‑writes threshold.
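To make the thresholds concrete, here is the same arithmetic as notify() evaluated with illustrative configuration values (these numbers are examples, not Dragonboat's defaults):

```go
package main

import "fmt"

func main() {
	// Illustrative values, not Dragonboat's defaults.
	var (
		kvWriteBufferSize         uint64 = 64 * 1024 * 1024 // 64 MiB per memtable
		kvMaxWriteBufferNumber    uint64 = 4
		kvLevel0StopWritesTrigger uint64 = 24
	)
	// Report busy slightly before the engine itself would stall writes:
	// 95% of total memtable capacity, one L0 file short of the stop trigger.
	memSizeThreshold := kvWriteBufferSize * kvMaxWriteBufferNumber * 19 / 20
	l0FileNumThreshold := kvLevel0StopWritesTrigger - 1
	fmt.Println(memSizeThreshold)   // 255013683, i.e. 95% of 256 MiB
	fmt.Println(l0FileNumThreshold) // 23
}
```

Backing off at 95% rather than 100% gives the Raft layer time to shed load before Pebble starts blocking writers outright.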
func (l *eventListener) notify() {
l.stopper.RunWorker(func() {
select {
case <-l.kv.dbSet:
memSizeThreshold := l.kv.config.KVWriteBufferSize * l.kv.config.KVMaxWriteBufferNumber * 19 / 20
l0FileNumThreshold := l.kv.config.KVLevel0StopWritesTrigger - 1
m := l.kv.db.Metrics()
busy := m.MemTable.Size >= memSizeThreshold || uint64(m.Levels[0].NumFiles) >= l0FileNumThreshold
l.kv.callback(busy)
default: // db not fully opened yet; skip this notification
}
})
}
ShardedDB – Multi‑Shard LogDB
For workloads with many Raft clusters, ShardedDB manages a slice of db instances, each backed by its own Pebble instance. Updates are routed to the appropriate shard by a partitioner.
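A minimal sketch of modulo-style shard selection, assuming a fixed shard count (the type name below is illustrative, not the actual server.IPartitioner implementation):

```go
package main

import "fmt"

// fixedPartitioner maps a Raft cluster to one of a fixed number of shards.
type fixedPartitioner struct {
	capacity uint64 // number of shards
}

// GetPartitionID picks the shard responsible for a given cluster.
func (p *fixedPartitioner) GetPartitionID(clusterID uint64) uint64 {
	return clusterID % p.capacity
}

func main() {
	p := &fixedPartitioner{capacity: 16}
	fmt.Println(p.GetPartitionID(5))  // 5
	fmt.Println(p.GetPartitionID(21)) // 5: clusters 5 and 21 share a shard
}
```

Because the mapping is deterministic, all writes for a given cluster always land on the same shard, which keeps its entries contiguous in that shard's key space.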
type ShardedDB struct {
completedCompactions uint64
config config.LogDBConfig
ctxs []IContext
shards []*db
partitioner server.IPartitioner
compactionCh chan struct{}
compactions *compactions
stopper *syncutil.Stopper
}
func (s *ShardedDB) SaveRaftState(updates []pb.Update, shardID uint64) error {
if shardID-1 >= uint64(len(s.ctxs)) { plog.Panicf("invalid shardID %d", shardID) }
ctx := s.ctxs[shardID-1]
ctx.Reset()
return s.SaveRaftStateCtx(updates, ctx)
}
Summary
LogDB provides a clean, high‑performance storage abstraction for Dragonboat, showcasing Go techniques such as extensive memory reuse, pluggable KV back‑ends, and careful LSM‑tree tuning. The module is a valuable reference for anyone building robust distributed storage or consensus systems.
DeWu Technology