
Inside Dragonboat’s LogDB: Unified Storage Architecture and Go Optimizations

This article walks through Dragonboat's LogDB storage layer in detail: the overall architecture, the unified key‑value design, memory reuse via IContext, the IKVStore abstraction, Pebble initialization, event‑driven busy detection, and the sharded DB implementation, illustrated with code snippets and diagrams.

DeWu Technology

Project Overview

Dragonboat is a pure Go implementation of a multi‑group Raft library that hides Raft complexity behind a simple NodeHost and state‑machine interface. The article walks through the stable V3 code base, focusing on the LogDB persistence layer.

Overall Architecture

The storage subsystem is built around a unified LogDB module that manages all Raft‑related data. The diagram below shows the high‑level components and their interactions.

Overall architecture diagram

LogDB Unified Storage

LogDB is the core persistence layer of Dragonboat. Although the name contains “Log”, it stores every piece of data required by the Raft protocol, including state, log entries, snapshots, messages, and bootstrap configuration.

Index Keys

All keys are stored in a KV engine (Pebble or RocksDB). To keep the key space tidy, a 2‑byte header distinguishes different business categories. The following prefixes are defined:

entryKeyHeader     = []byte{0x1, 0x1} // normal log entry
persistentStateKey = []byte{0x2, 0x2} // Raft state
maxIndexKey        = []byte{0x3, 0x3} // maximum index record
nodeInfoKey        = []byte{0x4, 0x4} // node metadata
bootstrapKey       = []byte{0x5, 0x5} // bootstrap config
snapshotKey        = []byte{0x6, 0x6} // snapshot index
entryBatchKey      = []byte{0x7, 0x7} // batched log entries

Key generation reuses a single byte slice (type Key) to avoid allocations:

type Key struct {
    data []byte // pooled byte array
    key  []byte // slice pointing to the active region
    pool *sync.Pool
}

func (k *Key) useAsEntryKey() { k.key = k.data }

func (k *Key) SetEntryKey(clusterID uint64, nodeID uint64, index uint64) {
    k.useAsEntryKey()
    k.key[0] = entryKeyHeader[0]
    k.key[1] = entryKeyHeader[1]
    // bytes 2-3 are left unused as padding in this layout
    binary.BigEndian.PutUint64(k.key[4:], clusterID)
    binary.BigEndian.PutUint64(k.key[12:], nodeID)
    binary.BigEndian.PutUint64(k.key[20:], index)
}

Variable Reuse with IContext

IContext provides a per‑thread context that recycles keys, byte buffers, and batch objects, dramatically reducing GC pressure in high‑concurrency scenarios.

Key objects are obtained via GetKey() and returned to the pool after use.

Byte buffers are fetched with GetValueBuffer(sz).

Write and entry batches are reused through GetWriteBatch() and GetEntryBatch().

type IContext interface {
    Destroy()
    Reset()
    GetKey() IReusableKey
    GetValueBuffer(sz uint64) []byte
    GetWriteBatch() interface{}
    SetWriteBatch(wb interface{})
    GetEntryBatch() pb.EntryBatch
    GetLastEntryBatch() pb.EntryBatch
}
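A minimal sketch of the buffer‑reuse idea behind GetValueBuffer, assuming a fixed per‑context scratch buffer (the `context` type and sizes here are illustrative, not Dragonboat's actual implementation):

```go
package main

import "fmt"

// context is a simplified stand-in for an IContext implementation:
// it owns a single scratch buffer that is handed out for small values.
type context struct {
	val []byte // reusable scratch buffer
}

func newContext(sz uint64) *context {
	return &context{val: make([]byte, sz)}
}

// GetValueBuffer returns the pooled buffer when it is large enough and
// only allocates for oversized requests, keeping the hot path allocation-free.
func (c *context) GetValueBuffer(sz uint64) []byte {
	if sz <= uint64(len(c.val)) {
		return c.val[:sz]
	}
	return make([]byte, sz)
}

func main() {
	c := newContext(64)
	small := c.GetValueBuffer(16)  // served from the pooled buffer
	large := c.GetValueBuffer(128) // too big: freshly allocated
	fmt.Println(len(small), len(large)) // 16 128
}
```

Because the same context is bound to one worker goroutine, no locking is needed on this path.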

Storage Engine Wrapper IKVStore

IKVStore abstracts the underlying KV engine, allowing Dragonboat to plug in Pebble, RocksDB, or any compatible store.

type IKVStore interface {
    Name() string
    Close() error
    IterateValue(fk []byte, lk []byte, inc bool, op func(key []byte, data []byte) (bool, error)) error
    GetValue(key []byte, op func([]byte) error) error
    SaveValue(key []byte, value []byte) error
    DeleteValue(key []byte) error
    GetWriteBatch() IWriteBatch
    CommitWriteBatch(wb IWriteBatch) error
    BulkRemoveEntries(firstKey []byte, lastKey []byte) error
    CompactEntries(firstKey []byte, lastKey []byte) error
    FullCompaction() error
}

type IWriteBatch interface {
    Destroy()
    Put(key, value []byte)
    Delete(key []byte)
    Clear()
    Count() int
}
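To make the batching contract concrete, here is a toy in‑memory IWriteBatch (hypothetical code, not the real Pebble adapter): mutations are buffered in order and applied in a single commit, which is what lets the engine persist a whole Raft update atomically.

```go
package main

import "fmt"

// op is one buffered mutation.
type op struct {
	key, value []byte
	del        bool
}

// memBatch is a toy IWriteBatch: it buffers Put/Delete operations so a
// store can apply them together in one atomic commit.
type memBatch struct {
	ops []op
}

func (b *memBatch) Put(key, value []byte) { b.ops = append(b.ops, op{key: key, value: value}) }
func (b *memBatch) Delete(key []byte)     { b.ops = append(b.ops, op{key: key, del: true}) }
func (b *memBatch) Clear()                { b.ops = b.ops[:0] }
func (b *memBatch) Count() int            { return len(b.ops) }
func (b *memBatch) Destroy()              { b.ops = nil }

// commit applies the buffered operations in order to a map-backed store,
// mimicking what CommitWriteBatch does on a real engine.
func commit(store map[string][]byte, b *memBatch) {
	for _, o := range b.ops {
		if o.del {
			delete(store, string(o.key))
		} else {
			store[string(o.key)] = o.value
		}
	}
}

func main() {
	store := map[string][]byte{}
	b := &memBatch{}
	b.Put([]byte("a"), []byte("1"))
	b.Put([]byte("b"), []byte("2"))
	b.Delete([]byte("a"))
	fmt.Println(b.Count()) // 3
	commit(store, b)
	_, ok := store["a"]
	fmt.Println(string(store["b"]), ok) // 2 false
}
```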

openPebbleDB Initialization

The function openPebbleDB translates a LogDBConfig into Pebble options, creates caches, configures LSM‑tree levels, and finally opens the database.

func openPebbleDB(cfg config.LogDBConfig, cb kv.LogDBCallback, dir string, wal string, fs vfs.IFS) (kv.IKVStore, error) {
    blockSz := int(cfg.KVBlockSize)
    writeBufSz := int(cfg.KVWriteBufferSize)
    // … read other tuning parameters …
    levelOpts := []pebble.LevelOptions{}
    sz := cfg.KVTargetFileSizeBase
    for lvl := 0; lvl < int(cfg.KVNumOfLevels); lvl++ {
        levelOpts = append(levelOpts, pebble.LevelOptions{
            Compression:    pebble.NoCompression,
            BlockSize:      blockSz,
            TargetFileSize: sz,
        })
        sz *= cfg.KVTargetFileSizeMultiplier
    }
    cache := pebble.NewCache(int64(cfg.KVLRUCacheSize))
    ro := &pebble.IterOptions{}
    wo := &pebble.WriteOptions{Sync: true}
    opts := &pebble.Options{
        Levels:       levelOpts,
        Cache:        cache,
        MemTableSize: writeBufSz,
        FS:           vfs.NewPebbleFS(fs),
    }
    store := &KV{callback: cb, config: cfg, opts: opts, ro: ro, wo: wo}
    // the event listener used for busy detection must be registered on the
    // options before Open, otherwise Pebble never invokes it
    opts.EventListener = pebble.EventListener{
        WALCreated:    store.onWALCreated,
        FlushEnd:      store.onFlushEnd,
        CompactionEnd: store.onCompactionEnd,
    }
    // WAL directory handling
    if wal != "" {
        if err := fs.MkdirAll(wal, 0755); err != nil {
            return nil, err
        }
        opts.WALDir = wal
    }
    if err := fs.MkdirAll(dir, 0755); err != nil {
        return nil, err
    }
    pdb, err := pebble.Open(dir, opts)
    if err != nil {
        return nil, err
    }
    store.db = pdb
    cache.Unref()
    return store, nil
}
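The level‑sizing loop grows target file sizes geometrically: level n gets KVTargetFileSizeBase × KVTargetFileSizeMultiplier^n, matching the standard LSM‑tree fan‑out pattern. A standalone sketch of that arithmetic (the values below are illustrative, not Dragonboat defaults):

```go
package main

import "fmt"

// levelTargetSizes reproduces the sizing loop in openPebbleDB:
// each level's target file size is the previous level's size times
// the configured multiplier, starting from the base size.
func levelTargetSizes(base, multiplier uint64, levels int) []uint64 {
	sizes := make([]uint64, 0, levels)
	sz := base
	for i := 0; i < levels; i++ {
		sizes = append(sizes, sz)
		sz *= multiplier
	}
	return sizes
}

func main() {
	// e.g. 2 MB base with a 2x multiplier over 4 levels
	fmt.Println(levelTargetSizes(2, 2, 4)) // [2 4 8 16]
}
```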

Event Listener and Busy Detection

The eventListener notifies the upper layer when either the total mem‑table size reaches 95% of the configured write‑buffer budget (KVWriteBufferSize × KVMaxWriteBufferNumber) or the number of L0 files is within one of the stop‑writes trigger.

func (l *eventListener) notify() {
    l.stopper.RunWorker(func() {
        select {
        case <-l.kv.dbSet:
            memSizeThreshold := l.kv.config.KVWriteBufferSize * l.kv.config.KVMaxWriteBufferNumber * 19 / 20
            l0FileNumThreshold := l.kv.config.KVLevel0StopWritesTrigger - 1
            m := l.kv.db.Metrics()
            // Metrics reports signed values, so cast before the uint64 comparison
            busy := uint64(m.MemTable.Size) >= memSizeThreshold ||
                uint64(m.Levels[0].NumFiles) >= l0FileNumThreshold
            l.kv.callback(busy)
        default:
            // the DB is not fully opened yet; skip this notification
        }
    })
}
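Note that the 95% threshold is computed with integer arithmetic (× 19 / 20) rather than floating point. The decision logic, extracted into a standalone helper purely for illustration:

```go
package main

import "fmt"

// busy reproduces the listener's decision: report busy when mem-table
// bytes reach 95% (19/20, in integer arithmetic) of the total
// write-buffer budget, or when level-0 is one file away from the
// stop-writes trigger.
func busy(memSize, bufSize, bufCount, l0Files, stopTrigger uint64) bool {
	memThreshold := bufSize * bufCount * 19 / 20
	l0Threshold := stopTrigger - 1
	return memSize >= memThreshold || l0Files >= l0Threshold
}

func main() {
	fmt.Println(busy(95, 100, 1, 0, 10)) // true: mem-table at 95% of budget
	fmt.Println(busy(94, 100, 1, 0, 10)) // false: both below threshold
	fmt.Println(busy(0, 100, 1, 9, 10))  // true: L0 one file from the trigger
}
```

Signaling busy one step before Pebble's own stop‑writes point lets the Raft layer apply backpressure before writes actually stall.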

ShardedDB – Multi‑Shard LogDB

For workloads with many Raft clusters, ShardedDB manages a slice of db instances, each backed by its own Pebble instance. A partitioner decides which shard receives each update.

type ShardedDB struct {
    completedCompactions uint64
    config               config.LogDBConfig
    ctxs                 []IContext
    shards               []*db
    partitioner          server.IPartitioner
    compactionCh         chan struct{}
    compactions          *compactions
    stopper              *syncutil.Stopper
}

func (s *ShardedDB) SaveRaftState(updates []pb.Update, shardID uint64) error {
    // shardID is 1-based; for shardID == 0 the unsigned subtraction wraps
    // around, so the bounds check below still catches it
    if shardID-1 >= uint64(len(s.ctxs)) {
        plog.Panicf("invalid shardID %d", shardID)
    }
    ctx := s.ctxs[shardID-1]
    ctx.Reset()
    return s.SaveRaftStateCtx(updates, ctx)
}
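The partitioner maps a Raft cluster onto a shard; the simplest scheme is a fixed modulo mapping, sketched below (modeled on dragonboat's server.FixedPartitioner, but simplified here):

```go
package main

import "fmt"

// fixedPartitioner maps a clusterID onto one of `capacity` shards by
// simple modulo, so a given cluster always lands on the same shard
// and its keys stay together in one Pebble instance.
type fixedPartitioner struct {
	capacity uint64
}

func (p fixedPartitioner) GetPartitionID(clusterID uint64) uint64 {
	return clusterID % p.capacity
}

func main() {
	p := fixedPartitioner{capacity: 4}
	fmt.Println(p.GetPartitionID(5), p.GetPartitionID(6)) // 1 2
}
```

Determinism matters more than balance here: the mapping must be stable across restarts so a cluster's log is always found in the same shard.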

Summary

LogDB provides a clean, high‑performance storage abstraction for Dragonboat, showcasing Go techniques such as extensive memory reuse, pluggable KV back‑ends, and careful LSM‑tree tuning. The module is a valuable reference for anyone building robust distributed storage or consensus systems.

Tags: Storage Engine, Raft, key-value store, LogDB, Pebble