
Cut Storage Costs 400%: Inside BitalosDB’s High‑Performance KV Engine

An in‑depth look at BitalosDB, the home‑grown NoSQL storage engine behind Zuoyebang’s massive KV traffic, covering its novel IO architecture, KV‑separation design, Raft‑based consistency, multi‑cloud CRDT replication, and benchmark results that show up to 400% cost savings versus standard Redis.

Zuoyebang Tech Team

Overview

Project background: Zuoyebang's online services generate massive cache demand and complex I/O patterns; the goal is to handle more traffic and more data at lower cost.

Project Status

Handles 90% of Zuoyebang’s KV storage traffic, peak QPS 15 million.

Cache & storage volume: 130 TB.

Average read latency: 0.1 ms; write latency: 0.15 ms.

Availability: 99.9999%.

Project Benefits

Compared with standard Redis, the current deployment delivers roughly a 4x (400%) cost saving at the same storage volume.

Key Technologies

BitalosDB: a self-developed storage engine with a new I/O architecture for extreme performance.

Raft Protocol: heavily optimized to boost write performance and data synchronization, with an improved election strategy for higher cluster stability.

Multi-cloud Multi-master (CRDT): ensures conflict-free writes across multiple clouds, achieving eventual consistency.

Redis Compatibility: supports the Redis protocol for seamless migration to Stored.

Storage Panorama

Storage Engine

Problem

A standard LSM-Tree suffers from read and write amplification, and the resource cost of that amplification grows with data scale. The challenge is to support larger write volumes and higher read traffic at lower cost.

Solution

Use Bitalos‑Trees to solve read amplification, Bithash for KV separation to solve write amplification, and separate hot‑cold data to further save memory and disk.

BitalosDB I/O Architecture

Bitalos‑Trees handle data updates and hot data storage, providing high‑performance indexing while eliminating read amplification.

Bithash stores values, delivering high‑performance reads/writes and eliminating write amplification.

Bitable stores cold data; based on data size and access frequency, cold data is moved to Bitable during low‑traffic periods, improving compression and reducing index memory usage.
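The division of labor among the three components can be sketched in Go. This is a deliberately simplified model, not BitalosDB's actual API: `engine`, `valuePtr`, and the in-memory maps standing in for Bitalos-Tree, Bithash, and Bitable are all illustrative.

```go
package main

import "fmt"

// valuePtr is an illustrative stand-in for the index entry that points
// from a key into a Bithash value file.
type valuePtr struct {
	fileID uint32 // which Bithash file holds the value
	offset uint64 // position of the value within that file
}

// engine models the three-component layout described above with plain maps.
type engine struct {
	hotIndex map[string]valuePtr // stands in for Bitalos-Tree (hot index)
	bithash  map[valuePtr][]byte // stands in for Bithash (value files)
	coldData map[string][]byte   // stands in for Bitable (cold, compressed)
}

// Get first consults the hot index; on a miss it falls back to cold storage.
func (e *engine) Get(key string) ([]byte, bool) {
	if ptr, ok := e.hotIndex[key]; ok {
		v, ok := e.bithash[ptr]
		return v, ok
	}
	v, ok := e.coldData[key]
	return v, ok
}

func main() {
	e := &engine{
		hotIndex: map[string]valuePtr{"hot": {fileID: 1, offset: 0}},
		bithash:  map[valuePtr][]byte{{fileID: 1, offset: 0}: []byte("v1")},
		coldData: map[string][]byte{"cold": []byte("v2")},
	}
	v, _ := e.Get("hot") // served via Bitalos-Tree -> Bithash
	fmt.Println(string(v))
	v, _ = e.Get("cold") // served from Bitable
	fmt.Println(string(v))
}
```

The point of the split is that each path pays only its own cost: hot reads never touch cold storage, and cold data never occupies index memory.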

KV Separation – Technical Analysis

Option A

Option B

Option C

Analysis

Summary of Options

Options A & B require extra CPU and I/O for index queries and updates during vLog GC.

Option C triggers multiple random reads during vLog reads, leaving room for read performance improvement.

BitalosDB enables closed‑loop GC inside vLog without index queries/updates while maintaining high‑performance vLog reads.

BitalosDB KV‑Separation Technology (Bithash)

File Structure

Data Write

Data Read

Index Write

When a single file’s write volume exceeds the Bithash file capacity threshold, the current Bithash file is closed and the in‑memory index is flushed to disk.

BitalosDB Index Technology (Bitalos‑Tree)

Prefix‑tree based hierarchical B+ tree

Layered Process

Each dashed box represents a B+ tree; for each Trie Layer, the key is sliced by M bytes for indexing. Keys sharing the same first M bytes reside in the same layer.

In the extreme case where every key is exactly M bytes long, all keys belong to Trie Layer 0. A key longer than 10 × M bytes is not automatically placed in Trie Layer 10; its layer depends on the prefix it shares with existing keys.
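One plausible reading of this placement rule, assuming a key's layer equals the number of complete M-byte prefix slices it shares with some existing key (the function name and signature are hypothetical, not BitalosDB's API):

```go
package main

import "fmt"

// layerOf returns the trie layer for a new key: the largest number of
// complete m-byte prefix slices it shares with any existing key.
// This is an interpretation of the layering rule, not the real algorithm.
func layerOf(key string, existing []string, m int) int {
	best := 0
	for _, e := range existing {
		n := 0
		// Extend the match one m-byte slice at a time.
		for (n+1)*m <= len(key) && (n+1)*m <= len(e) && key[:(n+1)*m] == e[:(n+1)*m] {
			n++
		}
		if n > best {
			best = n
		}
	}
	return best
}

func main() {
	// With m = 2: "abcf" shares one 2-byte slice ("ab") with "abcd".
	fmt.Println(layerOf("abcf", []string{"abcd"}, 2)) // 1
	// "wxyz" shares no slice, so it stays in Trie Layer 0.
	fmt.Println(layerOf("wxyz", []string{"abcd"}, 2)) // 0
}
```

Under this reading, a very long key with no shared prefix still lands in Layer 0, matching the statement that length alone does not determine the layer.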

Performance

Benchmarked against RocksDB v7.6.0 (the latest version at the time).

Machine configuration: Intel Xeon Platinum 8255C CPU @ 2.50 GHz; 2 × 3.5 TB NVMe SSD (RAID 0).

Test settings: cgroup limited to 8 cores; concurrency 8; keys 32 B, values 1 KB (100% random).

Benchmarks on data sizes 25 GB, 50 GB, 100 GB show BitalosDB outperforming RocksDB.

Storage Service

High‑Performance Data Consistency Based on Raft

Deeply optimized standard Raft synchronization: Bitalos-Server combines batch processing, fully asynchronous I/O, and parallel transmission, achieving a more than threefold write-performance improvement over the standard Raft protocol.
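The batching idea can be illustrated with a minimal sketch: drain whatever writes are pending into one batch and propose them together, amortizing fsync and network round trips across many client writes. The function below is a toy under those assumptions, not Bitalos-Server code.

```go
package main

import "fmt"

// drainBatch blocks for the first pending write, then greedily drains
// whatever else is already queued (up to max) into one batch, which
// would then be proposed as a single Raft entry.
func drainBatch(pending chan []byte, max int) [][]byte {
	batch := [][]byte{<-pending} // wait for at least one write
	for len(batch) < max {
		select {
		case e := <-pending:
			batch = append(batch, e)
		default:
			return batch // nothing else queued; ship what we have
		}
	}
	return batch
}

func main() {
	pending := make(chan []byte, 8)
	for i := 0; i < 5; i++ {
		pending <- []byte{byte(i)}
	}
	fmt.Println(len(drainBatch(pending, 3))) // 3: capped by max
	fmt.Println(len(drainBatch(pending, 3))) // 2: only two writes remain
}
```

The same structure also enables parallel transmission: each drained batch is an independent unit that can be shipped to followers on its own goroutine.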

Pre‑Election Technique for Raft

In standard Raft, an election starts as soon as any follower's heartbeat times out, which can disrupt write traffic even when the timeout was caused by transient network jitter.

Bitalos‑Server adds a pre‑election phase: when a follower’s heartbeat times out, it first attempts a pre‑election by contacting other followers to verify the leader’s status before launching a formal election.
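A minimal sketch of the pre-election decision, assuming the timed-out follower proceeds to a formal election only when a strict majority of the cluster also considers the leader dead (the function and its inputs are illustrative):

```go
package main

import "fmt"

// shouldStartElection implements the pre-election check: the timed-out
// follower polls its peers, and each entry records whether that peer
// also thinks the leader is down. Only a strict majority of the whole
// cluster (peers plus the candidate itself) justifies a real election.
func shouldStartElection(peerSeesLeaderDown []bool) bool {
	votes := 1 // the timed-out follower counts itself
	for _, down := range peerSeesLeaderDown {
		if down {
			votes++
		}
	}
	clusterSize := len(peerSeesLeaderDown) + 1
	return votes*2 > clusterSize // strict majority
}

func main() {
	// 5-node cluster: only one peer agrees, so the jittery follower
	// backs off instead of disturbing the healthy leader.
	fmt.Println(shouldStartElection([]bool{true, false, false, false})) // false
	// Two peers agree: majority reached, start a formal election.
	fmt.Println(shouldStartElection([]bool{true, true, false, false})) // true
}
```

This mirrors the pre-vote idea found in mature Raft implementations: a node isolated by jitter cannot bump terms and depose a healthy leader.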

Multi‑Cloud Multi‑Master Technology Based on CRDT

Background: Zuoyebang’s services run across multiple clouds; the KV store must provide low write latency and high availability. Multi‑master writes across clouds can cause conflicts that must be resolved.

Requirements: Idempotence (a☆a = a), Commutativity (a☆b = b☆a), Associativity (a☆(b☆c) = (a☆b)☆c).

Solution

Idempotence: Each write log in a single‑cloud cluster gets a Raft‑log‑id; cross‑cloud synchronization follows Raft‑log‑id order, ensuring idempotent updates.

Commutativity & Associativity: Stateless value updates (e.g., set, hset) use LWW‑Register semantics; stateful updates (e.g., incrby) use Counter semantics; collection types (hash, set, zset) use OR‑Set semantics.
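As an example of the LWW-Register semantics used for stateless updates, here is a merge that is commutative and associative by construction. The field names and the site-ID tiebreaker are assumptions for illustration, not Stored's actual record format.

```go
package main

import "fmt"

// lwwRegister is a last-writer-wins register for stateless updates
// such as set/hset; timestamps are assumed comparable across clouds.
type lwwRegister struct {
	value string
	ts    int64 // timestamp of the write
	site  int   // writing cloud's ID, used only to break timestamp ties
}

// merge keeps whichever write is newer; ties go to the higher site ID,
// so the result is the same regardless of argument order or grouping.
func merge(a, b lwwRegister) lwwRegister {
	if b.ts > a.ts || (b.ts == a.ts && b.site > a.site) {
		return b
	}
	return a
}

func main() {
	a := lwwRegister{value: "from-cloud-A", ts: 100, site: 0}
	b := lwwRegister{value: "from-cloud-B", ts: 101, site: 1}
	// Both merge orders converge on the later write: eventual consistency.
	fmt.Println(merge(a, b).value == merge(b, a).value) // true
}
```

The deterministic tiebreaker is what makes concurrent same-timestamp writes from different clouds converge without coordination.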

Conclusion

As Zuoyebang continues to grow, the demand for highly available and high‑performance NoSQL databases rises. The team pursues extreme performance by innovating I/O architecture and storage algorithms, aiming for higher write/read throughput and lower latency. Further technical details and optimizations will be shared in future articles.

Tags: NoSQL, KV storage, distributed storage, database performance, Raft, CRDT
Written by Zuoyebang Tech Team

Sharing technical practices from Zuoyebang