
Design and Architecture of CB‑SQL: A Cloud‑Native Distributed Database

The article explains CB‑SQL's cloud‑native distributed database architecture, covering range‑based sharding, metadata management, hybrid logical clocks, pessimistic distributed transactions with SSI, Raft‑based strong consistency, and horizontal scalability for petabyte‑scale workloads.

JD Retail Technology

CB‑SQL is presented as a next‑generation cloud‑native distributed database that can theoretically manage up to 4 EB of data, offering ACID transactions, full SQL support, multi‑region deployment, high concurrency, automatic horizontal scaling, and automatic fault recovery.

To handle massive data volumes, CB‑SQL adopts data sharding. The article compares hash, consistent‑hash (with and without virtual nodes), and range‑based algorithms, concluding that range‑based sharding best satisfies the requirements for elastic scaling, load balancing, and range scans.
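The appeal of range‑based sharding can be illustrated with a small sketch (the boundary keys and helper names below are hypothetical, not CB‑SQL's actual API): locating a key is a binary search over sorted range start keys, and a range scan touches only a contiguous run of shards.

```python
import bisect

# Hypothetical range-based shard routing: each shard owns a contiguous key
# range, so finding a key is a binary search over sorted range start keys.
# Splitting one range adds a shard without reshuffling all keys, which is
# what makes elastic scaling and range scans cheap compared to hashing.
SHARD_STARTS = ["", "g", "n", "t"]  # shard i owns [SHARD_STARTS[i], next start)

def shard_for(key: str) -> int:
    """Return the index of the shard whose range contains `key`."""
    return bisect.bisect_right(SHARD_STARTS, key) - 1

def scan_shards(start: str, end: str) -> list[int]:
    """A range scan over [start, end] touches only contiguous shards."""
    return list(range(shard_for(start), shard_for(end) + 1))
```

With a hash-based scheme, the same scan would fan out to every shard; here `scan_shards("h", "p")` touches only shards 1 and 2.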

Metadata for the shards (each 64 MB by default) is managed in a two‑level hierarchy: an in‑memory tier for fast access backed by a larger tier for overflow, so even petabyte‑scale data requires only several gigabytes of metadata in memory.
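A back‑of‑envelope check makes the "several gigabytes" figure plausible (the 256‑byte per‑range metadata size is an assumption for illustration; the article does not specify it):

```python
# Illustrative arithmetic: how much shard metadata does 1 PB of data imply?
# Assumption (not from the article): ~256 bytes of metadata per range.
PB = 1 << 50
MB = 1 << 20

range_size = 64 * MB
ranges_per_pb = PB // range_size        # 2**24 = 16,777,216 ranges per PB
meta_bytes = ranges_per_pb * 256        # 2**32 bytes = 4 GiB of metadata

print(ranges_per_pb)     # 16777216
print(meta_bytes >> 30)  # 4 (GiB)
```

So roughly 4 GiB of range metadata per petabyte under these assumptions, comfortably small enough to keep resident in memory.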

The system uses a layered architecture: an upper SQL layer sits on a distributed key‑value layer that implements distributed transactions, thereby combining massive storage with relational guarantees.
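One common way such a layering works, sketched below under stated assumptions (the encoding, prefixes, and function names are hypothetical; the article does not describe CB‑SQL's actual mapping), is for the SQL layer to encode each row's table ID and primary key into an ordered byte key, so rows of one table are adjacent in the key‑value space and SQL range scans become KV range scans:

```python
import json

# Hypothetical sketch of mapping relational rows onto an ordered KV store.
# Fixed-width big-endian integers preserve numeric order under lexicographic
# byte comparison, which range-based sharding relies on for scans.
def encode_row_key(table_id: int, pk: int) -> bytes:
    return b"t" + table_id.to_bytes(8, "big") + b"_r" + pk.to_bytes(8, "big")

def encode_row(table_id: int, pk: int, columns: dict) -> tuple[bytes, bytes]:
    """Return the (key, value) pair the KV layer would store for one row."""
    return encode_row_key(table_id, pk), json.dumps(columns).encode()
```

Because the encoding is order‑preserving, `pk=2` sorts before `pk=10`, which a naive string encoding would get wrong.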

Clock selection is critical for distributed transactions. The article discusses three schemes—TSO, atomic clocks, and Hybrid Logical Clock (HLC)—and adopts HLC for its decentralized nature, which fits CB‑SQL's cross‑region design. The HLC algorithm is shown below:

Initially:
    l.j := 0; c.j := 0

Send or local event:
    l'.j := l.j;                  // backup of known max physical time
    l.j := max(l'.j, pt.j);       // update with max of backup and current physical time
    if (l.j = l'.j) then c.j := c.j + 1
    else c.j := 0
    Timestamp with (l.j, c.j)

Receive event of message m:
    l'.j := l.j;
    l.j := max(l'.j, l.m, pt.j);
    if (l.j = l'.j = l.m) then c.j := max(c.j, c.m) + 1
    elseif (l.j = l'.j) then c.j := c.j + 1
    elseif (l.j = l.m) then c.j := c.m + 1
    else c.j := 0
    Timestamp with (l.j, c.j)
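The algorithm translates directly into code. This is a minimal sketch of an HLC node (the class shape and injectable clock are illustrative, not CB‑SQL's implementation):

```python
import time

class HLC:
    """Minimal Hybrid Logical Clock following the algorithm above:
    `l` tracks the max known physical time, `c` is the logical counter."""

    def __init__(self, now=lambda: int(time.time() * 1000)):
        self.now = now  # physical clock pt, injectable for testing
        self.l = 0
        self.c = 0

    def send(self):
        """Send or local event: advance l to max(l, pt); bump c on a tie."""
        prev = self.l
        self.l = max(prev, self.now())
        self.c = self.c + 1 if self.l == prev else 0
        return self.l, self.c

    def receive(self, l_m, c_m):
        """Receive event for a message timestamped (l_m, c_m)."""
        prev = self.l
        self.l = max(prev, l_m, self.now())
        if self.l == prev == l_m:
            self.c = max(self.c, c_m) + 1
        elif self.l == prev:
            self.c += 1
        elif self.l == l_m:
            self.c = c_m + 1
        else:
            self.c = 0
        return self.l, self.c
```

Because no node ever needs a central timestamp service, each node advances its clock from local events and incoming messages alone, which is why HLC suits a cross‑region deployment.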

CB‑SQL implements a pessimistic distributed‑transaction model (locking during execution) and supports Serializable Snapshot Isolation (SSI). Conflict types (Read‑Read, Read‑Write, Write‑Read, Write‑Write) are modeled as directed edges in a conflict graph; cycles indicate non‑serializable schedules, which are broken using timestamp comparison, priority, and HLC.
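The conflict‑graph idea can be sketched with a standard cycle check (an illustrative depth‑first search, not CB‑SQL's internals): transactions are nodes, conflicts are directed edges, and any cycle marks a non‑serializable schedule in which some transaction must be aborted.

```python
# Detect a cycle in a directed conflict graph via three-color DFS.
# A "gray" node reached again on the current path is a back edge,
# i.e. a cycle, so the schedule is not serializable.
def has_cycle(edges: dict[str, list[str]]) -> bool:
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in edges}

    def dfs(n):
        color[n] = GRAY
        for m in edges.get(n, []):
            if color.get(m, WHITE) == GRAY:   # back edge -> cycle
                return True
            if color.get(m, WHITE) == WHITE and dfs(m):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and dfs(n) for n in edges)
```

For example, `{"T1": ["T2"], "T2": ["T1"]}` (T1 reads what T2 writes and vice versa) contains a cycle, so one of the two transactions would be chosen for abort by the timestamp/priority rules described above.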

Strong consistency across replicas is achieved with the Raft consensus algorithm per range. Optimizations include assigning a dedicated Raft group to each 64 MB range, reusing network connections among groups, and using leader leases to avoid round‑trip reads while preserving linearizability.
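The leader‑lease optimization can be sketched as follows (the durations and class shape are assumptions for illustration): while its lease is valid, the leader knows no other replica can have become leader, so it may serve a linearizable read locally instead of running a Raft round trip.

```python
# Hypothetical leader-lease read check. The lease duration and clock-skew
# bound below are illustrative values, not CB-SQL's actual parameters.
LEASE_DURATION = 9.0   # seconds a leader may serve local reads after renewal
MAX_CLOCK_SKEW = 0.5   # assumed bound on clock drift between nodes

class RangeLeader:
    def __init__(self):
        self.lease_expiry = 0.0

    def renew_lease(self, granted_at: float):
        # Shrink the lease by the skew bound so that two nodes can never
        # both believe they hold the lease at the same instant.
        self.lease_expiry = granted_at + LEASE_DURATION - MAX_CLOCK_SKEW

    def can_serve_local_read(self, now: float) -> bool:
        """True while the lease is valid: read locally, skip the quorum."""
        return now < self.lease_expiry
```

Once the lease expires, reads fall back to the full Raft read path until the lease is renewed, preserving linearizability at the cost of one quorum round trip.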

Horizontal scalability is a central theme: small ranges enable elastic storage capacity, SQL‑layer capacity expansion, backup/restore, and Change Data Capture (CDC). The design keeps stored data directly available to the compute layer, allowing the system to deliver high‑throughput, low‑latency processing for massive workloads.
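The elastic‑scaling mechanism implied by small ranges can be sketched as a split rule (the threshold and function names are illustrative assumptions): when a range outgrows its budget it splits at a middle key, and the new range can then be scheduled onto another node.

```python
# Illustrative range-split sketch: a range that exceeds its size budget is
# split at a middle key; either half can then move to another node, which
# is what makes storage capacity elastic.
MAX_RANGE_BYTES = 64 * 1024 * 1024  # 64 MB default range size

def maybe_split(range_keys: list[str], size_bytes: int):
    """Return (left_keys, right_keys, split_key) if the range is too big,
    else None."""
    if size_bytes <= MAX_RANGE_BYTES or len(range_keys) < 2:
        return None
    mid = len(range_keys) // 2
    split_key = range_keys[mid]
    return range_keys[:mid], range_keys[mid:], split_key
```

Because each 64 MB range already carries its own Raft group, a freshly split range joins the cluster's scheduling pool without any global rebalancing step.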

Tags: cloud native, sharding, distributed database, distributed transactions, Raft, HLC
Written by JD Retail Technology

Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.