Transparent Scalable NewSQL: Design and Implementation of CB‑SQL Distributed Database
The article explains the concept of transparent scalable NewSQL databases, discusses the historical challenges of scaling relational systems, and details the design and implementation techniques—such as dynamic sharding, distributed transactions, consensus algorithms, distributed SQL, and CB‑SQL’s specific mechanisms like HLC, SSI isolation, Multi‑Raft, and automatic load balancing—to achieve elasticity, high availability, and seamless scalability.
Introduction
Life is often likened to a duck gliding across a pond: calm on the surface while paddling furiously underneath. NewSQL databases aim to be "transparent" in the same way: users should not have to worry about capacity, data loss, transactions, SQL compatibility, availability, or failover.
What Is Transparent Scalability?
Relational databases have faced scaling limits for decades. Vertical scaling is costly. Storage‑compute separation (e.g., Amazon Aurora, PolarDB) addresses storage growth but not compute growth. Sharding and middleware partially address capacity, yet they give up cross‑shard SQL and distributed transaction capabilities. These gaps create the need for truly transparent, scalable distributed databases.
Characteristics of a Transparent Scalable Distributed Database
Elastic, on‑demand scaling without application changes.
Scalable transaction, SQL, and storage capabilities.
High availability.
How to Achieve Transparent Scalability
Data Sharding
Data is divided into dynamic shards (ranges). Most systems (Spanner, CockroachDB, TiDB, OceanBase) use small shards (64 MB–128 MB) to balance load against metadata overhead. When a shard exceeds a size threshold, it is split into two. The split is logical, so re‑partitioning does not itself move data. Any data migration that is required proceeds in small 64 MB blocks across replicas, which keeps rebalancing fast and fault‑tolerant.
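The split-on-threshold behavior can be sketched in a few lines. This is a minimal illustration, not CB‑SQL's implementation: the names `Range`, `RangeMap`, and `SPLIT_THRESHOLD` are hypothetical, sizes are tracked per key in memory, and the split point is simply the median key.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Optional

SPLIT_THRESHOLD = 64 * 1024 * 1024  # assumed threshold: 64 MB


@dataclass(eq=False)
class Range:
    start: str                      # inclusive start key
    end: Optional[str]              # exclusive end key; None means unbounded
    rows: dict = field(default_factory=dict)  # key -> value size in bytes

    def size(self) -> int:
        return sum(self.rows.values())


class RangeMap:
    """Ordered list of contiguous, non-overlapping key ranges."""

    def __init__(self):
        self.ranges = [Range("", None)]  # one range covers the whole keyspace

    def locate(self, key: str) -> Range:
        starts = [r.start for r in self.ranges]
        return self.ranges[bisect_right(starts, key) - 1]

    def put(self, key: str, nbytes: int, threshold: int = SPLIT_THRESHOLD):
        rng = self.locate(key)
        rng.rows[key] = nbytes
        if rng.size() > threshold and len(rng.rows) > 1:
            self._split(rng)

    def _split(self, rng: Range):
        keys = sorted(rng.rows)
        mid = keys[len(keys) // 2]  # split at the median key
        # Only the range boundaries change; in a real system the bytes
        # stay on the same replicas until a rebalancer decides to move them.
        right = Range(mid, rng.end, {k: rng.rows.pop(k) for k in keys if k >= mid})
        rng.end = mid
        self.ranges.insert(self.ranges.index(rng) + 1, right)
```

With a tiny threshold, inserting keys `a` through `f` of equal size leaves three ranges starting at `""`, `"c"`, and `"e"`, and lookups continue to resolve through `locate` without the caller knowing a split happened.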
Distributed Transactions
Transactions must satisfy ACID. Distributed transactions rely on the two‑phase commit (2PC) protocol, but real‑world deployments must handle coordinator/participant failures, network partitions, and latency. Isolation levels such as Snapshot Isolation (SI) or Serializable Snapshot Isolation (SSI) are used; global clocks (TrueTime, centralized clocks, or Hybrid Logical Clocks) order events across nodes.
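The basic shape of two‑phase commit is sketched below. It is a toy, in‑memory version under stated assumptions: `Participant`, `Vote`, and `two_phase_commit` are hypothetical names, there is no durable logging, and coordinator failure (the hard case mentioned above) is not handled.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self) -> Vote:
        # A real participant would durably log its intent before voting YES.
        if self.can_commit:
            self.state = "prepared"
            return Vote.YES
        self.state = "aborted"
        return Vote.NO

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants) -> bool:
    # Phase 1 (prepare): collect votes; an error or timeout counts as NO.
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare())
        except Exception:
            votes.append(Vote.NO)
    decision = all(v is Vote.YES for v in votes)
    # Phase 2 (commit/abort): apply the single global decision everywhere.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision
```

A single NO vote aborts every participant, which is what gives the transaction its atomicity across nodes.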
Consensus Algorithms
Consistency among replicas is achieved with Paxos or with Raft, a consensus algorithm designed to be easier to understand and implement. Raft simplifies implementation by requiring log entries to be committed strictly in order, while Multi‑Paxos variants can commit log entries out of order and therefore in parallel. Both provide leader election and automatic failover.
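The majority rule behind in‑order commit can be illustrated with one small function. This is a simplified sketch: `majority_commit_index` is a hypothetical helper, and it omits Raft's additional requirement that the entry at the chosen index belong to the leader's current term.

```python
def majority_commit_index(match_index):
    """Return the highest log index stored on a majority of replicas.

    match_index lists, for every replica including the leader, the
    highest log index known to be stored on that replica.
    """
    quorum = len(match_index) // 2 + 1
    # After sorting in descending order, the value at position quorum - 1
    # is held by at least `quorum` replicas.
    return sorted(match_index, reverse=True)[quorum - 1]
```

For three replicas at indexes 5, 5, and 3 the commit index is 5, because two of three replicas already hold entry 5; a single lagging replica does not block progress.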
Distributed SQL Execution
Complex SQL queries are split into sub‑tasks pushed to storage nodes, which execute locally and return intermediate results. A gateway aggregates these results, reducing network traffic and leveraging cluster parallelism.
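As an illustration of this pushdown, consider a query such as `SELECT region, COUNT(*), AVG(amount) FROM orders GROUP BY region`. The sketch below assumes a hypothetical `orders` table and hypothetical function names: each storage node returns only a per‑region count and sum, and the gateway merges them.

```python
from collections import defaultdict


def local_partial(rows):
    """Runs on a storage node over its local (region, amount) rows."""
    acc = defaultdict(lambda: [0, 0.0])  # region -> [count, sum]
    for region, amount in rows:
        acc[region][0] += 1
        acc[region][1] += amount
    return dict(acc)


def gateway_merge(partials):
    """Runs on the gateway: combines partial results into final aggregates."""
    merged = defaultdict(lambda: [0, 0.0])
    for part in partials:
        for region, (count, total) in part.items():
            merged[region][0] += count
            merged[region][1] += total
    # AVG cannot be averaged across nodes directly, so it is derived
    # from the merged sum and count.
    return {r: {"count": c, "avg": s / c} for r, (c, s) in merged.items()}
```

Each node ships one small record per group instead of every matching row, which is where the reduction in network traffic comes from.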
CB‑SQL Specific Practices
CB‑SQL, built on CockroachDB and compatible with the MySQL protocol, implements the above concepts:
Uses Hybrid Logical Clocks (HLC) combining physical time (via NTP) and logical counters to order events.
Provides SSI isolation with a timestamp cache to detect conflicts.
Stores transaction metadata in a dedicated transaction table for recovery.
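Adopts Multi‑Raft to manage the many Raft groups (one per range) hosted on each node together, so that they share connections and messaging between nodes instead of each group paying that overhead separately.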
Employs Gossip‑based automatic load balancing, migrating ranges based on CPU, memory, disk, QPS, and lease‑holder metrics.
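The first item in this list, the Hybrid Logical Clock, can be sketched as follows. This follows the generally published HLC update rules rather than CB‑SQL's source code; the class name, the tuple representation `(wall, logical)`, and the injected `physical_clock` callable are assumptions made for illustration.

```python
class HLC:
    """Hybrid Logical Clock producing (wall, logical) timestamps."""

    def __init__(self, physical_clock):
        self.now_physical = physical_clock  # callable returning an int
        self.wall = 0
        self.logical = 0

    def now(self):
        """Timestamp a local or send event."""
        pt = self.now_physical()
        if pt > self.wall:
            self.wall, self.logical = pt, 0
        else:
            # Physical time has not advanced: break the tie logically.
            self.logical += 1
        return (self.wall, self.logical)

    def update(self, remote):
        """Merge a timestamp received from another node."""
        r_wall, r_logical = remote
        pt = self.now_physical()
        new_wall = max(self.wall, r_wall, pt)
        if new_wall == self.wall and new_wall == r_wall:
            self.logical = max(self.logical, r_logical) + 1
        elif new_wall == self.wall:
            self.logical += 1
        elif new_wall == r_wall:
            self.logical = r_logical + 1
        else:
            self.logical = 0
        self.wall = new_wall
        return (self.wall, self.logical)
```

Because timestamps compare as tuples, an event that causally follows a received message always gets a larger timestamp than that message, even when the local NTP‑synchronized clock lags behind the sender's.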
Automatic Load Balancing
Nodes exchange load information via Gossip; the cluster continuously re‑balances ranges, handling node additions and failures without manual intervention.
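A greatly simplified rebalancing loop might look like the sketch below. It assumes a single metric (range count per node) instead of the CPU, memory, disk, QPS, and lease‑holder signals described above, and the function name `rebalance` and its `tolerance` parameter are hypothetical.

```python
def rebalance(range_counts, tolerance=1):
    """Greedy rebalancing over node -> range count.

    Repeatedly moves one range from the most loaded node to the least
    loaded node until the gap is within `tolerance`.
    Returns (new_counts, moves), where moves is a list of (src, dst).
    """
    tolerance = max(1, tolerance)  # a gap of 1 cannot be improved further
    counts = dict(range_counts)
    moves = []
    while True:
        hot = max(counts, key=counts.get)
        cold = min(counts, key=counts.get)
        if counts[hot] - counts[cold] <= tolerance:
            return counts, moves
        counts[hot] -= 1
        counts[cold] += 1
        moves.append((hot, cold))
```

Starting from `{"n1": 6, "n2": 6, "n3": 0}`, which models an empty node joining the cluster, the loop converges to four ranges per node after four moves.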
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.
