Optimizing Distributed Transactions in CB‑SQL: From Two‑Phase Commit to Parallel and Pipeline Commit
This article explains how CB‑SQL improves distributed transaction performance by introducing a transaction record table, parallel prepare requests, one‑phase commit for single‑shard writes, latch‑based consistent reads, transaction pipelines, and a staged parallel‑commit mode, dramatically reducing latency and I/O overhead.
Distributed transactions traditionally rely on two‑phase commit, which incurs high latency because each transaction requires multiple round‑trips and replicated I/O across participants.
CB‑SQL, built on CockroachDB and compatible with the MySQL protocol, implements fully distributed transactions under serializable snapshot isolation (SSI) and layers several optimizations on top of the basic protocol.
Take the statement INSERT INTO t VALUES (1,'x'), (2,'y'), (3,'z'); as a running example: its three rows are stored on three separate shards, so committing it exercises the baseline two‑phase commit flow.
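To make this flow robust, the system adds a transaction‑record table so that a transaction's outcome can be recovered after a coordinator failure, applies timeouts to prepare requests, and stores a commit flag to guarantee atomicity. Sending the prepare requests to all shards in parallel instead of one after another then reduces latency from 5t to 3t, where t is the time of one replicated write step.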
When all rows reside on a single shard, CB‑SQL bypasses the heavyweight two‑phase protocol, using Raft log replication and RocksDB batch writes to achieve a one‑phase commit with latency t.
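The single‑shard check can be sketched as follows. The key‑range layout in rangeSplits and the names targetShard, Store and commit are hypothetical; a mutex‑guarded map stands in for a RocksDB write batch, so all rows appear together or not at all, and Raft replication of that batch is left out.

```go
package main

import (
	"fmt"
	"sync"
)

// rangeSplits is a hypothetical key-range layout: keys < 100 live on shard 0,
// keys < 200 on shard 1, and all larger keys on shard 2.
var rangeSplits = []int{100, 200}

// shardFor routes a primary key to the shard that owns its range.
func shardFor(key int) int {
	for i, split := range rangeSplits {
		if key < split {
			return i
		}
	}
	return len(rangeSplits)
}

// targetShard returns the one shard every key routes to, or ok=false when the
// keys span several shards (or there are no keys at all).
func targetShard(keys []int) (shard int, ok bool) {
	if len(keys) == 0 {
		return 0, false
	}
	shard = shardFor(keys[0])
	for _, k := range keys[1:] {
		if shardFor(k) != shard {
			return 0, false
		}
	}
	return shard, true
}

// Store stands in for one shard's storage engine; holding the mutex for the
// whole loop models a write batch that is applied all-or-nothing.
type Store struct {
	mu   sync.Mutex
	rows map[int]string
}

func (s *Store) applyBatch(rows map[int]string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for k, v := range rows {
		s.rows[k] = v
	}
}

func newStores(n int) []*Store {
	stores := make([]*Store, n)
	for i := range stores {
		stores[i] = &Store{rows: map[int]string{}}
	}
	return stores
}

// commit takes the one-phase path when all rows share a shard; otherwise it
// reports that the two-phase path (not shown here) is required.
func commit(stores []*Store, rows map[int]string) string {
	keys := make([]int, 0, len(rows))
	for k := range rows {
		keys = append(keys, k)
	}
	if shard, ok := targetShard(keys); ok {
		stores[shard].applyBatch(rows) // rows and commit land in one batch
		return "1PC"
	}
	return "2PC"
}

func main() {
	stores := newStores(len(rangeSplits) + 1)
	fmt.Println(commit(stores, map[int]string{1: "x", 2: "y", 3: "z"}))     // one range
	fmt.Println(commit(stores, map[int]string{1: "x", 150: "y", 250: "z"})) // three ranges
}
```

Routing happens before any write is sent, so the cheap path costs nothing extra when it does not apply: multi‑shard transactions simply fall through to two‑phase commit.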
Further, when a transaction spans several shards, the transaction record is co‑located on the shard of the first written row, so writing the record shares that row's I/O and latency drops to 2t.
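The latency figures can be reproduced with a back‑of‑envelope model that counts sequential rounds of replicated I/O, each costing t. The per‑phase breakdown in the comments is an assumption rather than something the article spells out; it is chosen because it yields the article's 5t, 3t, 2t and t.

```go
package main

import "fmt"

// Each function returns the number of sequential rounds of replicated I/O
// (each costing t) needed to commit writes to n shards. The breakdowns are
// a hypothetical model, not measurements.

// serial2PC: create the record, prepare the n shards one after another, commit.
func serial2PC(n int) int { return 1 + n + 1 }

// parallel2PC: create the record, prepare all shards in one round, commit.
func parallel2PC(n int) int { return 1 + 1 + 1 }

// colocated2PC: the record rides along with the first row's prepare on the
// same shard, so record creation and prepares share one round, then commit.
func colocated2PC(n int) int { return 1 + 1 }

// onePC: only valid when n == 1; rows and commit go out as one replicated batch.
func onePC(n int) int { return 1 }

func main() {
	const n = 3 // the three-shard INSERT from the example
	fmt.Printf("serial 2PC:      %dt\n", serial2PC(n))
	fmt.Printf("parallel 2PC:    %dt\n", parallel2PC(n))
	fmt.Printf("co-located 2PC:  %dt\n", colocated2PC(n))
	fmt.Printf("one-phase (n=1): %dt\n", onePC(1))
}
```

For the three‑shard INSERT the program prints 5t, 3t and 2t, followed by 1t for the single‑shard case.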
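To improve read performance, CB‑SQL combines a latch mechanism with Raft leases: the lease holder serves consistent reads locally, latches keep those reads from interleaving with in‑flight writes to the same keys, and no extra Raft log entry is needed per read.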
For interactive transactions, a pipeline model lets writes return immediately while replication proceeds asynchronously; the commit phase then verifies all prepares before marking the transaction as committed.
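Finally, a parallel‑commit mode adds a STAGED transaction state: the transaction record is written as STAGED at the same time as all prepares are sent, and once every prepare succeeds the transaction is treated as committed and the client is answered, while the record is switched to COMMITTED asynchronously. Taking that final commit write off the critical path dramatically lowers latency for OLTP workloads.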
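The STAGED protocol can be sketched as below. TxnRecord, parallelCommit and resolve are hypothetical names, and prepares are modeled as a callback; resolve shows the rule that makes the early acknowledgment safe, with the simplification that a missing write is treated as aborted outright.

```go
package main

import (
	"fmt"
	"sync"
)

// Status values for a hypothetical transaction record.
type Status string

const (
	Staged    Status = "STAGED"
	Committed Status = "COMMITTED"
	Aborted   Status = "ABORTED"
)

// TxnRecord lists the writes that must all be prepared for a STAGED
// transaction to count as committed.
type TxnRecord struct {
	mu       sync.Mutex
	status   Status
	InFlight []string
}

func (r *TxnRecord) get() Status {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.status
}

func (r *TxnRecord) set(s Status) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.status = s
}

// parallelCommit writes the STAGED record in the same round as every prepare.
// Once all prepares succeed the client can be acknowledged; flipping the
// record to COMMITTED happens afterwards, off the critical path.
func parallelCommit(keys []string, prepare func(key string) bool) (bool, *TxnRecord, <-chan struct{}) {
	rec := &TxnRecord{InFlight: keys}
	ok := make([]bool, len(keys))
	var wg sync.WaitGroup

	wg.Add(1)
	go func() {
		defer wg.Done()
		rec.set(Staged)
	}()
	for i, k := range keys {
		wg.Add(1)
		go func(i int, k string) {
			defer wg.Done()
			ok[i] = prepare(k)
		}(i, k)
	}
	wg.Wait() // a single round: record write and prepares overlap

	done := make(chan struct{})
	for _, prepared := range ok {
		if !prepared {
			rec.set(Aborted)
			close(done)
			return false, rec, done
		}
	}
	go func() {
		rec.set(Committed) // asynchronous: the client was already acknowledged
		close(done)
	}()
	return true, rec, done
}

// resolve is what another transaction might do on finding a STAGED record:
// if every listed write is prepared, the transaction is implicitly committed.
// Simplification: a real system must also stop a missing write from landing
// later before it may declare the transaction aborted.
func resolve(rec *TxnRecord, isPrepared func(key string) bool) Status {
	if s := rec.get(); s != Staged {
		return s
	}
	for _, k := range rec.InFlight {
		if !isPrepared(k) {
			return Aborted
		}
	}
	return Committed
}

func main() {
	acked, rec, done := parallelCommit([]string{"k1", "k2", "k3"}, func(string) bool { return true })
	fmt.Println("acked:", acked)
	<-done
	fmt.Println("final status:", rec.get())

	staged := &TxnRecord{status: Staged, InFlight: []string{"k1", "k2"}}
	fmt.Println(resolve(staged, func(k string) bool { return k == "k1" }))
}
```

The client learns the outcome after a single round in which the record write and the prepares overlap; the later COMMITTED flip only tidies up the record for subsequent readers.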
These successive optimizations demonstrate that transaction performance can be continuously refined in a distributed database system.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.
