Inside YugabyteDB: Architecture, Tablet Storage, and Distributed Transactions
This article explains YugabyteDB's two‑layer logical architecture, its tablet‑based distributed storage built on Raft groups, the RocksDB‑backed local DocDB, and how it implements distributed transactions using Hybrid Logical Clocks, two‑phase commit, and MVCC, while comparing it with TiDB, CockroachDB and other rivals.
YugabyteDB is a globally distributed, ACID‑compliant database inspired by Google Spanner, similar to TiDB and CockroachDB, offering both SQL (PostgreSQL‑compatible) and CQL (Cassandra‑compatible) APIs.
System Architecture
Logically, YugabyteDB has a two‑layer architecture: a query layer and a storage layer, both running inside the TServer process. The query layer handles SQL and CQL requests, while the storage layer manages tablets stored in Raft groups across three nodes for high availability. The Master service stores metadata such as tablet locations and schema information, also using Raft for fault tolerance.
Tablet‑Based Distributed Storage
Data is split into tablets, the smallest unit of distribution. Each tablet belongs to a Raft group with multiple replicas; the group leader handles writes while followers provide redundancy. Tablets can be split, merged, and moved across nodes to achieve near‑unlimited scale‑out. YugabyteDB supports hash, range, or hybrid hash‑range partitioning, allowing up to 64K tablets.
RocksDB‑Based Local Storage (DocDB)
Each TServer hosts a local DocDB that uses RocksDB to store key‑value pairs representing relational tuples or document data. Keys contain a 16‑bit hash for partitioning, primary‑key columns, column IDs, and a hybrid timestamp for MVCC. Values store the column data.
Distributed Transactions: 2PC & MVCC
Timestamp
YugabyteDB uses Hybrid Logical Clocks (HLC), which combine a physical UNIX timestamp with a logical Lamport counter. This provides total ordering of events while keeping timestamps monotonic within each millisecond.
Transaction Commit
Transactions follow a two‑phase commit (2PC) protocol similar to CockroachDB. During commit, provisional records are written to DocDB, including primary provisional records (acting as locks), transaction metadata (tablet ID of the transaction state), and a reverse index for recovery.
The transaction state tablet records the status as Pending, Committed, or Aborted; moving to Committed finalizes the transaction and guarantees atomicity.
Competitor Comparison
The following table (from YugabyteDB documentation) compares YugabyteDB with other databases such as TiDB, CockroachDB, and Spanner, highlighting differences in architecture, partitioning, and transaction models.
References
YugabyteDB Official Documentation
YugabyteDB GitHub Repository
Living Without Atomic Clocks – Cockroach Labs
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
