Databases 14 min read

Inside YugabyteDB: Architecture, Tablet Storage, and Distributed Transactions

This article explains YugabyteDB's two‑layer logical architecture, its tablet‑based distributed storage built on Raft groups, the RocksDB‑backed local DocDB, and how it implements distributed transactions using Hybrid Logical Clocks, two‑phase commit, and MVCC, while comparing it with TiDB, CockroachDB and other rivals.

21CTO
21CTO
21CTO
Inside YugabyteDB: Architecture, Tablet Storage, and Distributed Transactions

YugabyteDB is a globally distributed, ACID‑compliant database inspired by Google Spanner, similar to TiDB and CockroachDB, offering both SQL (PostgreSQL‑compatible) and CQL (Cassandra‑compatible) APIs.

System Architecture

Logically, YugabyteDB has a two‑layer architecture: a query layer and a storage layer, both running inside the TServer process. The query layer handles SQL and CQL requests, while the storage layer manages tablets stored in Raft groups across three nodes for high availability. The Master service stores metadata such as tablet locations and schema information, also using Raft for fault tolerance.

Tablet‑Based Distributed Storage

Data is split into tablets, the smallest unit of distribution. Each tablet belongs to a Raft group with multiple replicas; the group leader handles writes while followers provide redundancy. Tablets can be split, merged, and moved across nodes to achieve near‑unlimited scale‑out. YugabyteDB supports hash, range, or hybrid hash‑range partitioning, allowing up to 64K tablets.

RocksDB‑Based Local Storage (DocDB)

Each TServer hosts a local DocDB that uses RocksDB to store key‑value pairs representing relational tuples or document data. Keys contain a 16‑bit hash for partitioning, primary‑key columns, column IDs, and a hybrid timestamp for MVCC. Values store the column data.

Distributed Transactions: 2PC & MVCC

Timestamp

YugabyteDB uses Hybrid Logical Clocks (HLC), which combine a physical UNIX timestamp with a logical Lamport counter. This provides total ordering of events while keeping timestamps monotonic within each millisecond.

Transaction Commit

Transactions follow a two‑phase commit (2PC) protocol similar to CockroachDB. During commit, provisional records are written to DocDB, including primary provisional records (acting as locks), transaction metadata (tablet ID of the transaction state), and a reverse index for recovery.

The transaction state tablet records the status as Pending, Committed, or Aborted; moving to Committed finalizes the transaction and guarantees atomicity.

Competitor Comparison

The following table (from YugabyteDB documentation) compares YugabyteDB with other databases such as TiDB, CockroachDB, and Spanner, highlighting differences in architecture, partitioning, and transaction models.

References

YugabyteDB Official Documentation

YugabyteDB GitHub Repository

Living Without Atomic Clocks – Cockroach Labs

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

distributed database2PCRocksDBMVCCYugabyteDBTablet Storage
21CTO
Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.