Inside YugabyteDB: Architecture, Tablet Storage, and Distributed Transactions Explained

This article provides a comprehensive technical overview of YugabyteDB, covering its two‑layer logical architecture, tablet‑based distributed storage with Raft groups, RocksDB‑backed local storage design, hybrid hash‑range partitioning, and the MVCC‑based two‑phase‑commit transaction model using Hybrid Logical Clocks.

ITPUB

System Architecture

YugabyteDB follows a logical two‑layer design consisting of a query layer and a storage layer, both running inside the TServer process. The query layer exposes two APIs: YSQL (a PostgreSQL‑compatible SQL dialect) and YCQL (Cassandra‑compatible). The storage layer, DocDB, is where the core functionality resides: sharding, replication, and transactions.

Tablet‑Based Distributed Storage

Data is split into tablets, the smallest unit of distribution, similar to regions in HBase or splits in Spanner. Each tablet forms its own Raft group, with its replicas placed on different nodes (three in the setup described here) to ensure high availability. The Master manages metadata such as tablet locations and schema information, and is itself replicated with Raft for fault tolerance.

YugabyteDB supports flexible partitioning schemes: hash‑only, range‑only, or a combination of hash‑then‑range, a design influenced by Cassandra. Hash partitioning maps keys to a 2‑byte hash space (0x0000 to 0xFFFF), which is then divided into contiguous ranges, one per tablet, so a table can have at most 64K tablets. Hash partitioning avoids write hotspots for append‑heavy workloads, but it can degrade performance for small range scans (e.g., pk BETWEEN 1 AND 10), because adjacent keys hash to unrelated positions and the scan may have to touch many tablets.
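
The mapping from key to tablet can be sketched roughly as follows. This is a minimal illustration, not YugabyteDB's code: the hash function (a truncated CRC32) and the helper names `hash16`, `tablet_ranges`, and `tablet_for` are assumptions made up for this example.

```python
import bisect
import zlib

HASH_SPACE = 0x10000  # 16-bit hash space: 0x0000..0xFFFF


def hash16(key: bytes) -> int:
    """Stand-in 16-bit hash (truncated CRC32); NOT YugabyteDB's real hash."""
    return zlib.crc32(key) & 0xFFFF


def tablet_ranges(num_tablets: int) -> list:
    """Split the hash space into contiguous [start, end) ranges, one per tablet."""
    if not 1 <= num_tablets <= HASH_SPACE:
        raise ValueError("num_tablets must be between 1 and 65536")
    bounds = [i * HASH_SPACE // num_tablets for i in range(num_tablets + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def tablet_for(key: bytes, ranges: list) -> int:
    """Return the index of the tablet whose hash range contains the key."""
    starts = [start for start, _ in ranges]
    # Index of the last range whose start is <= the key's hash.
    return bisect.bisect_right(starts, hash16(key)) - 1


if __name__ == "__main__":
    ranges = tablet_ranges(4)
    # Consecutive primary keys usually land on unrelated tablets,
    # which is why a small range scan may fan out to many of them.
    for pk in range(1, 11):
        print(pk, tablet_for(str(pk).encode(), ranges))
```

Because the hash space has only 65,536 values, `tablet_ranges` rejects anything larger, which mirrors the 64K tablet ceiling mentioned above.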

RocksDB‑Backed Local Storage (DocDB)

Each TServer hosts a local DocDB built on RocksDB. Tuples and documents are encoded as key‑value pairs. The key consists of a 16‑bit hash (for hash partitioning), primary‑key columns, a column ID (to represent individual columns), and a hybrid timestamp used for MVCC. The value stores the column's actual data.

Key components: 16‑bit hash, primary‑key data, column ID, hybrid timestamp

Value component: column value
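
To make the key layout concrete, here is a toy model of it. It is a sketch under stated assumptions: the real DocDB uses a binary encoding inside RocksDB, whereas this example uses Python tuples and a dict, and the names `DocKey` and `ToyDocStore` are hypothetical. The timestamp is stored negated so that newer versions of a column sort first, which is assumed here as the ordering that makes MVCC reads simple.

```python
from typing import Any, NamedTuple, Optional


class DocKey(NamedTuple):
    hash16: int         # 16-bit hash of the hash-partitioned columns
    primary_key: tuple  # primary-key column values
    column_id: int      # identifies which column this entry holds
    neg_hybrid_ts: int  # negated timestamp, so newer versions sort first


class ToyDocStore:
    """In-memory stand-in for an ordered key-value store such as RocksDB."""

    def __init__(self) -> None:
        self._entries = {}  # DocKey -> column value

    def put(self, hash16: int, primary_key: tuple, column_id: int,
            hybrid_ts: int, value: Any) -> None:
        """Write one version of one column; older versions are kept."""
        self._entries[DocKey(hash16, primary_key, column_id, -hybrid_ts)] = value

    def get(self, hash16: int, primary_key: tuple, column_id: int,
            read_ts: int) -> Optional[Any]:
        """Return the newest value written at or before read_ts, else None."""
        prefix = (hash16, primary_key, column_id)
        for key in sorted(self._entries):  # key order, newest version first
            if key[:3] == prefix and -key.neg_hybrid_ts <= read_ts:
                return self._entries[key]
        return None


if __name__ == "__main__":
    store = ToyDocStore()
    store.put(0x1A2B, (42,), 1, 100, "alice")
    store.put(0x1A2B, (42,), 1, 200, "alicia")
    print(store.get(0x1A2B, (42,), 1, 150))  # reads the version written at ts=100
```

Each write adds a new key rather than overwriting the old one, so a reader at an earlier timestamp still sees the earlier value.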

Distributed Transactions: 2PC & MVCC

Timestamp

YugabyteDB uses a Hybrid Logical Clock (HLC) for transaction timestamps, combining a physical component (UNIX time) with a logical Lamport counter. While the physical component has not advanced, the logical component increments on each event or RPC, so a node's timestamps never go backwards and the causal (partial) order of events is preserved.
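
The update rules of a hybrid logical clock can be sketched as below. This follows the generic HLC algorithm rather than YugabyteDB's implementation; the class name, the tuple representation `(physical, logical)`, and the millisecond default for the physical clock are all assumptions of this example.

```python
import time
from typing import Callable, Optional, Tuple

Timestamp = Tuple[int, int]  # (physical, logical); compares lexicographically


class HybridLogicalClock:
    def __init__(self, physical_clock: Optional[Callable[[], int]] = None) -> None:
        # Injectable wall clock; the millisecond unit is an assumption.
        self._physical_clock = physical_clock or (lambda: time.time_ns() // 1_000_000)
        self._physical = 0
        self._logical = 0

    def now(self) -> Timestamp:
        """Timestamp for a local event or an outgoing message."""
        wall = self._physical_clock()
        if wall > self._physical:
            self._physical, self._logical = wall, 0
        else:
            # Wall clock has not advanced (or went backwards): bump the counter.
            self._logical += 1
        return (self._physical, self._logical)

    def update(self, remote: Timestamp) -> Timestamp:
        """Merge a timestamp carried by an incoming message."""
        wall = self._physical_clock()
        r_phys, r_log = remote
        new_phys = max(wall, self._physical, r_phys)
        if new_phys == self._physical and new_phys == r_phys:
            self._logical = max(self._logical, r_log) + 1
        elif new_phys == self._physical:
            self._logical += 1
        elif new_phys == r_phys:
            self._logical = r_log + 1
        else:
            self._logical = 0  # the local wall clock is ahead of both
        self._physical = new_phys
        return (self._physical, self._logical)
```

A receiver that calls `update` with the sender's timestamp always ends up with a strictly larger timestamp, even when its own wall clock lags behind the sender's.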

HLC gives YugabyteDB cluster‑wide ordered timestamps in the spirit of Google’s TrueTime, but without specialized clock hardware or a dedicated time‑serving node. An alternative is a centralized Timestamp Oracle (TSO), as used by TiDB, which simplifies timestamp acquisition but routes every request through one service, making it a potential bottleneck and single point of failure.

Transaction Commit

Transactions are implemented with two‑phase commit (2PC) and MVCC. During commit, YugabyteDB writes provisional records to DocDB, categorized as:

Primary provisional records – uncommitted data with a transaction ID, acting as a lock.

Transaction metadata – stores the tablet ID where the transaction state resides.

Reverse index – maps a transaction ID to each of its primary provisional records, so that they can be located during recovery and cleanup.

The transaction state is kept in a separate tablet with three possible statuses: Pending, Committed, or Aborted. Transition to the Committed state marks the commit point, guaranteeing atomicity.
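
The role of the status record as the commit point can be illustrated with a small in-memory model. This is a sketch only: `ToyCluster`, its dictionaries, and its method names are hypothetical, everything lives in one process, and the transaction metadata record is omitted for brevity.

```python
import uuid
from enum import Enum


class TxnStatus(Enum):
    PENDING = "pending"
    COMMITTED = "committed"
    ABORTED = "aborted"


class ToyCluster:
    def __init__(self) -> None:
        self.status_tablet = {}  # txn_id -> TxnStatus (the separate status tablet)
        self.provisional = {}    # key -> (txn_id, value); doubles as a lock
        self.reverse_index = {}  # txn_id -> keys of its provisional records
        self.committed = {}      # key -> value (regular, visible records)

    def begin(self) -> str:
        txn_id = uuid.uuid4().hex
        self.status_tablet[txn_id] = TxnStatus.PENDING
        self.reverse_index[txn_id] = []
        return txn_id

    def write(self, txn_id: str, key: str, value: object) -> None:
        holder = self.provisional.get(key)
        if holder is not None and holder[0] != txn_id:
            raise RuntimeError(f"conflict on {key!r}: held by another transaction")
        self.provisional[key] = (txn_id, value)
        if key not in self.reverse_index[txn_id]:
            self.reverse_index[txn_id].append(key)

    def commit(self, txn_id: str) -> None:
        self._finish(txn_id, TxnStatus.COMMITTED)

    def abort(self, txn_id: str) -> None:
        self._finish(txn_id, TxnStatus.ABORTED)

    def _finish(self, txn_id: str, status: TxnStatus) -> None:
        if self.status_tablet.get(txn_id) is not TxnStatus.PENDING:
            raise RuntimeError("transaction is not pending")
        # Commit point: this single status change decides the outcome
        # for every key the transaction wrote.
        self.status_tablet[txn_id] = status
        # Cleanup: use the reverse index to find the provisional records.
        for key in self.reverse_index.pop(txn_id):
            _, value = self.provisional.pop(key)
            if status is TxnStatus.COMMITTED:
                self.committed[key] = value

    def read(self, key: str) -> object:
        """Readers in this toy see only committed records."""
        return self.committed.get(key)
```

Because the outcome is decided by one status update, a multi-key transaction becomes visible either entirely or not at all; applying the provisional records afterwards is only cleanup.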

YugabyteDB also supports Snapshot Isolation and, as of version 2.0 GA, Serializable isolation, though details on write‑skew prevention are not yet documented.

Competitive Comparison

A comparison table (sourced from Yugabyte’s documentation) positions YugabyteDB alongside TiDB, CockroachDB, and other distributed databases, highlighting similarities in architecture, global distribution, and ACID transaction support.

Written by

ITPUB
