
NewSQL Explained: Storage Engines, Sharding, and Distributed Transactions

This article examines the core technologies behind NewSQL databases for OLTP workloads. It contrasts them with NoSQL and traditional relational systems by looking at storage engines (B+Tree and LSM‑Tree), sharding strategies, replication models, and distributed transaction mechanisms such as two‑phase commit.

dbaplus Community

NewSQL & NoSQL Overview

NewSQL targets OLTP workloads that demand high concurrency and low latency. It offers the features of a traditional RDBMS (Oracle, DB2), such as SQL and ACID transactions, while scaling horizontally on commodity x86 servers. Prominent NewSQL systems include Google Spanner/F1, Alibaba OceanBase, CockroachDB, and TiDB.

Storage Engine: B+Tree

B+Tree is the classic index structure of relational databases. Its leaf nodes are kept in key order, which makes range scans efficient. The drawback is random writes: inserting into a full page forces a page split, which causes write amplification and leaves pages partly empty (storage fragmentation). Real‑world databases (e.g., MySQL) expose a fill factor setting to balance page density against the cost of later splits.

[Figure: B+Tree example]
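The trade-off behind a fill factor can be illustrated with a toy model. The following is a minimal sketch, assuming leaf pages are plain sorted Python lists with a fixed capacity; `PAGE_CAPACITY`, `bulk_load`, `insert`, and `range_scan` are hypothetical names for illustration and do not reflect how MySQL or any real engine implements its pages.

```python
import bisect

PAGE_CAPACITY = 4  # hypothetical: maximum number of keys per leaf page


def bulk_load(sorted_keys, fill_factor):
    """Pack sorted keys into leaf pages, filling each to fill_factor of capacity."""
    per_page = max(1, int(PAGE_CAPACITY * fill_factor))
    return [sorted_keys[i:i + per_page] for i in range(0, len(sorted_keys), per_page)]


def insert(pages, key):
    """Insert key into the matching leaf; return True if the page had to split."""
    # Pick the last page whose first key is <= key (leaves stay in key order).
    firsts = [page[0] for page in pages]
    idx = max(0, bisect.bisect_right(firsts, key) - 1)
    page = pages[idx]
    bisect.insort(page, key)
    if len(page) <= PAGE_CAPACITY:
        return False
    # Overflow: split into two pages, so one logical insert costs two page writes.
    mid = len(page) // 2
    pages[idx:idx + 1] = [page[:mid], page[mid:]]
    return True


def range_scan(pages, lo, hi):
    """Walk the ordered leaves left to right.

    A real tree would first descend to the leaf containing lo; this toy
    version simply filters every page.
    """
    return [k for page in pages for k in page if lo <= k <= hi]


keys = list(range(0, 80, 10))            # 0, 10, ..., 70
dense = bulk_load(keys, fill_factor=1.0)  # pages are completely full
sparse = bulk_load(keys, fill_factor=0.5) # pages keep free slots
dense_split = insert(dense, 15)           # full page overflows and splits
sparse_split = insert(sparse, 15)         # free slot absorbs the insert
```

In this model the densely packed tree splits on the first random insert, while the half-full tree absorbs it at the price of using twice as many pages up front.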

Storage Engine: LSM‑Tree

Log‑Structured Merge Tree (LSM‑Tree) converts random writes into sequential writes by buffering updates in memory (Memtable) and periodically flushing them to disk as sorted, immutable SSTables. The process has three steps: writes accumulate in the Memtable, a Minor Compaction flushes a full Memtable to a new SSTable, and a Major Compaction merges SSTables. LSM‑Tree avoids the in‑place random writes of a B+Tree but introduces its own challenges: heavy I/O during Major Compaction, read amplification because a lookup may have to check several overlapping SSTables, and space amplification because obsolete versions linger until they are merged away.

[Figure: LSM‑Tree architecture]
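The write path, flush, and merge described above can be sketched in a few lines. This is a toy illustration under stated assumptions: everything lives in memory, SSTables are sorted lists of `(key, value)` pairs, and `TinyLSM` with its `memtable_limit` parameter is a hypothetical class, not the design of any particular engine.

```python
class TinyLSM:
    """Toy LSM-Tree: a dict as Memtable plus a list of sorted 'SSTables'."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []  # newest first; each is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Writes only touch memory; no in-place update of on-disk data.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Minor Compaction: write the Memtable out as one sorted run."""
        if self.memtable:
            self.sstables.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Read amplification: the key may be in any table, so check newest first.
        for table in self.sstables:
            for k, v in table:
                if k == key:
                    return v
        return None

    def major_compact(self):
        """Major Compaction: merge all runs, keeping only the newest value per key."""
        merged = {}
        for table in reversed(self.sstables):  # oldest first, newer values overwrite
            merged.update(table)
        self.sstables = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # Memtable is full: flushed as the first SSTable
db.put("a", 3)
db.put("c", 4)   # flushed as a second SSTable that overlaps the first on "a"
entries_before = sum(len(t) for t in db.sstables)
db.major_compact()
entries_after = sum(len(t) for t in db.sstables)
```

Before the merge, the stale value of `"a"` still occupies space in the older table (space amplification), and a lookup for `"b"` has to pass through the newer table first (read amplification). After `major_compact()` a single run remains.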

Sharding Strategies

Sharding (horizontal partitioning) distributes data across multiple nodes to achieve scalability. Two primary approaches are Range sharding, which preserves data locality for range queries, and Hash sharding, which balances load evenly. Systems may combine both; for example, HBase uses Range sharding but recommends encoding RowKeys to avoid hotspotting.
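The difference between the two routing schemes can be sketched as follows. The split points in `RANGE_SPLITS`, the shard count, and the `salted_key` format are assumptions chosen for illustration; they are not taken from HBase or any other system.

```python
import bisect
import hashlib

RANGE_SPLITS = ["g", "n", "t"]  # hypothetical split points, giving 4 shards


def range_shard(key):
    """Range sharding: the shard is found by comparing against ordered split points."""
    return bisect.bisect_right(RANGE_SPLITS, key)


def hash_shard(key, num_shards=4):
    """Hash sharding: the shard is derived from a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def salted_key(key, buckets=4):
    """One possible key encoding: prefix a hash bucket so that sequential
    keys are spread over several ranges instead of hitting one hot shard."""
    return f"{hash_shard(key, buckets)}|{key}"


# Neighbouring keys land on the same range shard (good for scans,
# but a hotspot if all new writes use increasing keys).
same_range = range_shard("user100") == range_shard("user101")
```

With range sharding, `"user100"` and `"user101"` always share a shard, so a scan over them touches one node. With hash sharding their placement is unrelated, which evens out load but turns a range query into a scatter-gather over all shards.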

Static sharding fixes the number of shards at design time, making later adjustments costly, while dynamic sharding adapts shard counts based on data distribution, offering greater flexibility at the expense of more complex metadata management.
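Dynamic sharding can be pictured as a range map that splits a shard once it grows past a threshold. The sketch below is a simplified model under assumed names (`RangeShardMap`, `max_keys`); real systems split on size or load, move data between nodes, and must keep this metadata consistent across the cluster, which is where the extra complexity comes from.

```python
import bisect


class RangeShardMap:
    """Toy dynamic range sharding: a shard splits when it exceeds max_keys."""

    def __init__(self, max_keys=4):
        # bounds[i] is the lower bound of shard i; shard i covers
        # [bounds[i], bounds[i + 1]). The empty string sorts before any key.
        self.bounds = [""]
        self.shards = [[]]
        self.max_keys = max_keys

    def locate(self, key):
        """Metadata lookup: which shard owns this key?"""
        return bisect.bisect_right(self.bounds, key) - 1

    def insert(self, key):
        i = self.locate(key)
        bisect.insort(self.shards[i], key)
        if len(self.shards[i]) > self.max_keys:
            self._split(i)

    def _split(self, i):
        keys = self.shards[i]
        mid = len(keys) // 2
        # The upper half becomes a new shard; its first key is the new boundary.
        self.shards[i:i + 1] = [keys[:mid], keys[mid:]]
        self.bounds.insert(i + 1, keys[mid])


shard_map = RangeShardMap(max_keys=4)
for k in ["a", "b", "c", "d", "e"]:
    shard_map.insert(k)  # the fifth insert triggers a split
```

A static scheme would have fixed `bounds` at design time; here the boundary `"c"` appears only after the data shows it is needed.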

Replication Models

Replication ensures durability and availability. Consistency models differ:

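Strong sync: all replicas must acknowledge a write before it succeeds. Latency is high and availability is low, because a single slow or failed replica blocks writes.

Semi‑sync: an acknowledgment from any one replica suffices. Availability is better, but the system may fall back to async replication when no replica responds in time.

Paxos/Raft: a majority quorum decides the commit, so the group tolerates the failure of a minority of nodes.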
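The three acknowledgment rules can be compared side by side. This is a minimal sketch that only models the commit decision; the function name `write_committed` and the convention that the leader counts toward the quorum but not toward `follower_acks` are assumptions made for this example.

```python
def write_committed(follower_acks, num_followers, mode):
    """Decide whether a write counts as committed.

    follower_acks: followers that acknowledged the write (leader excluded).
    num_followers: total followers in the replication group.
    """
    if mode == "strong":
        # Every follower must have the write.
        return follower_acks == num_followers
    if mode == "semi":
        # Any single follower is enough.
        return follower_acks >= 1
    if mode == "quorum":
        # Majority of the whole group; the leader counts as one vote.
        group_size = num_followers + 1
        return (follower_acks + 1) * 2 > group_size
    raise ValueError(f"unknown mode: {mode}")


# A 3-node group (1 leader + 2 followers) with one follower down:
one_down = {m: write_committed(1, 2, m) for m in ("strong", "semi", "quorum")}
```

With one of two followers unreachable, strong sync refuses the write while semi-sync and the majority quorum both accept it. With both followers unreachable, none of the three modes commits in this model.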

Reliability guarantees data persistence, while availability concerns continuous service. Examples include HBase’s RegionServer failover and CockroachDB/TiDB’s Raft‑based KV stores that decouple storage from compute to maintain availability.

Distributed Transaction Management

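NewSQL re‑introduces distributed transactions, typically built on two‑phase commit (2PC). In the first phase (prepare), the coordinator asks every participant to vote; in the second phase (commit), it tells them to commit if all voted yes, or to abort otherwise. While conceptually simple, 2PC is blocking: participants that have prepared must wait for the outcome, so the coordinator is a single point of failure that can stall them if it crashes between the two phases.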
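The two phases can be sketched with in-process objects standing in for nodes. This is an illustrative model only: `Participant`, `can_commit`, and `two_phase_commit` are hypothetical names, and the sketch omits the logging, timeouts, and recovery that a real coordinator needs.

```python
class Participant:
    """A resource manager that votes in phase 1 and obeys the decision in phase 2."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Vote yes only if the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2 (commit): commit only on a unanimous yes, otherwise abort everyone.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


ok_nodes = [Participant("a"), Participant("b")]
ok_result = two_phase_commit(ok_nodes)

bad_nodes = [Participant("a"), Participant("b", can_commit=False)]
bad_result = two_phase_commit(bad_nodes)
```

The blocking problem is visible in the model: if the coordinator stopped right after phase 1, every participant that voted yes would sit in the `"prepared"` state with no way to decide on its own.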

Google Spanner extends 2PC with dynamic tablet directories to reduce cross‑node transaction spans. Percolator, another Google project, modifies 2PC by leveraging MVCC and lock separation to avoid read blocking.

Transaction models can be lock‑based or lock‑free, optimistic or pessimistic, each with trade‑offs in latency and contention handling.
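The optimistic and pessimistic styles can be contrasted on a single record. The sketch below is an assumption-laden toy: `VersionedCell` and its methods are hypothetical, a thread lock stands in for a row lock, and a version counter stands in for commit-time validation.

```python
import threading


class VersionedCell:
    """One record supporting both pessimistic and optimistic updates."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self.value, self.version

    def pessimistic_update(self, fn):
        """Pessimistic: hold the lock for the whole read-modify-write."""
        with self._lock:
            self.value = fn(self.value)
            self.version += 1

    def optimistic_write(self, new_value, read_version):
        """Optimistic: work without a lock, then validate at commit time.

        Returns False if someone else committed since read_version,
        in which case the caller has to re-read and retry.
        """
        with self._lock:  # short critical section for validate-and-write only
            if self.version != read_version:
                return False
            self.value = new_value
            self.version += 1
            return True


cell = VersionedCell(10)
value, version = cell.read()              # transaction T1 reads
cell.pessimistic_update(lambda v: v + 1)  # T2 commits in between
first_try = cell.optimistic_write(value + 5, version)  # T1 fails validation
value, version = cell.read()              # T1 retries from fresh data
second_try = cell.optimistic_write(value + 5, version)
```

The optimistic path costs nothing when conflicts are rare but wastes work on retries under contention; the pessimistic path never retries but makes every other writer wait.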

Conclusion

The article surveys NewSQL technologies layer by layer (storage engines, sharding, replication, and transaction management), highlighting their advantages, limitations, and practical implementations in systems such as Spanner, OceanBase, CockroachDB, and TiDB.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
