
Why Modern Databases Prefer LSM Trees Over B‑Trees: Hardware, Workloads, and More

Many modern databases have moved from B‑tree based storage to LSM‑tree engines because of SSD hardware characteristics, write‑heavy workloads, concurrency advantages, and simpler implementation. This article also covers Paxos/Raft consensus, common database jargon, and related performance optimizations.

G7 EasyFlow Tech Circle

Many modern databases use storage engines based on LSM trees or closely related append‑only designs: RocksDB (an embeddable engine that several of the others build on), CockroachDB, TiDB, Snowflake, Doris, OceanBase, and InfluxDB, among others. Older systems like MySQL, PostgreSQL, and Oracle rely on B‑tree variants. This shift is driven by two major factors: hardware evolution and changing usage scenarios.

Hardware Evolution

SSDs have become ubiquitous, offering dramatically higher performance than mechanical disks. However, flash cannot be updated in place: data is written at page granularity (typically 4 KB), but a page can only be rewritten after its whole block (for example 128 pages, i.e. 512 KB) has been erased. In the worst case, modifying a single byte means reading, erasing, and rewriting an entire 512 KB block, which causes write amplification far beyond that of HDDs and wears out the limited number of erase cycles each block can endure.
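The worst case described above can be put into numbers. The following is a small arithmetic sketch, assuming the page and block sizes quoted in the text; the function name `write_amplification` is made up for illustration, and real SSD controllers remap pages to soften this worst case.

```python
PAGE_SIZE = 4 * 1024        # bytes per flash page (figure quoted in the text)
PAGES_PER_BLOCK = 128       # pages per erase block (figure quoted in the text)
BLOCK_SIZE = PAGE_SIZE * PAGES_PER_BLOCK  # 512 KB


def write_amplification(logical_bytes: int, physical_bytes: int) -> float:
    """Bytes physically written to flash divided by bytes the application changed."""
    return physical_bytes / logical_bytes


# Worst case: a 1-byte in-place change forces a whole block to be rewritten.
worst_case = write_amplification(logical_bytes=1, physical_bytes=BLOCK_SIZE)

# Appending a full block of new data writes exactly what was asked for.
sequential = write_amplification(logical_bytes=BLOCK_SIZE, physical_bytes=BLOCK_SIZE)

print(f"block size: {BLOCK_SIZE} bytes")
print(f"worst-case amplification: {worst_case:.0f}x, sequential append: {sequential:.0f}x")
```

The gap between the two ratios is the reason large sequential writes suit flash better than small in‑place updates.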

The append‑only nature of LSM trees avoids in‑place modifications, aligning well with SSD write characteristics. Additionally, modern CPUs no longer get faster mainly through higher clock frequencies; they gain cores instead, which makes data structures that allow concurrent access with little or no locking more valuable. LSM trees write immutable (read‑only) file segments, so many readers can scan them concurrently without the fine‑grained locking that B‑trees need when pages are updated in place.
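To make the append‑only idea concrete, here is a minimal in‑memory sketch of an LSM‑style store. It is an illustration under simplifying assumptions, not how RocksDB or any specific engine is implemented: the class name `TinyLSM`, the `memtable_limit` parameter, and the `TOMBSTONE` marker are hypothetical, segments are kept in memory rather than on disk, and there is no write‑ahead log.

```python
from bisect import bisect_left

TOMBSTONE = object()  # hypothetical marker recording that a key was deleted


class TinyLSM:
    """Toy LSM store: a mutable memtable plus immutable sorted segments."""

    def __init__(self, memtable_limit: int = 4):
        self.memtable = {}    # recent writes, mutable
        self.segments = []    # immutable sorted tuples of (key, value), oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        # Deletes are writes too: a tombstone shadows older values.
        self.put(key, TOMBSTONE)

    def flush(self):
        """Freeze the memtable into a new sorted segment; old segments are never modified."""
        if self.memtable:
            self.segments.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is TOMBSTONE else value
        # Search segments from newest to oldest; the first hit wins.
        for segment in reversed(self.segments):
            i = bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                value = segment[i][1]
                return None if value is TOMBSTONE else value
        return None


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)      # memtable is full, so it is flushed to a segment
db.put("a", 3)      # newer value lives in the memtable and shadows the old one
print(db.get("a"), db.get("b"), len(db.segments))
```

Because a flushed segment is a tuple that is never mutated, readers need no lock on it; only the small memtable would need synchronization in a concurrent version.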

Changing Workloads

The rise of big‑data workloads, driven by Google’s seminal papers on GFS (the design that HDFS later reimplemented as open source), MapReduce, and Bigtable, has led to scenarios where write volume far exceeds reads. LSM trees’ append‑only design provides extremely high write throughput, and with appropriate compaction strategies, which merge segments so that a read has fewer files to check, read performance remains competitive.
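Compaction is what keeps reads competitive: it merges several sorted segments into one, so a lookup has fewer places to check. The sketch below is a simplified full compaction under stated assumptions: the names `compact` and `DELETED` are hypothetical, segments are plain lists of `(key, value)` pairs ordered oldest first, and a dictionary stands in for the streaming k‑way merge a real engine would use.

```python
DELETED = object()  # hypothetical tombstone marker for deleted keys


def compact(segments):
    """Merge sorted segments (oldest first) into a single sorted segment.

    For duplicate keys the value from the newest segment wins. Tombstones are
    dropped, which is only safe because every segment takes part in the merge.
    """
    merged = {}
    for segment in segments:           # oldest to newest
        for key, value in segment:
            merged[key] = value        # newer entries overwrite older ones
    return [(k, v) for k, v in sorted(merged.items()) if v is not DELETED]


older = [("a", 1), ("b", 2), ("c", 3)]
newer = [("b", 20), ("c", DELETED), ("d", 4)]
print(compact([older, newer]))
```

After the merge, the obsolete value of `b` and the deleted key `c` no longer occupy space, and a read touches one segment instead of two.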

Implementation complexity also favors LSM trees: a basic, functional LSM engine can be written relatively easily, whereas a correct B‑tree implementation is considerably more challenging. Open‑source projects like LevelDB and RocksDB (originating from Google and Facebook, respectively) have accelerated adoption.

Paxos and Raft in Modern Databases

Paxos and Raft are consensus algorithms used primarily to replicate a log across replicas, ensuring that all nodes apply the same sequence of operations. Unlike asynchronous replication through a message queue, a write is acknowledged only after a quorum of replicas (a majority, e.g., 2 of 3) has persisted it, so committed data stays consistent even when a minority of replicas fail. The quorum size is where latency is traded against fault tolerance.
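The quorum rule is simple arithmetic. As a small sketch, assuming plain majority quorums (the helper names below are made up for illustration and do not come from any Paxos or Raft library):

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can fail while a majority is still reachable."""
    return replicas - majority(replicas)


def is_committed(acks: int, replicas: int) -> bool:
    """A log entry counts as committed once a majority has acknowledged it."""
    return acks >= majority(replicas)


for n in (3, 4, 5):
    print(f"{n} replicas: quorum {majority(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Note that going from 3 to 4 replicas raises the quorum without tolerating any additional failure, which is why odd replica counts are the usual choice.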

In practice, TiDB (through its TiKV storage layer) uses Raft and OceanBase uses Paxos to achieve high availability and strong consistency across distributed nodes.

Common Database Jargon

Predicate push‑down and projection push‑down refer to moving filter (WHERE) and column‑selection (SELECT) operations from the compute layer down to the storage layer, reducing data movement in distributed architectures.
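The effect of push‑down is easiest to see by comparing what crosses the boundary between storage and compute. The sketch below is a toy model, not the API of any real database: `storage_scan`, the `ROWS` table, and its column names are all hypothetical.

```python
ROWS = [
    {"id": 1, "speed": 62, "note": "x" * 100},
    {"id": 2, "speed": 95, "note": "y" * 100},
    {"id": 3, "speed": 40, "note": "z" * 100},
    {"id": 4, "speed": 88, "note": "w" * 100},
]


def storage_scan(rows, predicate=None, columns=None):
    """Hypothetical storage-layer scan that can filter rows and trim columns itself."""
    for row in rows:
        if predicate is not None and not predicate(row):
            continue                   # predicate push-down: row never leaves storage
        if columns is not None:
            yield {c: row[c] for c in columns}   # projection push-down
        else:
            yield dict(row)


# Without push-down: every row and every column is shipped, then filtered in compute.
shipped_all = list(storage_scan(ROWS))
result_without = [{"id": r["id"]} for r in shipped_all if r["speed"] > 80]

# With push-down: SELECT id ... WHERE speed > 80 is evaluated inside the scan.
shipped_pushed = list(storage_scan(ROWS, predicate=lambda r: r["speed"] > 80, columns=["id"]))

print(len(shipped_all), "rows shipped without push-down;", len(shipped_pushed), "with push-down")
```

Both plans return the same answer; the difference is that the pushed‑down scan ships two narrow rows instead of four wide ones.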

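A vector engine (vectorized execution engine) processes data in batches of column values rather than row by row, and leverages CPU SIMD instructions to apply one operation to several values in a single instruction, which speeds up columnar computations dramatically.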

Bypass (more fully, kernel bypass) describes accessing hardware directly instead of going through the operating system kernel, e.g., using user‑space RDMA or direct disk I/O to avoid kernel buffering and scheduling overhead.

In summary, modern hardware characteristics and write‑heavy workloads have driven the adoption of LSM trees, which naturally complement SSD behavior and provide superior concurrency. Consensus protocols like Paxos and Raft ensure consistent log replication across replicas, while emerging terminology such as predicate push‑down, vector engines, and bypass reflect ongoing optimizations in database design.
