Databases 7 min read

Understanding B+ Trees and Log‑Structured Merge (LSM) Trees and Their Use in HBase

This article explains the fundamentals of B+ trees, introduces log‑structured merge (LSM) trees as a modern alternative for write‑intensive workloads, and demonstrates how HBase leverages LSM trees—including MemStore, HFile, compaction, and Bloom filters—to achieve efficient storage and retrieval in NoSQL environments.

Big Data Technology & Architecture
Big Data Technology & Architecture
Big Data Technology & Architecture
Understanding B+ Trees and Log‑Structured Merge (LSM) Trees and Their Use in HBase

In relational databases such as MySQL, SQL Server, and Oracle, data is indexed using B‑trees and B+‑trees, while many NoSQL systems like HBase, Cassandra, LevelDB, and RocksDB employ log‑structured merge (LSM) trees.

Review of B+ Trees – B+ trees reduce seek time on magnetic HDDs by providing a shallow, multi‑way balanced search structure where only leaf nodes store actual data, forming an ordered linked list that speeds up range queries. Their advantages include low height, high storage density, and efficient sequential reads, while drawbacks involve random writes and leaf‑node fragmentation after many splits.

Introducing LSM Trees – An LSM tree consists of multiple component trees (C0 in memory, C1, C2 … on disk). Writes are first buffered in the in‑memory C0 tree and later flushed to higher‑level trees in sorted order, turning random writes into sequential writes and greatly improving write throughput, at the cost of slightly higher read latency.

The design also incorporates write‑ahead logging (WAL) to protect against power loss, and compaction processes (minor and major) merge lower‑level components into larger ones, similar to HBase’s file compaction.

LSM Trees in HBase – HBase’s MemStore acts as the C0 layer, implemented with a skip‑list rather than a tree. Flushing MemStore creates HFiles, which are essentially three‑level B+ trees (root, intermediate, leaf) storing KeyValue pairs. Excessive HFiles degrade performance, so HBase performs minor and major compactions to consolidate them.

To improve random‑read performance under the LSM model, HBase adds Bloom filters (Bloom index blocks) to HFiles, allowing fast existence checks with minimal space overhead.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

LSM‑TreeHBaseNoSQLB+Treedatabase indexing
Big Data Technology & Architecture
Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.