Understanding B+ Trees and Log-Structured Merge (LSM) Trees and Their Use in HBase
This article reviews B+ trees, introduces log‑structured merge (LSM) trees, compares their strengths and weaknesses, and explains how HBase leverages LSM trees, HFiles, compaction, and Bloom filters to achieve high‑performance storage for write‑intensive workloads.
In relational databases such as MySQL, SQL Server, and Oracle, data is indexed using B‑tree or B+‑tree structures, while many NoSQL systems like HBase, Cassandra, LevelDB, and RocksDB employ Log‑Structured Merge (LSM) trees; the article first revisits B+ trees and then explains LSM trees and their role in HBase.
Review of B+ trees : B+ trees reduce seek time on HDDs by using a multi‑way balanced search tree; a height‑3 4‑way B+ tree example is shown below.
Advantages of B+ trees include shallow height (usually ≤4 levels) leading to few random seeks, high storage density with all data in leaf nodes, and leaf nodes forming an ordered linked list that enables efficient range queries.
Disadvantages appear when writes are random or when leaf nodes split, causing data blocks to become fragmented and turning range queries into random reads; illustrative images are provided.
Introducing LSM trees : Proposed by Patrick O'Neil, an LSM tree consists of multiple component trees (C0 in memory, C1, C2 … on disk). Data is first written to the in‑memory C0 tree and flushed to the next level when a size threshold is reached, as illustrated below.
Because memory I/O is much faster than disk I/O, writes to C0 are highly efficient, and flushing to disk occurs in sorted order, turning random writes into sequential writes and greatly improving write performance. The trade‑off is slightly slower reads, mitigated by caching C0 data and merging results from higher levels.
Durability is ensured by writing the same data to a write‑ahead log (WAL) on disk; when multiple levels exist, lower‑level trees are periodically merged (compaction), as shown in the following diagrams.
LSM trees in HBase : In HBase, the MemStore acts as the C0 layer, using a skip‑list instead of a traditional tree. Flushing MemStore creates HFiles, which are essentially three‑level B+ trees (Root, Intermediate, Leaf indexes) storing KeyValue pairs; the structure is depicted below.
Excessive HFiles degrade performance, so HBase performs minor and major compactions to merge them; major compaction touches the whole region and is expensive, so it should be minimized.
To improve random‑read performance, HBase can enable Bloom filters (specified in the table schema), which add a Bloom index block to each HFile, allowing rapid existence checks with minimal space overhead.
Overall, LSM trees provide a write‑optimized alternative to B+ trees for write‑heavy, large‑scale storage systems such as HBase, offering high throughput while using techniques like caching, compaction, and Bloom filters to keep read performance acceptable.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
