Understanding LSM-Tree Architecture and Its Applications in Big Data Systems
The article explains the Log-Structured Merge-Tree (LSM) architecture, its core components, advantages and disadvantages, and demonstrates how it is employed in big‑data platforms such as HBase and Apache Druid to achieve high‑throughput writes and scalable query processing.
LSM Architecture
Overview
LSM ( Log-Structured Merge-Tree ) is a write‑optimized data structure used in key‑value stores and data systems; all write operations (insert, update, delete) are first recorded in memory and later flushed to disk in batches, reducing I/O and improving write performance.
Components
The main components of an LSM‑Tree are MemTable (in‑memory table), immutable SSTable (disk table), the merge process, compaction, Bloom filter, and the Write‑Ahead Log ( WAL ) which ensures durability.
MemTable: temporarily stores newly written data in memory.
SSTable: when the MemTable is full, data are flushed as immutable disk tables.
Merge: periodically combines multiple SSTables to keep their number and size manageable.
Compaction: compresses data during merge to optimize storage space.
Bloom Filter: quickly checks whether a key may exist in an SSTable, reducing unnecessary disk reads.
WAL: logs writes before they reach the MemTable, enabling recovery after crashes.
Advantages and Disadvantages
Advantages include high write throughput, batch writes, and delayed merge/compaction, making LSM‑Tree ideal for write‑intensive workloads. Disadvantages are slower reads, space amplification, and write amplification.
LSM in the Big Data Domain
HBase Application of LSM‑Tree
In HBase, data are first written to the HLog (commit log), then to the MemStore (equivalent to a MemTable), and finally flushed to disk as a storeFile (equivalent to an SSTable). This LSM‑Tree‑based flow enables fast ingestion, though reads may need to query both the MemStore and multiple SSTables.
To mitigate read latency, HBase performs periodic compaction of small SSTables into larger ones and employs Bloom filters for faster key existence checks.
Apache Druid Application of LSM‑Tree
Druid adopts an LSM‑like architecture for its real‑time nodes. Incoming data are first loaded into an in‑memory buffer (similar to a MemTable). When conditions are met, the buffer is flushed to disk as a segment, which is immediately loaded into off‑heap memory for fast querying.
Real‑time nodes periodically merge segments from the same time window into larger segments (segment merge), analogous to LSM‑Tree compaction, and upload the merged segments to DeepStorage.
The coordinator node assigns historical nodes to download and load these segments, after which query nodes retrieve data from both real‑time and historical nodes and merge the results for the user.
Top Architecture Tech Stack
Sharing Java and Python tech insights, with occasional practical development tool tips.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.