Understanding HBase Flush and Compaction Mechanisms and Their Configuration Parameters
This article explains HBase's core Flush and Compaction mechanisms: why they are needed, the conditions that trigger a Flush, the types and triggers of Compaction, and practical recommendations for tuning the most important configuration parameters to improve write and read performance.
HBase, an open-source implementation of Google's Bigtable, stores data in an LSM-tree structure: writes are first logged to the WAL and then cached in the MemStore, which is flushed to disk as HFiles once certain thresholds are reached. This turns random writes into sequential writes, yielding much higher write throughput.
As the number of HFiles grows, read performance degrades, so HBase periodically runs Compaction to merge HFiles and reduce I/O latency.
1. Why Flush and Compaction are Needed
Flush creates persistent HFiles from MemStore, while Compaction merges multiple HFiles to keep the file count low and improve read efficiency.
2. Flush Trigger Conditions and Key Parameters
Flush can be triggered by seven main situations, including MemStore size exceeding hbase.hregion.memstore.flush.size (default 128 MB), total Region MemStore size exceeding a multiple of that threshold, RegionServer memory pressure, WAL file count limits, update count thresholds, periodic timers, and manual commands via the HBase shell or API.
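The two size-based triggers above can be sketched as a few lines of logic. This is a simplified illustration (not HBase source code), assuming the default values of hbase.hregion.memstore.flush.size and hbase.hregion.memstore.block.multiplier:

```python
# Simplified sketch of two size-based flush triggers; sizes in bytes.
FLUSH_SIZE = 128 * 1024 * 1024     # hbase.hregion.memstore.flush.size (default 128 MB)
BLOCK_MULTIPLIER = 4               # hbase.hregion.memstore.block.multiplier (default 4)

def should_flush(memstore_bytes: int) -> bool:
    """A single MemStore is flushed once it crosses the flush size."""
    return memstore_bytes >= FLUSH_SIZE

def writes_blocked(region_memstore_bytes: int) -> bool:
    """Writes to a region are blocked when its total MemStore usage
    reaches flush.size * block.multiplier (512 MB with defaults)."""
    return region_memstore_bytes >= FLUSH_SIZE * BLOCK_MULTIPLIER
```

With defaults, a region keeps accepting writes between 128 MB (flush starts) and 512 MB (writes block), which is why the multiplier matters in write-heavy workloads.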
Important Flush‑related parameters and tuning advice:
hbase.hregion.memstore.flush.size – default 128 MB; increase to 256 MB if memory is abundant and Flush is too frequent.
hbase.hregion.memstore.block.multiplier – default 4; adjust to 5‑8 in write‑heavy, memory‑rich scenarios to avoid write blocking.
hbase.regionserver.global.memstore.size – default 0.4 (40% of RegionServer heap); balance against the block cache (hfile.block.cache.size), since the two together must not consume too large a fraction of the heap.
hbase.regionserver.global.memstore.size.lower.limit – default 0.95 (95% of the global MemStore limit, the point at which forced flushing begins); generally left unchanged.
hbase.regionserver.optionalcacheflushinterval – default 3600000 ms (1 hour); consider increasing it to reduce frequent small periodic Flushes.
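To see how the RegionServer-level limits above interact, a back-of-envelope calculation helps. The heap size below is a hypothetical example; the fractions are the defaults listed above:

```python
# Back-of-envelope check of the RegionServer-level MemStore limits,
# assuming a hypothetical 32 GB heap and the default parameter values.
HEAP_GB = 32
GLOBAL_MEMSTORE_SIZE = 0.4   # hbase.regionserver.global.memstore.size
LOWER_LIMIT = 0.95           # hbase.regionserver.global.memstore.size.lower.limit

upper_gb = HEAP_GB * GLOBAL_MEMSTORE_SIZE   # 12.8 GB: all writes block here
lower_gb = upper_gb * LOWER_LIMIT           # 12.16 GB: forced flushing starts
```

The narrow gap between the lower limit and the hard upper bound is intentional: flushing starts just before the point where writes would be blocked outright.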
3. Compaction Types, Triggers, and Core Parameters
Compaction comes in two forms:
Minor Compaction – merges a few adjacent small HFiles and removes TTL‑expired data.
Major Compaction – merges all HFiles in a store, removing deleted cells, TTL-expired data, and versions beyond the configured maximum; it is resource-intensive and often disabled in production, then run manually during low-traffic periods.
Compaction is triggered by MemStore Flush, a background CompactionChecker thread (interval based on hbase.server.thread.wakefrequency and hbase.server.compactchecker.interval.multiplier ), or manual commands.
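The background check interval follows directly from the two parameters named above. The values below are the commonly cited defaults; verify them against your HBase version:

```python
# Approximate CompactionChecker period, using commonly cited defaults
# (assumption: verify these against hbase-default.xml for your version).
WAKE_FREQUENCY_MS = 10_000    # hbase.server.thread.wakefrequency
INTERVAL_MULTIPLIER = 1_000   # hbase.server.compactchecker.interval.multiplier

check_interval_ms = WAKE_FREQUENCY_MS * INTERVAL_MULTIPLIER
check_interval_hours = check_interval_ms / 1000 / 3600   # roughly 2.78 hours
```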
Key Compaction parameters and recommendations:
hbase.hstore.compaction.min – default 3; number of HFiles that must exist to start a Minor Compaction. Usually left unchanged.
hbase.hstore.compaction.max – default 10; maximum HFiles merged in one Minor Compaction. Adjust proportionally if min is changed.
hbase.regionserver.thread.compaction.throttle – controls which thread pool (large vs. small) handles a Compaction; default 2 × maxFilesToCompact × flush size (≈2.5 GB). Generally not tuned.
hbase.regionserver.thread.compaction.large/small – default 1 thread each; recommended to increase to 2‑5 for better concurrency.
hbase.hstore.blockingStoreFiles – default 10; once a store accumulates this many HFiles, writes are blocked until Compaction catches up. Raising it (e.g. to 100) in write-heavy clusters can prevent write stalls, at the cost of more HFiles per read.
hbase.hregion.majorcompaction – default 604800000 ms (7 days); often set to 0 to disable automatic Major Compaction and run it manually.
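The throttle figure quoted above for hbase.regionserver.thread.compaction.throttle can be reproduced from the defaults:

```python
# Default compaction throttle: 2 * maxFilesToCompact * memstore flush size.
# A compaction larger than this goes to the "large" pool, smaller ones to "small".
MAX_FILES_TO_COMPACT = 10   # hbase.hstore.compaction.max
FLUSH_SIZE_MB = 128         # hbase.hregion.memstore.flush.size

throttle_mb = 2 * MAX_FILES_TO_COMPACT * FLUSH_SIZE_MB   # 2560 MB (~2.5 GB)
```

Note that if you change hbase.hstore.compaction.max or the flush size, this threshold shifts with them, which affects which pool handles your compactions.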
Example log message when too many store files block flushing:

    too many store files; delaying flush up to 90000ms

4. Summary
The article provides a comprehensive overview of HBase's Flush and Compaction mechanisms, explains the conditions and parameters that control them, and offers practical tuning suggestions to maintain cluster stability and performance.
Big Data Technology Architecture
Exploring Open Source Big Data and AI Technologies