
Understanding HBase Write Path and How to Prevent Write Blocking

This article explains the HBase data‑write process—including WAL logging, MemStore caching, and HFile flushing—identifies three levels of write‑blocking (HFile, MemStore, RegionServer), and provides configuration tweaks to mitigate blocking in production environments.


An HBase write passes through three stages on the server: the edit is first appended to the Write‑Ahead Log (WAL) so it can be replayed after a crash, then written to the Region's MemStore cache, at which point the client write is acknowledged as successful, and finally the MemStore is flushed to disk as an immutable HFile once its size threshold is reached.
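The three stages can be sketched as a toy model. This is not real HBase code; the class and the cell-count threshold are illustrative stand-ins for the byte-size flush threshold described below.

```python
# Conceptual sketch of the HBase write path -- not real HBase client or
# server code. SimpleRegion and FLUSH_SIZE are illustrative names only.

FLUSH_SIZE = 4  # flush after this many cells (stand-in for 128 MB)

class SimpleRegion:
    def __init__(self):
        self.wal = []       # Write-Ahead Log: appended first, for recovery
        self.memstore = {}  # in-memory buffer of recent writes
        self.hfiles = []    # immutable "files on disk", one per flush

    def put(self, row, value):
        # 1. Append to the WAL so the edit survives a crash.
        self.wal.append((row, value))
        # 2. Write into the MemStore; the client sees success here.
        self.memstore[row] = value
        # 3. Flush the MemStore to a new HFile once it is large enough.
        if len(self.memstore) >= FLUSH_SIZE:
            self.hfiles.append(dict(sorted(self.memstore.items())))
            self.memstore.clear()

region = SimpleRegion()
for i in range(5):
    region.put(f"row{i}", f"v{i}")
# After 5 puts with FLUSH_SIZE = 4: one HFile of 4 cells, 1 cell left
# in the MemStore, and all 5 edits retained in the WAL.
```

The key point the sketch captures is that the client is acknowledged as soon as step 2 completes; flushing to an HFile happens later, which is exactly why blocked flushes back up into blocked writes.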

Write‑blocking can occur at three levels:

HFile‑level blocking happens when a Store accumulates too many HFiles, triggering log messages like "has too many store files..." and blocking writes until a Minor Compaction merges files. Important parameters include hbase.hstore.blockingStoreFiles, hbase.hstore.compaction.min, hbase.hstore.compaction.max, hbase.regionserver.thread.compaction.small, and hbase.regionserver.thread.compaction.large. Increasing hbase.hstore.blockingStoreFiles (default 10 in HBase 1.x, 16 in 2.x) to a larger value such as 100 can reduce this blocking.
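In hbase-site.xml this tuning might look as follows; the values shown are illustrative starting points, not recommendations for every workload.

```xml
<!-- hbase-site.xml: illustrative values for HFile-level blocking -->
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>100</value> <!-- raised from the default (10 in 1.x, 16 in 2.x) -->
</property>
<property>
  <name>hbase.regionserver.thread.compaction.small</name>
  <value>4</value> <!-- more threads so minor compactions keep up -->
</property>
```

Raising the blocking threshold trades read amplification (more HFiles to scan per read) for write availability, so compaction throughput should be increased alongside it.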

MemStore‑level blocking occurs when the total MemStore size of a Region exceeds a multiplier of the flush size (hbase.hregion.memstore.block.multiplier, default 4). When the threshold (e.g., 4 × 128 MB = 512 MB) is reached, the Region blocks updates and forces a flush, potentially throwing RegionTooBusyException. Adjusting hbase.hregion.memstore.flush.size and hbase.hregion.memstore.block.multiplier (e.g., raising the flush size to 256 MB and the multiplier to 5‑8) can alleviate this.
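A corresponding hbase-site.xml fragment, with illustrative values in the ranges mentioned above:

```xml
<!-- hbase-site.xml: illustrative MemStore-level settings -->
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>268435456</value> <!-- 256 MB, up from the 128 MB default -->
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>6</value> <!-- block updates at 6 x flush size instead of 4 x -->
</property>
```

With these values a Region blocks at 6 × 256 MB = 1.5 GB of MemStore, so the RegionServer heap and the global MemStore limit discussed next must be sized to accommodate the larger per-Region footprint.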

RegionServer‑level blocking is governed by global MemStore size limits. The parameter hbase.regionserver.global.memstore.size (default 0.4 of the RegionServer heap) caps the total MemStore memory; once it is reached, writes are blocked and logs indicate "Blocking updates...the global memstore size...". A lower‑limit parameter (hbase.regionserver.global.memstore.size.lower.limit, default 0.95 of the upper limit) triggers forced flushes before the upper limit is hit. Additionally, the BlockCache size (hfile.block.cache.size) shares the heap with MemStore, and adjusting the read/write cache ratio can help in write‑heavy scenarios.
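For a write-heavy cluster, shifting heap from the read cache to the MemStores might look like this; again the fractions are illustrative, and HBase refuses to start if the two together exceed 0.8 of the heap.

```xml
<!-- hbase-site.xml: illustrative heap split for a write-heavy workload -->
<property>
  <name>hbase.regionserver.global.memstore.size</name>
  <value>0.45</value> <!-- fraction of heap for all MemStores (default 0.4) -->
</property>
<property>
  <name>hfile.block.cache.size</name>
  <value>0.30</value> <!-- shrink the read BlockCache (default 0.4) -->
</property>
```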

Beyond parameter tuning, ensuring the RegionServer JVM heap is sized appropriately for the cluster workload (often larger than the default 1 GB) and selecting suitable garbage‑collection settings are essential for stable performance.
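As a sketch, heap and GC settings are typically set in hbase-env.sh; the sizes and flags below are illustrative, and the right values depend on available RAM and the workload.

```shell
# hbase-env.sh: illustrative RegionServer JVM settings.
# Give the RegionServer a heap well above the 1 GB default, with
# Xms = Xmx to avoid resizing, and a low-pause collector (G1 here).
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
  -Xms16g -Xmx16g \
  -XX:+UseG1GC -XX:MaxGCPauseMillis=100"
```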

Conclusion: By understanding the HBase write path and the three levels at which blocking can arise, operators can fine‑tune key configuration parameters—such as hbase.hstore.blockingStoreFiles, hbase.hregion.memstore.flush.size, and hbase.regionserver.global.memstore.size—to minimize write latency and avoid write‑blocking incidents.

Tags: Big Data, configuration, HBase, databases, Write Path, blocking
Written by Big Data Technology Architecture (Exploring Open Source Big Data and AI Technologies)