HBase Compaction Types and Parameter Tuning Guide
This article explains how writes flow through the WAL and MemStore into HFiles, describes the two compaction types (Minor and Major), and gives recommendations for tuning the key compaction parameters to improve query performance and reduce load on HDFS.
In HBase, each write is first appended to the Write-Ahead Log (WAL) for durability and then buffered in the in-memory MemStore. When a MemStore reaches its flush threshold (hbase.hregion.memstore.flush.size, 128 MB by default), its contents are flushed to disk as a new HFile. As HFiles accumulate, a read may have to check more files, so query performance degrades and the load on HDFS increases. HBase therefore runs compactions regularly to merge HFiles and reduce their count.
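The flush behavior described above can be illustrated with a toy model. This is a minimal Python sketch, not HBase code: it assumes the default 128 MB flush size, fixed-size writes, and a single column family, and it ignores the WAL and any other flush triggers. The function name simulate_flushes is hypothetical.

```python
FLUSH_SIZE_MB = 128  # assumed default of hbase.hregion.memstore.flush.size


def simulate_flushes(write_sizes_mb, flush_size_mb=FLUSH_SIZE_MB):
    """Toy model: return the sizes (MB) of the HFiles produced by MemStore
    flushes for one column family. The WAL and other flush triggers are
    not modeled."""
    memstore_mb = 0
    hfiles = []
    for size in write_sizes_mb:
        memstore_mb += size  # each write is buffered in the MemStore
        if memstore_mb >= flush_size_mb:
            # Reaching the threshold writes the MemStore out as a new,
            # immutable HFile and empties it.
            hfiles.append(memstore_mb)
            memstore_mb = 0
    return hfiles


if __name__ == "__main__":
    files = simulate_flushes([16] * 64)  # 64 writes of 16 MB = 1 GB
    print(len(files), "HFiles:", files)
    # prints: 8 HFiles: [128, 128, 128, 128, 128, 128, 128, 128]
```

In this model, every additional gigabyte written adds eight more HFiles, which is the growth that compaction is meant to contain.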
1. Two Compaction Types
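Minor Compaction selects a small number of small, adjacent HFiles and merges them into one larger HFile, removing expired data along the way. It does not purge delete markers or the deleted data they cover; that happens only during a Major Compaction.
Major Compaction merges all HFiles of a column family into a single HFile and purges expired data, deleted data (together with its delete markers), and versions beyond the column family's configured maximum (VERSIONS).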
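The difference between the two compaction types can be sketched with a simplified cell model. The Cell type and the minor_compact/major_compact functions below are hypothetical illustrations that follow this article's description, not HBase internals: a delete is modeled as a single row-level tombstone (real HBase has several delete-marker types), TTL and maximum versions are plain parameters, and choosing which HFiles a Minor Compaction merges is left to the caller.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    row: str
    ts: int                 # write timestamp in ms
    value: Optional[str]    # None models a row-level delete marker (tombstone)


def _merge(files: List[List[Cell]]) -> List[Cell]:
    # Combine the cells of several HFiles, sorted by row, newest version first.
    return sorted((c for f in files for c in f), key=lambda c: (c.row, -c.ts))


def minor_compact(selected, now, ttl_ms):
    """Merge a few HFiles into one, dropping only TTL-expired cells.
    In this model, tombstones and extra versions survive until major_compact."""
    return [c for c in _merge(selected) if now - c.ts <= ttl_ms]


def major_compact(all_files, now, ttl_ms, max_versions):
    """Merge every HFile of the column family into one, purging expired
    cells, deleted cells with their tombstones, and excess versions."""
    by_row = defaultdict(list)
    for c in _merge(all_files):
        if now - c.ts <= ttl_ms:
            by_row[c.row].append(c)  # insertion order keeps newest first
    out = []
    for row in sorted(by_row):
        cells = by_row[row]
        # A tombstone masks every put for the row at or before its timestamp.
        deleted_upto = max((c.ts for c in cells if c.value is None), default=-1)
        puts = [c for c in cells if c.value is not None and c.ts > deleted_upto]
        out.extend(puts[:max_versions])  # keep only the newest versions
    return out


if __name__ == "__main__":
    f1 = [Cell("r1", 100, "a"), Cell("r2", 100, "x")]
    f2 = [Cell("r1", 200, "b"), Cell("r2", 150, None)]  # r2 deleted at ts=150
    f3 = [Cell("r1", 300, "c"), Cell("r3", 10, "old")]  # r3 older than the TTL
    now, ttl = 400, 350
    print(minor_compact([f2, f3], now, ttl))
    print(major_compact([f1, f2, f3], now, ttl, max_versions=2))
```

Here the Minor Compaction of f2 and f3 drops only the expired r3 cell and keeps the r2 tombstone, while the Major Compaction over all three files removes r2 entirely and keeps only the two newest versions of r1.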
2. Parameter Tuning
hbase.hstore.compaction.min (default 3): the minimum number of HFiles in a column family that must be eligible before a Minor Compaction runs; recommended to increase to 5-10. Larger values delay compaction, but each compaction then merges more files at once and takes longer.
hbase.hstore.compaction.max (default 10): the maximum number of HFiles merged in a single Minor Compaction; set it to roughly 2-3 times hbase.hstore.compaction.min.
hbase.regionserver.thread.compaction.throttle: if the total size of the files selected for a compaction exceeds this threshold, the compaction runs in the large-compaction thread pool; otherwise it runs in the small-compaction pool. The default is 2 × hbase.hstore.compaction.max × hbase.hregion.memstore.flush.size, i.e. 2 × 10 × 128 MB = 2560 MB (2.5 GB) with default settings. It is usually left unchanged or increased slightly.
hbase.regionserver.thread.compaction.small (default 1): size of the small‑compaction thread pool; typically set to 2‑5.
hbase.regionserver.thread.compaction.large (default 1): size of the large‑compaction thread pool; adjust similarly to the small pool.
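hbase.hstore.blockingStoreFiles (default 10): when the number of HFiles in a column family reaches this value, updates to the region are blocked until a compaction completes or hbase.hstore.blockingWaitTime (default 90000 ms) elapses; in production, it is commonly increased to around 100 to avoid write stalls.
hbase.hregion.majorcompaction (default 604800000 ms, i.e., 7 days): the interval between automatic Major Compactions. Because Major Compaction rewrites all data in a column family and is resource-intensive, it is often disabled (set to 0) and triggered manually during low-traffic periods, for example with the HBase shell command major_compact 'table_name'.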
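The sketch below shows how the parameters above relate to each other: it computes the default throttle, routes a compaction to a thread pool, checks whether enough HFiles exist for a Minor Compaction, and checks the write-blocking condition. All function names are hypothetical, and the file selection is deliberately simplified (smallest files first); HBase's actual selection policy also weighs file sizes against hbase.hstore.compaction.ratio.

```python
MB = 1024 * 1024


def default_throttle(compaction_max=10, flush_size=128 * MB):
    # Default of hbase.regionserver.thread.compaction.throttle:
    # 2 x hbase.hstore.compaction.max x hbase.hregion.memstore.flush.size
    return 2 * compaction_max * flush_size


def pick_pool(total_bytes, throttle):
    # Compactions whose input exceeds the throttle go to the large pool.
    return "large" if total_bytes > throttle else "small"


def minor_selection(hfile_sizes, compaction_min=3, compaction_max=10):
    """Return the HFile sizes to merge, or [] if too few files are eligible.
    Simplification: picks the smallest files first, capped at compaction_max."""
    if len(hfile_sizes) < compaction_min:
        return []
    return sorted(hfile_sizes)[:compaction_max]


def writes_blocked(hfile_count, blocking_store_files=10):
    # Updates stall once a Store holds this many HFiles (until compaction
    # catches up or hbase.hstore.blockingWaitTime elapses).
    return hfile_count >= blocking_store_files


MAJOR_INTERVAL_MS = 604_800_000
MS_PER_DAY = 24 * 60 * 60 * 1000


if __name__ == "__main__":
    print(default_throttle() / MB)  # 2560.0
    print(pick_pool(3 * 1024 * MB, default_throttle()))  # large
    tuned = {"compaction_min": 5, "compaction_max": 15}
    print(minor_selection([128 * MB] * 4, **tuned))  # [] because 4 < 5
    print(writes_blocked(12), writes_blocked(12, blocking_store_files=100))  # True False
    print(MAJOR_INTERVAL_MS / MS_PER_DAY)  # 7.0
```

With compaction.min raised to 5, four freshly flushed HFiles do not yet trigger a compaction, which is the trade-off the tuning advice accepts: fewer, larger merges. Raising blockingStoreFiles to 100 gives compaction that extra slack before writes stall.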
Big Data Technology Architecture
Exploring Open Source Big Data and AI Technologies