Understanding HBase Compaction: Types, Triggers, Algorithms, and Impact on Read/Write Performance
This article explains HBase compaction—a key operation in the Log‑Structured Merge‑Tree model—covering minor and major compaction differences, trigger conditions, configuration parameters, selection algorithms, thread‑pool handling, and the effects on read and write performance in a big‑data database environment.
Compaction is a core operation in the Log‑Structured Merge‑Tree (LSM) model used by HBase, following the buffer → flush → merge workflow; it merges StoreFiles, removes deleted, expired, or redundant versioned data, and improves overall read/write efficiency.
Minor vs. Major Compaction: Minor compaction merges only a subset of a Store's StoreFiles. It drops TTL-expired cells (when minVersion=0) but keeps delete markers and older versions. Major compaction merges all StoreFiles of a Store into a single file and additionally removes delete markers, the cells they mask, and expired cells.
Compaction Trigger Factors: A compaction is considered when the number of StoreFiles in a Store, minus those already being compacted, exceeds minFilesToCompact (default 3). The key configuration items are minFilesToCompact (hbase.hstore.compactionThreshold), maxFilesToCompact (hbase.hstore.compaction.max, default 10), maxCompactSize (hbase.hstore.compaction.max.size), and minCompactSize (hbase.hstore.compaction.min.size). The CompactionChecker thread re-evaluates this condition periodically, by default every 10,000 seconds (2 hours 46 minutes 40 seconds).
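The trigger condition and the checker period can be sketched in a few lines of Python. This is an illustrative sketch, not HBase's Java code: the function names are hypothetical, and the period assumes the defaults hbase.server.thread.wakefrequency = 10,000 ms and hbase.server.compactchecker.interval.multiplier = 1000, which the text above does not name.

```python
from datetime import timedelta

# Default of hbase.hstore.compactionThreshold, as stated in the text.
MIN_FILES_TO_COMPACT = 3


def needs_compaction(total_store_files, files_compacting,
                     min_files_to_compact=MIN_FILES_TO_COMPACT):
    """Hypothetical helper: True when a Store should be considered for compaction.

    Files that are already part of a running compaction are not counted.
    """
    return (total_store_files - files_compacting) > min_files_to_compact


def checker_period(wake_frequency_ms=10_000, multiplier=1000):
    """Assumed derivation of the CompactionChecker period: wake frequency x multiplier."""
    return timedelta(milliseconds=wake_frequency_ms * multiplier)
```

With five StoreFiles of which one is already compacting, four candidates remain, which exceeds the threshold of three, so a compaction is considered.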
Selection Process:
Select candidate StoreFiles that are not currently compacting.
Apply the compactSelection algorithm to choose files for compaction.
Filter out expired files where minVersion=0 and storefile.maxTimeStamp + ttl < now.
Use ScanQueryMatcher (MAJOR_COMPACT, MINOR_COMPACT, USER_SCAN) to filter KV pairs during scanning.
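Apply the ratio-based size check controlled by hbase.hstore.compaction.ratio (default 1.2, with a separate ratio for configured off-peak hours): a file is skipped when it is larger than the ratio times the combined size of the newer candidate files, which keeps overly large files out of the compaction.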
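The expired-file filter and the ratio check from the steps above can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the compactSelection implementation itself: the function names are hypothetical, file sizes are plain integers ordered oldest to newest, and the sliding window is capped at maxFilesToCompact files.

```python
def is_fully_expired(max_timestamp_ms, ttl_ms, now_ms, min_versions=0):
    """A StoreFile can be dropped outright when minVersion=0 and its newest cell has expired."""
    return min_versions == 0 and max_timestamp_ms + ttl_ms < now_ms


def select_by_ratio(sizes, ratio=1.2, min_files=3, max_files=10, min_compact_size=0):
    """Pick a contiguous run of files (oldest first) for a minor compaction.

    Older files are skipped while they are larger than
    max(min_compact_size, ratio * sum of the newer files in the window).
    """
    start = 0
    while len(sizes) - start >= min_files:
        # Sum of the newer files that would be compacted together with sizes[start].
        window_sum = sum(sizes[start + 1:start + max_files])
        if sizes[start] <= max(min_compact_size, ratio * window_sum):
            break
        start += 1
    selected = sizes[start:start + max_files]
    # Too few files left: do not compact at all.
    return selected if len(selected) >= min_files else []
```

For sizes [1000, 100, 50, 30, 20] the oldest file is skipped because 1000 is greater than 1.2 × 200, and the remaining four files are selected. Passing a larger ratio, as off-peak settings do, makes it more likely that big files are included.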
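Thread-Pool Handling: Two thread pools, largeCompactions and smallCompactions, process CompactionRequests. A request whose total file size exceeds the throttle point (hbase.regionserver.thread.compaction.throttle; default 2 × minFilesToCompact × memstoreFlushSize in the release described here, whereas later releases derive the default from maxFilesToCompact) goes to the large pool; all other requests go to the small pool. For heavy workloads, increasing both pools (hbase.regionserver.thread.compaction.large and hbase.regionserver.thread.compaction.small) to about five threads is recommended.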
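The routing rule can be illustrated with a short Python sketch. The function names are hypothetical, the throttle formula follows the one given in the text, and the 128 MB memstore flush size is an assumed value used only to make the arithmetic concrete.

```python
MB = 1024 * 1024


def default_throttle(min_files_to_compact=3, memstore_flush_size=128 * MB):
    """Throttle point as described in the text: 2 x minFilesToCompact x memstoreFlushSize."""
    return 2 * min_files_to_compact * memstore_flush_size


def choose_pool(file_sizes, throttle):
    """Route a compaction request by the total size of its files."""
    return "largeCompactions" if sum(file_sizes) > throttle else "smallCompactions"
```

Under these assumptions the throttle is 768 MB, so a request covering 400 MB and 500 MB files goes to the large pool, while three 100 MB files stay in the small pool.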
Major Compaction Conditions: An automatic major compaction is triggered when the interval defined by hbase.hregion.majorcompaction has elapsed and the number of files to compact is below maxFilesToCompact. Setting the interval to 0 disables automatic major compaction; manual requests still work.
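A minimal Python sketch of this decision, assuming the time of the last major compaction is tracked as a single timestamp. The function and parameter names are hypothetical, and the real check in HBase is more involved than this illustration.

```python
def should_major_compact(now_ms, last_major_ms, interval_ms,
                         files_to_compact, max_files_to_compact=10):
    """Decide whether an automatic major compaction is due.

    interval_ms mirrors hbase.hregion.majorcompaction; 0 disables the automatic trigger.
    """
    if interval_ms == 0:
        return False
    interval_elapsed = (now_ms - last_major_ms) >= interval_ms
    return interval_elapsed and files_to_compact < max_files_to_compact
```

A manual request bypasses this check entirely, which is why disabling the interval does not prevent operator-initiated major compactions.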
Impact on Read/Write Operations: Compaction operates at the Store level, whereas flush works at the Region level, so compaction locking is finer-grained. During compaction, the merged output is written to temporary files in a .tmp directory (e.g., /hbase‑weibo/.../.tmp), so reads are not blocked. When the compaction finishes, the new StoreFile replaces the old ones, and writes are blocked briefly while the Store's file list is updated.
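The write-to-.tmp-then-swap pattern can be illustrated with a toy Python model. ToyStore and its methods are hypothetical stand-ins on a local filesystem, and concatenating bytes stands in for real cell merging; the point is only that the long-running merge happens outside the lock, while the lock is held just for the rename and file-list swap.

```python
import os
import tempfile
import threading


class ToyStore:
    """Hypothetical stand-in for a Store; not HBase's actual class."""

    def __init__(self, store_dir):
        self.store_dir = store_dir
        self.tmp_dir = os.path.join(store_dir, ".tmp")
        os.makedirs(self.tmp_dir, exist_ok=True)
        self.files = []               # live file list that readers consult
        self.lock = threading.Lock()  # guards only changes to the file list

    def add_file(self, name, data):
        path = os.path.join(self.store_dir, name)
        with open(path, "wb") as f:
            f.write(data)
        with self.lock:
            self.files.append(path)
        return path

    def compact(self, inputs, out_name):
        # 1. Merge into .tmp without holding the lock, so readers keep using the old files.
        tmp_path = os.path.join(self.tmp_dir, out_name)
        with open(tmp_path, "wb") as out:
            for path in inputs:
                with open(path, "rb") as src:
                    out.write(src.read())
        final_path = os.path.join(self.store_dir, out_name)
        # 2. Brief critical section: move the result into place and swap the file list.
        with self.lock:
            os.replace(tmp_path, final_path)
            self.files = [p for p in self.files if p not in inputs] + [final_path]
        # 3. The old files are no longer referenced and can be removed.
        for path in inputs:
            os.remove(path)
        return final_path


def demo():
    """Build a throwaway store, compact two files, and return the results."""
    store = ToyStore(tempfile.mkdtemp())
    a = store.add_file("a", b"1")
    b = store.add_file("b", b"2")
    merged = store.compact([a, b], "c")
    return store, merged, [a, b]
```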
Configuration Summary: hbase.hregion.majorcompaction sets the interval for automatic major compaction (0 disables it); hbase.hstore.compaction.max is the maximum number of files per compaction (default 10); hbase.hstore.compactionThreshold is the file-count threshold for triggering compaction (default 3).
| Parameter | Config Key | Default |
| --- | --- | --- |
| minFilesToCompact | hbase.hstore.compactionThreshold | 3 |
| maxFilesToCompact | hbase.hstore.compaction.max | 10 |
| maxCompactSize | hbase.hstore.compaction.max.size | Long.MAX_VALUE |
| minCompactSize | hbase.hstore.compaction.min.size | memstoreFlushSize |
Overall, major compaction merges all StoreFiles of a Store once its interval and file-count conditions are met, while minor compaction runs more frequently on smaller groups of files to keep reads and writes efficient.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
