How HBase Compaction Tuning Boosts Performance at Scale
This article explains LSM‑Tree based HBase compaction concepts, compares Minor and Major compactions, and shares practical tuning steps—including disabling automatic major compactions, controlling merge size, leveraging off‑peak windows, and improving merge efficiency—to reduce I/O, CPU usage, and latency in production environments.
Background
LSM‑Tree (Log‑Structured Merge Tree) architecture provides high‑throughput writes via log appends, but overlapping file ranges cause read performance degradation and storage bloat. Compaction periodically merges overlapping files to improve read performance and space usage, but it consumes significant I/O and CPU resources, making resource balancing a daily operational challenge. This article presents HBase (v2.0.2) compaction tuning practices from Huolala.
Concept Explanation
In HBase compaction, the terms Minor/Major, Short/Long, Small/Large are often confused. Below is a detailed explanation.
Minor Compaction: merges adjacent HFiles within an HStore and deletes HFiles that are completely expired.
Major Compaction: merges all HFiles in an HStore, removing TTL‑expired data, deleted data, and versions beyond the configured limit.
Short/Long: the names of the two compaction thread pools (ShortCompactions and LongCompactions), whose sizes are set by hbase.regionserver.thread.compaction.small and hbase.regionserver.thread.compaction.large.
Small/Large: the queues that feed those pools (SmallCompactionQueue and LargeCompactionQueue). hbase.regionserver.thread.compaction.throttle is the size threshold that decides which queue a compaction enters: requests above the threshold go to the Large queue, the rest to the Small queue. When the Large queue is empty, the long pool processes tasks from the Small queue.
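The routing between the two queues can be sketched as follows. This is a simplified model rather than HBase's implementation: the function names are hypothetical, the queues are plain FIFOs (HBase orders requests by priority), and the threshold assumes the throttle is left at a default of 2 × 10 files × a 128 MB flush size.

```python
from collections import deque

MB = 1024 * 1024
# Assumed threshold: 2 * max files per compaction (10) * memstore flush size (128 MB).
THROTTLE_BYTES = 2 * 10 * 128 * MB


def pick_queue(compaction_bytes, throttle=THROTTLE_BYTES):
    """Route a compaction request by its total input size."""
    return "large" if compaction_bytes > throttle else "small"


def next_task_for_long_pool(large_queue, small_queue):
    """The long pool prefers its own queue and borrows from the small one when idle."""
    if large_queue:
        return large_queue.popleft()
    if small_queue:
        return small_queue.popleft()
    return None
```

Under these assumptions a 500 MB request lands in the Small queue and a 4 GB request in the Large queue, regardless of whether either one is a Minor or a Major compaction.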
Production Practice
Compaction is a resource‑intensive task; tuning aims to meet business read‑latency requirements while using as few resources as possible. The practices below work toward that goal.
Control Maximum Merge
Disable automatic major compaction and define a “large merge” threshold.
hbase.hregion.majorcompaction=0
Define the large merge size (e.g., 1 GB) and avoid large merges during peak hours.
hbase.hstore.compaction.max.size=1073741824
Improve Merge Efficiency
When two HFiles of 1 GB and 900 MB exist, any new flush (e.g., 15 MB) triggers compaction, which is inefficient. An optimization merges at least four similarly sized HFiles, reducing compaction I/O by 27.8 % and halving the first‑flush compaction count.
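To see why the 1 GB / 900 MB / 15 MB case is wasteful, the sketch below applies a simplified ratio test of the kind HBase's ratio-based file selection uses: a set of files qualifies when no single file is larger than the ratio times the combined size of the others. The ratio of 1.2 and the minimum of three files are assumed defaults, and the helper names are hypothetical. This is not the optimized policy the article describes, only an illustration of the problem it addresses.

```python
MB = 1024 * 1024
RATIO = 1.2      # assumed value of hbase.hstore.compaction.ratio
MIN_FILES = 3    # assumed minimum number of files per compaction


def files_in_ratio(sizes, ratio=RATIO):
    """True if no file exceeds ratio * (sum of all the other files)."""
    total = sum(sizes)
    return all(size <= (total - size) * ratio for size in sizes)


def would_compact(sizes, ratio=RATIO, min_files=MIN_FILES):
    """True if the whole set is eligible to be merged together."""
    return len(sizes) >= min_files and files_in_ratio(sizes, ratio)


def bytes_rewritten_per_flushed_byte(sizes):
    """Total bytes rewritten divided by the size of the newest (last) file."""
    return sum(sizes) / sizes[-1]


uneven = [1024 * MB, 900 * MB, 15 * MB]   # the example from the text
even = [15 * MB] * 4                      # four similarly sized flushes
```

With these inputs the uneven set passes the ratio test, so roughly 1.9 GB is rewritten to absorb a 15 MB flush (about 129 bytes rewritten per flushed byte), while merging four 15 MB files rewrites only 4 bytes per flushed byte.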
Determine Business Peaks
Identify low‑traffic windows (e.g., 20:00‑08:00) and configure off‑peak compaction.
hbase.offpeak.start.hour=20
hbase.offpeak.end.hour=6
Leverage Off‑Peak Resources
Adjust max size for off‑peak to 5 GB and ratio to 5.0, merging daytime HFiles into larger ones while keeping HStore file count below 16.
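The following sketch shows how an off-peak window that wraps past midnight selects between the peak and off-peak limits. It uses the article's values (start hour 20, end hour 6, 1 GB peak and 5 GB off-peak size caps, off-peak ratio 5.0); the peak ratio of 1.2 is an assumed default, and the function names are hypothetical.

```python
GB = 1024 ** 3

PEAK = {"max_size": 1 * GB, "ratio": 1.2}       # ratio is an assumed default
OFF_PEAK = {"max_size": 5 * GB, "ratio": 5.0}   # values from this article


def is_off_peak(hour, start_hour=20, end_hour=6):
    """Start hour is inclusive, end hour exclusive; the window may wrap midnight."""
    if start_hour <= end_hour:
        return start_hour <= hour < end_hour
    return hour >= start_hour or hour < end_hour


def limits_at(hour, start_hour=20, end_hour=6):
    """Return the size cap and ratio that apply to a compaction at this hour."""
    return OFF_PEAK if is_off_peak(hour, start_hour, end_hour) else PEAK
```

Under this model, 23:00 and 05:00 use the 5 GB cap and the 5.0 ratio, while anything from 06:00 onward falls back to the 1 GB peak cap.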
hbase.hstore.compaction.max.size.offpeak=5368709120
hbase.hstore.compaction.ratio.offpeak=5.0
Selective Disabling of Major Compaction
Automatic major compaction is disabled; manual triggering is applied only to suitable tables (TTL tables, small non‑TTL tables) where data expiration and disk capacity allow.
Deep Business Tuning
For a 580 TB table, manual major compaction reduced HFile count but increased Scan latency by 30 %. Improvements include using row‑prefix bloom filters (ROWPREFIX_FIXED_LENGTH) and setting appropriate time ranges to filter HFiles.
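The time-range filtering can be illustrated with a small model. Each HFile records the minimum and maximum cell timestamps it contains, so a scan with a time range can skip files that do not overlap it. The class and function names below are hypothetical and the timestamps are arbitrary; this sketches the idea, not HBase's scanner code.

```python
from dataclasses import dataclass


@dataclass
class HFileMeta:
    name: str
    min_ts: int   # oldest cell timestamp in the file
    max_ts: int   # newest cell timestamp in the file


def files_to_read(hfiles, scan_min_ts, scan_max_ts):
    """Keep files whose timestamps overlap the scan range [scan_min_ts, scan_max_ts)."""
    return [f for f in hfiles if f.max_ts >= scan_min_ts and f.min_ts < scan_max_ts]


# Three files, each holding one "day" of data, versus one fully merged file.
separate = [HFileMeta("day1", 0, 99), HFileMeta("day2", 100, 199), HFileMeta("day3", 200, 299)]
merged = [HFileMeta("all", 0, 299)]

recent_from_separate = files_to_read(separate, 200, 300)
recent_from_merged = files_to_read(merged, 200, 300)
```

In this model a scan over the most recent day touches only the `day3` file when the files are kept separate, but has to open the single large file once everything has been merged, which is one way a major compaction can lengthen time-bounded scans.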
Conclusion
Compaction tuning is not a one‑size‑fits‑all service‑side tweak; it requires deep understanding of workload characteristics and dynamic configuration adjustments. The practices described, based on real‑world production experience with HBase 2.0.2, aim to inspire readers to devise effective compaction strategies.
