
How HBase Compaction Tuning Boosts Performance at Scale

This article explains how compaction works in HBase, which is built on an LSM‑Tree, compares Minor and Major compactions, and shares practical tuning steps that reduce I/O, CPU usage, and latency in production: disabling automatic major compactions, controlling merge size, using off‑peak windows, and improving merge efficiency.

Huolala Tech

Dec 27, 2023

Background

The LSM‑Tree (Log‑Structured Merge Tree) architecture achieves high write throughput by appending to a log, but the resulting files have overlapping key ranges, which degrades read performance and inflates storage. Compaction periodically merges these overlapping files to improve read performance and space usage. It also consumes significant I/O and CPU, so balancing those resources is a daily operational challenge. This article presents the HBase (v2.0.2) compaction tuning practices used at Huolala.

Concept Explanation

In HBase compaction, the terms Minor/Major, Short/Long, Small/Large are often confused. Below is a detailed explanation.

Minor Compaction: merges a subset of adjacent HFiles within an HStore and deletes HFiles whose data has completely expired.

Major Compaction: merges all HFiles in an HStore into one, removing TTL‑expired data, deleted data, and versions beyond the configured limit.

Short/Long: the names of the two compaction thread pools (ShortCompactions and LongCompactions). Their sizes are controlled by hbase.regionserver.thread.compaction.small and hbase.regionserver.thread.compaction.large, respectively.

Small/Large: the queues feeding those pools (SmallCompactionQueue and LargeCompactionQueue). hbase.regionserver.thread.compaction.throttle is the size threshold that decides which queue a compaction joins: compactions larger than the threshold go to the Large queue, the rest to the Small queue. When the Large queue is empty, the long pool processes tasks from the Small queue.
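The queue routing described above can be sketched as follows. This is a minimal illustration rather than HBase's own code: the function name choose_queue and the example file sizes are hypothetical, and the default threshold assumes the documented default of 2 x hbase.hstore.compaction.max (10) x hbase.hregion.memstore.flush.size (128 MB).

```python
MB = 1024 * 1024

# Assumed default for hbase.regionserver.thread.compaction.throttle:
# 2 * hbase.hstore.compaction.max (10) * memstore flush size (128 MB).
DEFAULT_THROTTLE = 2 * 10 * 128 * MB


def choose_queue(selection_sizes, throttle=DEFAULT_THROTTLE):
    """Return the queue a compaction over these file sizes (bytes) would join."""
    total = sum(selection_sizes)
    return "large" if total > throttle else "small"


# Hypothetical selections: a few fresh flushes vs. three big files.
small_job = [15 * MB, 20 * MB, 18 * MB]
large_job = [1024 * MB, 900 * MB, 800 * MB]

print(choose_queue(small_job), choose_queue(large_job))
```

Lowering the throttle sends more compactions to the Large queue, which is one way to keep the short pool free for quick merges of freshly flushed files.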

Production Practice

Compaction is resource‑intensive, so the goal of tuning is to meet the business's read‑latency requirements with as few resources as possible. The practices below come from production use.

Control Maximum Merge

Disable automatic major compaction and define a threshold for what counts as a “large merge”.

hbase.hregion.majorcompaction=0

Set the large‑merge size (e.g., 1 GB) so that merges above it are avoided during peak hours.

hbase.hstore.compaction.max.size=1073741824
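To make the effect of this setting concrete, here is a minimal sketch of how a size cap removes big files from minor‑compaction candidates. It simplifies HBase's selection logic to a single filter; the function name eligible_for_minor and the store contents are hypothetical.

```python
MB = 1024 * 1024

# Value of hbase.hstore.compaction.max.size from the text (1 GB).
MAX_COMPACT_SIZE = 1073741824


def eligible_for_minor(file_sizes, max_size=MAX_COMPACT_SIZE):
    """Keep only files small enough to be considered for a minor compaction."""
    return [size for size in file_sizes if size <= max_size]


# Hypothetical store: one 3 GB file plus smaller, newer files.
store = [3072 * MB, 800 * MB, 200 * MB, 15 * MB]
candidates = eligible_for_minor(store)

print([size // MB for size in candidates])
```

In this sketch the 3 GB file is never rewritten by a daytime minor compaction, so its I/O cost is deferred until a window where larger merges are allowed.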

Improve Merge Efficiency

When an HStore already holds two HFiles of 1 GB and 900 MB, any new flush (e.g., 15 MB) triggers a compaction that rewrites nearly 2 GB to absorb a few megabytes, which is inefficient. The optimization is to merge only when at least four similarly sized HFiles are available. This reduced compaction I/O by 27.8% and halved the number of compactions triggered by the first flush.
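The arithmetic behind this example can be checked with a short sketch. It models a simplified ratio test (no file may exceed the ratio times the sum of the other selected files) and a minimum file count. The ratio of 1.2 and minimum of 3 are assumed HBase defaults, the function names are hypothetical, and raising the minimum to 4 is only one possible reading of the optimization, not a confirmed description of it.

```python
def files_in_ratio(sizes, ratio):
    """True if no file is larger than ratio * (sum of the other files)."""
    total = sum(sizes)
    return all(size <= (total - size) * ratio for size in sizes)


def would_compact(sizes, ratio=1.2, min_files=3):
    """Simplified check: enough files, and all of them within the ratio."""
    return len(sizes) >= min_files and files_in_ratio(sizes, ratio)


# Sizes in MB: the two existing HFiles plus one fresh 15 MB flush.
store_mb = [1024, 900, 15]
rewritten_mb = sum(store_mb)

print(would_compact(store_mb), rewritten_mb)
print(would_compact(store_mb, min_files=4))
```

With three files required, the 15 MB flush is enough to rewrite 1939 MB. With four required, the same store waits for more flushes before merging.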

Determine Business Peaks

Identify the low‑traffic window (e.g., 20:00‑08:00) and configure the off‑peak compaction hours inside it. The settings below use 20:00 to 06:00.

hbase.offpeak.start.hour=20
hbase.offpeak.end.hour=6
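Because this window crosses midnight, the hour check has to wrap around. The sketch below shows one way to express that, with the start hour inclusive and the end hour exclusive; the function name is_off_peak is hypothetical and the defaults mirror the two settings above.

```python
def is_off_peak(hour, start_hour=20, end_hour=6):
    """Return True if hour (0-23) falls in [start_hour, end_hour), wrapping at midnight."""
    if start_hour <= end_hour:
        # Window within a single day, e.g. 1 to 5.
        return start_hour <= hour < end_hour
    # Window crossing midnight, e.g. 20 to 6.
    return hour >= start_hour or hour < end_hour


off_peak_hours = [h for h in range(24) if is_off_peak(h)]
print(off_peak_hours)
```

Under these assumptions hours 20 through 23 and 0 through 5 count as off‑peak, so a compaction selected at 06:30 already uses the daytime limits.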

Leverage Off‑Peak Resources

During the off‑peak window, raise the maximum merge size to 5 GB and the compaction ratio to 5.0, so that the HFiles accumulated during the day are merged into larger ones and the file count per HStore stays below 16.

hbase.hstore.compaction.max.size.offpeak=5368709120
hbase.hstore.compaction.ratio.offpeak=5.0
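A small sketch shows why the looser off‑peak limits admit merges that the daytime limits reject. It uses a simplified model in which a selection is allowed when its total size is within the cap and every file is within the ratio of the others. The function name selection_allowed, the daytime ratio of 1.2 (an assumed default), and the file sizes are hypothetical.

```python
def selection_allowed(sizes_mb, ratio, max_size_mb):
    """Simplified check: total size under the cap and every file within the ratio."""
    total = sum(sizes_mb)
    if total > max_size_mb:
        return False
    return all(size <= (total - size) * ratio for size in sizes_mb)


# Hypothetical store in MB: one older 3000 MB file plus the day's smaller files.
store_mb = [3000, 400, 300, 200]

daytime = selection_allowed(store_mb, ratio=1.2, max_size_mb=1024)
off_peak = selection_allowed(store_mb, ratio=5.0, max_size_mb=5120)

print(daytime, off_peak)
```

During the day this selection fails both the 1 GB cap and the 1.2 ratio. At night the 3900 MB total fits under 5 GB and the 3000 MB file is within 5.0 times the 900 MB of its neighbours, so the files can be consolidated.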

Selective Disabling of Major

With automatic major compaction disabled, major compaction is triggered manually and only on tables that suit it (TTL tables and small non‑TTL tables), and only when data expiration and disk capacity allow.

Deep Business Tuning

For a 580 TB table, manual major compaction reduced the HFile count but increased Scan latency by 30%. Improvements include using row‑prefix bloom filters (ROWPREFIX_FIXED_LENGTH) and setting appropriate time ranges on reads so that HFiles outside the range are filtered out.
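The time‑range idea can be illustrated with a minimal sketch: if each HFile records the minimum and maximum timestamp of its cells, a read restricted to a time range can skip files that do not overlap it. The class HFileInfo, the function files_to_read, and the timestamps are hypothetical and only model the concept, not HBase's API.

```python
from dataclasses import dataclass


@dataclass
class HFileInfo:
    name: str
    min_ts: int  # oldest cell timestamp in the file
    max_ts: int  # newest cell timestamp in the file


def files_to_read(hfiles, scan_start, scan_end):
    """Keep files whose [min_ts, max_ts] overlaps the range [scan_start, scan_end)."""
    return [f for f in hfiles if f.max_ts >= scan_start and f.min_ts < scan_end]


# Hypothetical store before and after merging everything into one file.
separate = [HFileInfo("a", 0, 99), HFileInfo("b", 100, 249), HFileInfo("c", 250, 400)]
merged = [HFileInfo("abc", 0, 400)]

print([f.name for f in files_to_read(separate, 200, 300)])
print([f.name for f in files_to_read(merged, 200, 300)])
```

In this model file "a" is skipped when the files are separate, while a single merged file spanning the whole range can never be skipped. Time‑range filtering therefore helps most when HFiles cover distinct time windows.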

Conclusion

Compaction tuning is not a one‑size‑fits‑all service‑side tweak; it requires deep understanding of workload characteristics and dynamic configuration adjustments. The practices described, based on real‑world production experience with HBase 2.0.2, aim to inspire readers to devise effective compaction strategies.
