Understanding HBase Compaction: Types, Triggers, Parameters, and Performance Impact

This article explains HBase's compaction mechanism, covering why it is needed, the differences between minor and major compaction, the conditions that trigger compaction, key configuration parameters, thread‑pool handling, compaction policies, and how compaction influences read and write performance in a large‑scale NoSQL database.

Big Data Technology Architecture

HBase stores data using an LSM‑Tree model: each write is appended to the Write‑Ahead Log (WAL) and then applied to an in‑memory MemStore, which is later flushed to disk as an immutable HFile. As HFiles accumulate, a single read may have to consult more of them, so query performance degrades from the extra I/O. Compaction merges small HFiles to keep their count down.
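The effect of file count on reads can be illustrated with a toy model. This is a minimal sketch, not HBase code: the class `ToyStore`, its `flush_threshold` parameter, and the use of plain dictionaries as "files" are all assumptions made for illustration.

```python
class ToyStore:
    """Toy LSM-style store: a log, a memstore, and a list of flushed files."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical: keys per flush
        self.wal = []        # append-only record of every write
        self.memstore = {}   # newest values, held in memory
        self.files = []      # flushed immutable "HFiles", oldest first

    def put(self, key, value):
        self.wal.append((key, value))   # 1. log the write first
        self.memstore[key] = value      # 2. then apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Turn the current memstore into a new file and start a fresh one.
        if self.memstore:
            self.files.append(dict(self.memstore))
            self.memstore = {}

    def get(self, key):
        """Return (value, number_of_files_checked)."""
        if key in self.memstore:
            return self.memstore[key], 0
        checked = 0
        for f in reversed(self.files):  # newest file first
            checked += 1
            if key in f:
                return f[key], checked
        return None, checked

    def compact(self):
        # Merge every file into one; newer files overwrite older values.
        merged = {}
        for f in self.files:
            merged.update(f)
        self.files = [merged] if merged else []
```

In this model a read of an old key has to walk through every newer file before it finds the value, and after `compact()` the same read touches a single file.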

Compaction is itself I/O‑intensive: it deliberately spends I/O on rewriting files in exchange for faster, more stable reads.

There are two compaction types. Minor compaction merges a set of adjacent small StoreFiles (each backed by an HFile) into a larger one and removes only TTL‑expired cells. Major compaction merges all StoreFiles of a store into a single file and also cleans up deleted, expired, and over‑versioned cells. Because major compaction consumes far more resources, it is often triggered manually during low‑traffic periods.
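The difference in what each type cleans up can be sketched with a toy merge function. This is a simplified illustration under stated assumptions, not HBase's implementation: the `Cell` tuple, the `merge` function, and its `ttl` and `max_versions` arguments are hypothetical, cells are keyed by row only, and a delete marker is assumed to mask every put in the same row with an equal or older timestamp.

```python
from collections import namedtuple

# kind is "put" or "delete"; ts is a logical timestamp.
Cell = namedtuple("Cell", "row ts kind value")

def merge(files, now, ttl, major=False, max_versions=1):
    """Merge toy files (lists of Cell) the way a minor or major compaction might."""
    cells = [c for f in files for c in f]
    # Both types drop puts whose TTL has expired.
    cells = [c for c in cells if c.kind == "delete" or now - c.ts <= ttl]
    if not major:
        # Minor: delete markers and extra versions are carried over untouched.
        return sorted(cells, key=lambda c: (c.row, -c.ts))
    out = []
    for row in sorted({c.row for c in cells}):
        row_cells = sorted((c for c in cells if c.row == row), key=lambda c: -c.ts)
        delete_ts = [c.ts for c in row_cells if c.kind == "delete"]
        tombstone = max(delete_ts) if delete_ts else None
        # Major: drop masked puts and the markers themselves, then cap versions.
        puts = [c for c in row_cells
                if c.kind == "put" and (tombstone is None or c.ts > tombstone)]
        out.extend(puts[:max_versions])
    return out
```

Calling `merge` with `major=False` keeps the delete marker alongside the put it masks, while `major=True` removes both.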

Compaction can be triggered in three ways: after a MemStore flush, when the newly written file pushes the store's file count past the configured threshold; by the background CompactionChecker thread, which periodically evaluates whether each store needs compaction; and manually, through the HBase shell, the web UI, or API calls.

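Key major‑compaction parameters include hbase.hregion.majorcompaction, the interval between automatic major compactions (default 7 days, expressed in milliseconds; setting it to 0 disables time‑based major compaction), and hbase.hregion.majorcompaction.jitter (default 0.5), which shifts each interval by up to that fraction in either direction to avoid simultaneous compactions across region servers.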
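As a worked example of the jitter arithmetic, the sketch below computes the earliest and latest interval that the two defaults allow. The function name `major_compaction_window` is hypothetical, and the sketch only computes the bounds; how HBase picks a point inside them is not modeled here.

```python
DAY_MS = 24 * 60 * 60 * 1000

def major_compaction_window(period_ms=7 * DAY_MS, jitter=0.5):
    """Return (earliest_ms, latest_ms) between automatic major compactions.

    Returns None when period_ms is 0, i.e. time-based major compaction is off.
    """
    if period_ms <= 0:
        return None
    delta = round(period_ms * jitter)
    return period_ms - delta, period_ms + delta

earliest, latest = major_compaction_window()
print(earliest / DAY_MS, latest / DAY_MS)  # bounds in days
```

With the defaults, a store's next automatic major compaction falls somewhere between 3.5 and 10.5 days after the previous one.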

Minor‑compaction parameters control file selection and include hbase.hstore.compaction.min (minimum number of files, default 3), hbase.hstore.compaction.max (maximum number, default 10), hbase.hstore.compaction.min.size (files smaller than this are auto‑included, default 128 MB), hbase.hstore.compaction.max.size (upper size limit, default Long.MAX_VALUE), hbase.hstore.compaction.ratio (size‑ratio threshold, default 1.2), and hbase.hstore.compaction.ratio.offpeak (off‑peak ratio, default 5.0).
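How these parameters interact is easier to see in code. The following is a simplified sketch of ratio-based file selection as commonly described for HBase, not a copy of the real implementation: the function name `select_ratio_based` is hypothetical, file sizes are plain integers ordered oldest first, and details such as off-peak hours and files already being compacted are left out.

```python
MB = 1024 ** 2

def select_ratio_based(sizes, min_files=3, max_files=10,
                       min_size=128 * MB, max_size=float("inf"), ratio=1.2):
    """Pick files for a minor compaction. sizes: bytes per file, oldest first."""
    candidates = list(sizes)
    # Skip leading (oldest) files that exceed the upper size limit.
    start = 0
    while start < len(candidates) and candidates[start] > max_size:
        start += 1
    candidates = candidates[start:]
    # Skip an old file if it is too large compared with the newer files
    # that would be merged with it. Files under min_size always qualify.
    start = 0
    while len(candidates) - start >= min_files:
        newer = candidates[start + 1:start + max_files]
        if candidates[start] <= max(min_size, sum(newer) * ratio):
            break
        start += 1
    selected = candidates[start:]
    if len(selected) < min_files:
        return []                 # too few files: no compaction this round
    return selected[:max_files]   # never merge more than max_files at once

print(select_ratio_based([1000 * MB, 300 * MB, 50 * MB, 40 * MB, 30 * MB]))
```

In the example call, the 1000 MB and 300 MB files are skipped because each is larger than 1.2 times the sum of the files after it, and the three small files are selected.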

Each RegionServer runs compactions in two dedicated thread pools, largeCompactions and smallCompactions, each with a single thread by default. A compaction is routed by the total size of the files it will merge: requests above hbase.regionserver.thread.compaction.throttle go to the large pool and the rest go to the small pool. Pool sizes can be tuned via hbase.regionserver.thread.compaction.large and hbase.regionserver.thread.compaction.small.
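The routing rule can be sketched in a few lines. The function names below are hypothetical, and the default throttle formula (2 × hbase.hstore.compaction.max × the MemStore flush size) is stated here as an assumption based on the commonly documented default; check the value for your HBase version before relying on it.

```python
MB = 1024 ** 2

def default_throttle(max_files=10, flush_size=128 * MB):
    # Assumed default: 2 * compaction.max * memstore flush size.
    return 2 * max_files * flush_size

def choose_pool(file_sizes, throttle=None):
    """Return "large" or "small" depending on the total bytes to be compacted."""
    if throttle is None:
        throttle = default_throttle()
    return "large" if sum(file_sizes) > throttle else "small"

print(default_throttle())              # bytes
print(choose_pool([100 * MB] * 3))     # a typical minor compaction
print(choose_pool([1024 * MB] * 3))    # a much larger request
```

Under these assumptions the threshold works out to 2.5 GB, so a merge of three 100 MB files lands in the small pool while a merge of three 1 GB files goes to the large pool.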

Four compaction policies are commonly discussed: RatioBasedCompactionPolicy, ExploringCompactionPolicy, FIFOCompactionPolicy, and StripeCompactionPolicy. The default in recent HBase versions is ExploringCompactionPolicy, which generally offers better performance than the older RatioBased policy.
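The idea behind the exploring approach can be sketched as follows: instead of stopping at the first acceptable run of files, consider every contiguous run and keep the best one. This is a simplified illustration, not the real policy; the function names are hypothetical, "best" is assumed to mean the most files with ties broken by the smaller total size, and size limits and the fallback for stores with too many files are omitted.

```python
def files_in_ratio(window, ratio):
    """True if no file is larger than ratio times the sum of the others."""
    if len(window) < 2:
        return True
    total = sum(window)
    return all(size <= (total - size) * ratio for size in window)

def select_exploring(sizes, min_files=3, max_files=10, ratio=1.2):
    """Scan all contiguous windows of files (oldest first) and keep the best."""
    best, best_size = [], 0
    for start in range(len(sizes)):
        for count in range(min_files, max_files + 1):
            end = start + count
            if end > len(sizes):
                break
            window = sizes[start:end]
            if not files_in_ratio(window, ratio):
                continue
            total = sum(window)
            # Prefer more files; on a tie prefer less data to rewrite.
            if len(window) > len(best) or (
                    len(window) == len(best) and total < best_size):
                best, best_size = window, total
    return best

print(select_exploring([1000, 100, 90, 80, 10]))
```

In the example, every window containing the 1000-unit file fails the ratio check, and the four remaining files form the largest window that passes.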

Compaction improves read latency over time by keeping the number of files, and therefore the number of seeks per read, stable. While a compaction is running, however, it creates short‑term bursts of disk I/O and network bandwidth usage that can cause noticeable latency spikes for read requests.

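Write performance can suffer in two ways: write amplification, because the same data is rewritten each time its file is compacted, and temporary blocking. When the number of StoreFiles in a store exceeds hbase.hstore.blockingStoreFiles (default 10 in older releases; HBase 2.x raises it to 16), flush operations are paused for up to hbase.hstore.blockingWaitTime (default 90 seconds) until compaction reduces the file count. While flushes are held back the MemStore keeps filling, which can eventually block writes.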
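The blocking rule amounts to a bounded wait, which the sketch below expresses directly. The function `flush_delay_seconds` and its `seconds_until_compaction_done` argument are hypothetical, and the defaults are the values quoted above rather than those of any particular HBase release.

```python
def flush_delay_seconds(store_file_count, seconds_until_compaction_done,
                        blocking_store_files=10, blocking_wait_time=90):
    """How long a flush waits under the blocking rule, in seconds."""
    if store_file_count <= blocking_store_files:
        return 0  # under the limit: flush proceeds immediately
    # Over the limit: wait for compaction, but never longer than the cap.
    return min(seconds_until_compaction_done, blocking_wait_time)

print(flush_delay_seconds(8, 30))     # below the limit
print(flush_delay_seconds(12, 30))    # compaction finishes within the cap
print(flush_delay_seconds(12, 300))   # compaction is slower than the cap
```

The last case shows why the wait time matters: if compaction cannot catch up within the cap, the flush goes ahead anyway and the store ends up with even more files.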

In summary, HBase compaction is a crucial optimization that trades increased disk I/O for more predictable read performance, with distinct minor and major modes, a rich set of tunable parameters, and measurable impacts on both read and write workloads.
