Quantifying HBase Write Path: Disk and Network Costs for High‑Throughput Scenarios
This article analytically breaks down HBase's write pipeline, quantifies disk and network overheads for massive random writes, derives formulas for resource consumption under realistic assumptions, and offers concrete tuning recommendations to optimize throughput and reduce cost.
Overview
HBase, based on Google BigTable, is a highly reliable, high‑performance, scalable distributed storage system. This summary focuses on the write path and provides a quantitative analysis of resource consumption for workloads with a small amount of random reads and massive random writes.
HBase Write Path Overview
Writes are first buffered in the in‑memory MemStore. When the MemStore reaches a configured size, it is flushed asynchronously to an HFile on HDFS. At the same time each write is appended to the Write‑Ahead Log (WAL) to guarantee durability.
Flush & Compaction
After each flush the number of HFiles grows, which can degrade read performance and increase system resource usage (e.g., HDFS block count, file descriptors). Compaction merges multiple HFiles into fewer, larger ones. This keeps the file count per region under control, improves data locality, and handles old versions and deletion markers. Flush and compaction run in independent threads and do not block each other.
System Overhead Quantitative Analysis
The analysis assumes a write‑heavy, event‑type workload with the following simplifying assumptions:
Rowkeys are uniformly distributed (no hot spots).
Write volume is known and data is pre‑partitioned, keeping region distribution stable.
Random reads are negligible and read latency is not a concern.
No multi‑version data or deletions; compaction does not reduce data size.
The write path does not involve random disk I/O, so random IOPS are not a bottleneck.
The aggregate sequential write throughput of a node's SATA disks is assumed to exceed its 10 Gbps network bandwidth, so disk throughput is not treated as a bottleneck.
RPC bandwidth overhead is ignored.
System Variables
Data size per row s (bytes)
Peak write TPS T
HFile replica count R1 (default 3)
WAL replica count R2 (default 3)
WAL compression ratio Cwal (usually 1)
HFile compression ratio C (≈0.2 for DIFF+LZO)
Flush size F (≈128 MB)
Compaction minimum files CT (default 3)
Data TTL TTL (days)
Per‑node data volume D (TB)
Major compaction period M (days; 20 in this analysis, whereas stock HBase sets hbase.hregion.majorcompaction to 7 days)
The analysis concentrates on two resource metrics: disk usage and network traffic.
Disk Capacity Quantification
Disk usage is modeled as:
V = TTL × 86400 × T × s × C × R1
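Example: s=1000, TTL=365, T=200000, C=0.2, R1=3 yields V ≈ 3.78 × 10^15 bytes, about 3,442 TiB. A figure of ≈282 TB corresponds to TTL=30 with the other parameters unchanged. Minor costs (WAL logs, temporary compaction files, snapshots, etc.) are not quantified.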
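The disk formula is easy to evaluate directly. The following is a minimal sketch, not part of the original analysis: the function name is hypothetical, and the TTL=30 row is included only as a comparison value. It prints the result in both decimal TB (10^12 bytes) and binary TiB (2^40 bytes), since the choice of unit changes the headline number noticeably.

```python
SECONDS_PER_DAY = 86_400


def disk_usage_bytes(ttl_days, tps, row_bytes, hfile_ratio, hfile_replicas):
    """V = TTL x 86400 x T x s x C x R1, returned in bytes."""
    return ttl_days * SECONDS_PER_DAY * tps * row_bytes * hfile_ratio * hfile_replicas


# TTL=365 is the article's example; TTL=30 is a hypothetical comparison value.
for ttl_days in (30, 365):
    v = disk_usage_bytes(ttl_days, tps=200_000, row_bytes=1000,
                         hfile_ratio=0.2, hfile_replicas=3)
    print(f"TTL={ttl_days:>3} d: {v / 1e12:8.1f} TB  {v / 2**40:8.1f} TiB")
```

Because V is linear in every variable, halving the compression ratio or the retention period halves the disk budget.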
Network Capacity Quantification
Network traffic originates from three independent stages: write path (WAL), flush (HFile write), and compaction (major & minor). Each stage is analyzed separately.
Write Path
Network inbound and outbound for WAL writes:
NInWrite = T × s × Cwal × (R2‑1) + (T × s)
NOutWrite = T × s × Cwal × (R2‑1)
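The T × s term on the inbound side is the client payload arriving at the cluster; the (R2‑1) term is the WAL data forwarded to the remote replicas, which is sent by one node and received by another. With T=200000, s=1000, Cwal=1, R2=3 the traffic is 600 × 10^6 bytes/s inbound and 400 × 10^6 bytes/s outbound (≈572 and ≈381 in units of 2^20 bytes/s).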
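The WAL formulas can be checked with a few lines of Python. This is a sketch under the article's assumptions; the function and variable names are illustrative and not taken from HBase.

```python
def wal_traffic(tps, row_bytes, wal_ratio=1.0, wal_replicas=3):
    """Return (inbound, outbound) WAL-stage traffic in bytes per second."""
    payload = tps * row_bytes                              # client data arriving
    replication = payload * wal_ratio * (wal_replicas - 1)  # copies sent to remote replicas
    return payload + replication, replication


wal_in, wal_out = wal_traffic(tps=200_000, row_bytes=1000)
print(f"inbound : {wal_in / 1e6:.0f} MB/s ({wal_in / 2**20:.1f} MiB/s)")
print(f"outbound: {wal_out / 1e6:.0f} MB/s ({wal_out / 2**20:.1f} MiB/s)")
```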
Flush
Network traffic for moving flushed HFiles to HDFS:
NInFlush = s × T × (R1‑1) × C
NOutFlush = s × T × (R1‑1) × C
Using the same parameters and R1=3, C=0.2 yields 80 × 10^6 bytes/s, i.e. ≈76 MB/s when MB is taken as 2^20 bytes, both inbound and outbound.
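A corresponding sketch for the flush stage, again with hypothetical names. It assumes, as the formula does, that the first HFile replica is written locally and only the remaining R1‑1 replicas cross the network.

```python
def flush_traffic(tps, row_bytes, hfile_ratio=0.2, hfile_replicas=3):
    """Bytes per second sent (and equally received) when flushed HFiles are replicated."""
    return tps * row_bytes * hfile_ratio * (hfile_replicas - 1)


flush_bps = flush_traffic(tps=200_000, row_bytes=1000)
print(f"flush: {flush_bps / 1e6:.0f} MB/s ({flush_bps / 2**20:.1f} MiB/s) each way")
```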
Major Compaction
Assuming data is locally read (short‑circuit) and only the first replica is written locally, the network cost per second is:
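NInMajor = D × (R1‑1) / (M × 86400)
NOutMajor = D × (R1‑1) / (M × 86400)
Here D is expressed in bytes and M in days; the factor 86400 converts the period to seconds.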
For D=10 TB, R1=3, M=20 the traffic is about 12 MB/s inbound and outbound.
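The major compaction figure depends on converting the period from days to seconds. The sketch below makes that conversion explicit; the names are illustrative, and D is taken as 10 × 2^40 bytes, which is an assumption about how the article's "TB" is meant.

```python
SECONDS_PER_DAY = 86_400


def major_compaction_traffic(node_bytes, hfile_replicas=3, period_days=20):
    """Bytes per second written to remote replicas when every byte is rewritten once per period."""
    return node_bytes * (hfile_replicas - 1) / (period_days * SECONDS_PER_DAY)


# Assumption: 10 TB interpreted as 10 * 2**40 bytes.
major_bps = major_compaction_traffic(node_bytes=10 * 2**40)
print(f"major compaction: {major_bps / 2**20:.1f} MiB/s each way")
```

With D taken as 10 × 10^12 bytes instead, the same function gives roughly 11.6 × 10^6 bytes/s, so the rounded value of 12 holds under either reading.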
Minor Compaction
The maximum number of minor compactions a row experiences, based on default thresholds, is 6. Network cost per second:
NInMinor = s × T × (R1‑1) × C × 6
NOutMinor = s × T × (R1‑1) × C × 6
Result: 480 × 10^6 bytes/s, i.e. ≈458 MB/s when MB is taken as 2^20 bytes, both inbound and outbound.
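Minor compaction traffic is the flush traffic multiplied by the number of rounds each row is rewritten. A short sketch with that count exposed as a parameter shows how sensitive the total is to it. The value 6 comes from the text; the value 4 is purely hypothetical and only illustrates the effect of fewer rounds.

```python
def minor_compaction_traffic(tps, row_bytes, rounds, hfile_ratio=0.2, hfile_replicas=3):
    """Bytes per second sent (and received) when each row is rewritten `rounds` times."""
    return tps * row_bytes * hfile_ratio * (hfile_replicas - 1) * rounds


# rounds=6 is the article's figure; rounds=4 is a hypothetical tuned value.
for rounds in (6, 4):
    bps = minor_compaction_traffic(tps=200_000, row_bytes=1000, rounds=rounds)
    print(f"rounds={rounds}: {bps / 1e6:.0f} MB/s ({bps / 2**20:.1f} MiB/s) each way")
```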
Overall Network Summary
Summing all components (write, flush, major, minor):
NInTotal = 572 MB/s + 76.3 MB/s + 12 MB/s + 457.8 MB/s = 1118.1 MB/s
NOutTotal = 381 MB/s + 76.3 MB/s + 12 MB/s + 457.8 MB/s = 927.1 MB/s
In these sums MB denotes 2^20 bytes, which is why the write‑path terms appear as 572 and 381 rather than 600 and 400.
These figures represent the theoretical minimum under ideal conditions; real‑world traffic can be higher due to uneven partitioning, region splits, low locality, excessive small files, etc.
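The whole network budget can be reproduced in one place. The following sketch recomputes each stage from the example parameters and sums them; all names are illustrative, per‑node data volume is assumed to be 10 × 2^40 bytes, and small differences from the rounded totals above come from not rounding the major compaction term to 12.

```python
SECONDS_PER_DAY = 86_400
MIB = 2**20

# Example parameters from the text.
T, s = 200_000, 1000         # peak write TPS, bytes per row
R1, R2 = 3, 3                # HFile and WAL replica counts
C, C_WAL = 0.2, 1.0          # HFile and WAL compression ratios
D, M = 10 * 2**40, 20        # per-node data volume (assumed binary TB), major period in days
MINOR_ROUNDS = 6             # minor compaction rounds per row

replica_stream = s * T * (R1 - 1) * C  # one compressed copy of the write stream to remote replicas

stages = {
    "write": (T * s * C_WAL * (R2 - 1) + T * s, T * s * C_WAL * (R2 - 1)),
    "flush": (replica_stream, replica_stream),
    "major": (D * (R1 - 1) / (M * SECONDS_PER_DAY),) * 2,
    "minor": (replica_stream * MINOR_ROUNDS,) * 2,
}

total_in = sum(n_in for n_in, _ in stages.values())
total_out = sum(n_out for _, n_out in stages.values())

for name, (n_in, n_out) in stages.items():
    print(f"{name:<6} in {n_in / MIB:7.1f} MiB/s   out {n_out / MIB:7.1f} MiB/s")
print(f"total  in {total_in / MIB:7.1f} MiB/s   out {total_out / MIB:7.1f} MiB/s")
print(f"total  in {total_in * 8 / 1e9:.2f} Gbit/s  out {total_out * 8 / 1e9:.2f} Gbit/s")
```

Expressing the totals in Gbit/s makes it easier to compare them against NIC capacity when sizing the cluster.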
Practical Optimization Recommendations
Design rowkeys to avoid write hotspots during early adoption.
Increase hbase.hstore.compaction.min to reduce the number of compaction rounds a row undergoes.
Pre‑partition tables based on steady‑state load to minimize region splits.
For latency‑insensitive workloads, set hbase.hstore.compaction.max.size to ~4 GB to avoid large‑file compactions.
If data has TTL and no multi‑versioning, disable periodic major compaction and rely on file expiration.
Compress data before ingestion to lower WAL‑related network traffic, since the model treats the WAL as effectively uncompressed (Cwal = 1).
Adjust MemStore memory ratio so each region can accumulate a full FlushSize before flushing, producing larger HFiles and reducing subsequent compaction cost.
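The MemStore recommendation can be turned into a rough sizing check. The sketch below is a rule‑of‑thumb estimate, not an HBase algorithm: it assumes the global MemStore budget is heap size times hbase.regionserver.global.memstore.size (0.4 by default), ignores the lower watermark and per‑column‑family effects, and uses a hypothetical 32 GiB heap.

```python
def max_regions_with_full_flush(heap_bytes, memstore_fraction=0.4, flush_bytes=128 * 2**20):
    """Rough upper bound on actively written regions per RegionServer that can
    each fill a whole flush-size MemStore before global memory pressure forces a flush."""
    return int(heap_bytes * memstore_fraction // flush_bytes)


# Hypothetical RegionServer with a 32 GiB heap.
regions = max_regions_with_full_flush(heap_bytes=32 * 2**30)
print(f"about {regions} actively written regions can reach a full flush")
```

If a RegionServer hosts more actively written regions than this estimate, flushes tend to be triggered by global memory pressure instead, producing smaller HFiles and more compaction work.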
Conclusion
The analysis provides a formula‑driven evaluation of HBase write‑path resource consumption for high‑throughput scenarios, based on HBase 1.2.6. The quantitative framework helps practitioners size clusters, predict disk and network budgets, and identify effective tuning knobs.
References
Google BigTable – https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf
HBase Official Site – http://hbase.apache.org/
Youzan Coder
Official Youzan tech channel, delivering technical insights and occasional daily updates from the Youzan tech team.
