How Huolala Solved HBase Bulkload Challenges: A Practical Guide
This article details Huolala’s experience building a unified Hive‑to‑HBase pipeline, addressing low development efficiency, lack of monitoring, and HBase instability by evaluating two architectures, implementing a generic Transform tool, optimizing compaction and DistCp, and establishing stability and data‑validation mechanisms.
Introduction
HBase is a high‑availability, high‑performance NoSQL database built on Hadoop. Huolala uses it as online storage for risk control, maps, real‑time tags, and other critical business scenarios. In production, large volumes of T+1 data (each day's data, available the following day) are generated in Hive daily and must be imported into HBase.
Problems and Challenges
Initially, each business team wrote its own Hive‑to‑HBase Bulkload code, which caused three problems:
Low development efficiency – every team duplicated the same effort.
No end‑to‑end pipeline monitoring – failures and delays went unnoticed.
Impact on HBase stability – Bulkload caused online incidents, including cross‑cluster HFile loads that left HBase unavailable for 20 minutes.
A unified Hive‑to‑HBase tool with stability guarantees was therefore needed. It had to meet three main requirements: simple and generic to use, observable with alerting, and controllable through an approval workflow.
Research
Two typical architectures for Hive‑to‑HBase were evaluated.
Solution 1: Spark/MR reads Hive, writes HFiles directly to the online HBase cluster, then LoadIncrementalHFiles loads them.
Advantages and disadvantages are shown in the diagram.
Solution 2: Write HFiles to the offline cluster, then use Hadoop DistCp to copy them to the online HBase cluster.
Solution 2 was chosen: writing to the online cluster at an unrestricted speed posed stability risks, whereas DistCp copy bandwidth can be throttled.
Implementation
Transform
The unified Transform script supports multiple RowKey strategies and column‑name mapping, and is packaged as a template task on the data‑development platform.
Custom RowKey generation (hash, salt, field slicing).
Column name mapping between Hive and HBase.
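The RowKey strategies and column mapping listed above can be sketched in plain Java. This is a minimal illustration, not the Transform tool's actual code: the class RowKeySketch, its method names, the MD5-based prefix, the two-digit bucket format, and the cf:uid column name are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helpers for the three RowKey strategies and column mapping. */
final class RowKeySketch {
    private RowKeySketch() {}

    /** Hash strategy: prefix the key with the first two bytes of its MD5 digest, in hex. */
    static String hashPrefixed(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return String.format("%02x%02x", digest[0], digest[1]) + "_" + key;
        } catch (NoSuchAlgorithmException e) {
            // MD5 must be available on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    /** Salt strategy: prefix with a bucket number derived deterministically from the key. */
    static String salted(String key, int buckets) {
        int bucket = Math.floorMod(key.hashCode(), buckets);
        return String.format("%02d_%s", bucket, key);
    }

    /** Field slicing: take characters [start, end) of a field, clamped to its length. */
    static String sliced(String field, int start, int end) {
        int safeEnd = Math.min(end, field.length());
        int safeStart = Math.min(Math.max(start, 0), safeEnd);
        return field.substring(safeStart, safeEnd);
    }

    /** Column mapping: rename Hive columns to HBase names; unmapped columns are dropped. */
    static Map<String, String> mapColumns(Map<String, String> hiveRow,
                                          Map<String, String> hiveToHBase) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : hiveToHBase.entrySet()) {
            String value = hiveRow.get(e.getKey());
            if (value != null) {
                out.put(e.getValue(), value);
            }
        }
        return out;
    }
}
```

For example, RowKeySketch.salted("abc", 10) yields 04_abc. Because the salt is computed from the key itself rather than drawn at random, a reader can recompute the prefix and still perform point lookups.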
During the gray release (gradual rollout), two issues were observed:
Resource contention at compaction peaks – the many small HFiles produced by Bulkload triggered compactions on the next CompactionChecker run, causing CPU and I/O spikes.
Low data locality raised P99 latency – HFiles copied by DistCp landed on effectively random DataNodes, which reduced locality.
Solutions applied:
Merge tasks to reduce HFile count per Region.
Adjust table‑level compaction settings to avoid compaction after Bulkload.
Run a dedicated Major Compaction tool during off‑peak hours for tables without TTL.
Enhance DistCp to support favored nodes for better locality.
Compaction Tool
After these optimizations, HFile counts were kept under control and Bulkload‑induced compaction spikes were mitigated. The compaction tool implements three scheduling strategies:
OffpeakCompact – runs only in low‑traffic periods with region election.
TimeCompact – triggers based on elapsed time since last major compaction.
FileNumberCompact – triggers when HFile count exceeds a threshold.
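The trigger conditions for the three strategies can be sketched as simple predicates. This is a hedged illustration only: the class CompactionTriggerSketch, its method names, and all thresholds are assumptions, and the single-slot AtomicBoolean is just one possible reading of "region election", modeled on the offPeakCompactionTracker excerpt from HBase included with this article.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical trigger checks for OffpeakCompact, TimeCompact and FileNumberCompact. */
final class CompactionTriggerSketch {
    // Assumption: only one off-peak compaction may hold this slot at a time.
    private static final AtomicBoolean OFF_PEAK_SLOT = new AtomicBoolean(false);

    private CompactionTriggerSketch() {}

    /** True when the hour is in [startHour, endHour); the window may wrap past midnight. */
    static boolean isOffPeak(LocalTime now, int startHour, int endHour) {
        int hour = now.getHour();
        if (startHour <= endHour) {
            return startHour <= hour && hour < endHour;
        }
        return hour >= startHour || hour < endHour;
    }

    /** OffpeakCompact: start only inside the window and only if the slot is free. */
    static boolean tryStartOffPeak(LocalTime now, int startHour, int endHour) {
        return isOffPeak(now, startHour, endHour)
                && OFF_PEAK_SLOT.compareAndSet(false, true);
    }

    /** Release the slot once the compaction has finished or failed. */
    static void finishOffPeak() {
        OFF_PEAK_SLOT.set(false);
    }

    /** TimeCompact: due when the last major compaction is at least maxAge old. */
    static boolean dueByTime(Duration sinceLastMajor, Duration maxAge) {
        return sinceLastMajor.compareTo(maxAge) >= 0;
    }

    /** FileNumberCompact: due when the HFile count exceeds the threshold. */
    static boolean dueByFileCount(int hfileCount, int threshold) {
        return hfileCount > threshold;
    }
}
```

With a window of 22:00 to 06:00, tryStartOffPeak succeeds at 23:00 for the first caller and fails for a second caller until finishOffPeak() releases the slot.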
The observed effect: HFile counts stayed under control and peak‑time compactions were avoided.
DistCp Enhancements
To meet strict latency (P99, P999) requirements, DistCp was extended with:
FavoredNodes specification per HFile.
Multi‑cluster copy with simple configuration.
Related JIRA tickets: HADOOP-18629, HBASE-27670, HBASE-27733.
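The article does not say how the favored nodes for each HFile are chosen. One plausible approach, sketched below purely as an assumption, is to look up the Region that contains the HFile's first RowKey and prefer the host of the RegionServer serving that Region. The class FavoredNodeSketch and its methods are hypothetical, and Strings stand in for HBase's byte-array keys, which are actually compared as unsigned bytes.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical lookup from an HFile's first RowKey to a preferred DataNode host. */
final class FavoredNodeSketch {
    // Region start key -> host of the RegionServer currently serving that Region.
    private final TreeMap<String, String> hostByRegionStartKey = new TreeMap<>();

    /** Registers a Region; the first Region of a table has the empty start key "". */
    void addRegion(String startKey, String regionServerHost) {
        hostByRegionStartKey.put(startKey, regionServerHost);
    }

    /**
     * Returns the host serving the Region that contains firstRowKey,
     * or null when no registered Region covers it.
     */
    String favoredHostFor(String firstRowKey) {
        // floorEntry finds the Region with the greatest start key <= firstRowKey.
        Map.Entry<String, String> region = hostByRegionStartKey.floorEntry(firstRowKey);
        return region == null ? null : region.getValue();
    }
}
```

A copy job built this way would pass the returned host to HDFS as a placement hint when it creates the destination file, so that at least one replica lands next to the RegionServer that will read it.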
Data Validation
Bulkload quality is verified at each stage: rows are counted in Transform, HFile directory sizes are compared before and after DistCp, Load metrics are monitored, and sampled RowKeys are queried to spot‑check the loaded data.
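These checks can be combined into one validation step, sketched below. It is an illustration under assumptions: the class BulkloadCheckSketch and its parameters are hypothetical, comparing the Transform row count against the Hive source count is an assumed baseline, and in a real pipeline the inputs would come from job counters, HDFS directory summaries, and HBase reads.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical post-load checks mirroring the validation steps described above. */
final class BulkloadCheckSketch {
    private BulkloadCheckSketch() {}

    /** Returns a list of failed checks; an empty list means all checks passed. */
    static List<String> validate(long hiveRows, long transformRows,
                                 long sourceDirBytes, long copiedDirBytes,
                                 int sampledKeys, int sampledKeysFound) {
        List<String> failures = new ArrayList<>();
        // Row count: Transform should emit as many rows as it read from Hive.
        if (hiveRows != transformRows) {
            failures.add("row count mismatch: hive=" + hiveRows
                    + " transform=" + transformRows);
        }
        // Directory size: DistCp should not lose or truncate HFiles.
        if (sourceDirBytes != copiedDirBytes) {
            failures.add("size mismatch: source=" + sourceDirBytes
                    + " copied=" + copiedDirBytes);
        }
        // Sampling: every sampled RowKey should be readable after the load.
        if (sampledKeysFound < sampledKeys) {
            failures.add("sampled keys missing: "
                    + (sampledKeys - sampledKeysFound) + " of " + sampledKeys);
        }
        return failures;
    }
}
```

Returning every failed check, rather than stopping at the first, lets a single alert report the full state of a bad load.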
Stability Assurance
An end‑to‑end stability solution for the data pipeline has been built and is being rolled out.
Conclusion
This article shares Huolala’s pain points, design decisions, and practical implementation of a reliable HBase offline data pipeline, as a reference for readers building similar pipelines.
// 1. Generate HFile writer: HFileOutputFormat2.getNewWriter()
HFileContextBuilder contextBuilder = new HFileContextBuilder()
    .withCompression(compression)
    .withChecksumType(HStore.getChecksumType(conf))
    .withBytesPerCheckSum(HStore.getBytesPerChecksum(conf))
    .withBlockSize(blockSize);

// 2. Load HFile split: LoadIncrementalHFiles.copyHFileHalf()
HFileContext hFileContext = new HFileContextBuilder()
    .withCompression(compression)
    .withChecksumType(HStore.getChecksumType(conf))
    .withBytesPerCheckSum(HStore.getBytesPerChecksum(conf))
    .withBlockSize(blocksize)
    .withDataBlockEncoding(familyDescriptor.getDataBlockEncoding())
    .withIncludesTags(true)
    .build();
halfWriter = new StoreFileWriter.Builder(conf, cacheConf, fs)
    .withFilePath(outFile)
    .withBloomType(bloomFilterType)
    .withFileContext(hFileContext)
    .build();

// Shared flag: only one compaction at a time may claim the off-peak slot.
private static final AtomicBoolean offPeakCompactionTracker = new AtomicBoolean();

// Normal case - coprocessor is not overriding file selection.
if (!compaction.hasSelection()) {
  boolean isUserCompaction = priority == Store.PRIORITY_USER;
  boolean mayUseOffPeak =
    offPeakHours.isOffPeakHour() && offPeakCompactionTracker.compareAndSet(false, true);
  try {
    compaction.select(this.filesCompacting, isUserCompaction, mayUseOffPeak,
      forceMajor && filesCompacting.isEmpty());
  } catch (IOException e) {
    // Selection failed: release the off-peak slot before rethrowing.
    if (mayUseOffPeak) {
      offPeakCompactionTracker.set(false);
    }
    throw e;
  }
  assert compaction.hasSelection();
  if (mayUseOffPeak && !compaction.getRequest().isOffPeak()) {
    // Compaction policy doesn't want to take advantage of off-peak.
    offPeakCompactionTracker.set(false);
  }
}
