
HBase Common Issues, Optimization Tips, and New Features in HBase 2.0

This article compiles frequently asked HBase questions, troubleshooting steps, performance optimization techniques, configuration guidance, and an overview of new HBase 2.0 features such as off‑heap memory, Procedure v2, In‑Memory Compaction, and MOB support, providing practical solutions for administrators and developers.

Big Data Technology & Architecture

This article collects useful resources and practical solutions for HBase users, starting with two previously published articles: "HBase Performance Optimization Encyclopedia" and "HBase FAQ Collection". It also points to the author's CSDN HBase column for systematic learning.

HBase Common Questions

Q: Writes are slow when inserting about 30,000 rows per second with many columns. How can this be optimized? A: Use bulk load: run a MapReduce job that generates HFiles and load them directly into the table. This bypasses the normal write path (WAL and MemStore) and is much faster.
Q: Large-scale data loss crashed the whole cluster and the HDFS configuration files disappeared. What should be checked? A: Check whether services are exposed to the public network (a possible attack), then verify the HBase configuration and the data backups.
Q: In the start-hbase.sh script, what does distMode=false mean? A: It indicates standalone mode, where all services (ZooKeeper, HMaster, RegionServer) run in one JVM; in cluster mode they are started as separate processes.
Q: Is HBase suitable for user-profile tags with hundreds of columns? A: Yes, HBase can handle hundreds to thousands of columns, though keeping commonly used columns under 100k is recommended.
Q: How can HBase data be deleted quickly? A: Set a TTL on the column family so data expires automatically; otherwise schedule delete operations.
Q: How does HBase 2.0 reduce GC pressure? A: It introduces end-to-end off-heap read and write paths, moving most objects off the Java heap and reducing GC pauses.
Q: How should full and incremental loads be handled with bulk load? A: Use a snapshot for the full load, then bulk load for the increments.
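
To make the TTL answer concrete, here is a minimal sketch of the expiry rule, assuming the usual convention that a column family's TTL is given in seconds and cell timestamps are in milliseconds. The function name is_expired is hypothetical and only illustrates the rule; it is not an HBase API. In HBase itself, expired cells stop being returned by reads and are physically removed during compaction.

```python
import time

def is_expired(cell_timestamp_ms, ttl_seconds, now_ms=None):
    """Return True when a cell is older than the column family's TTL.

    Illustrative only: models the rule, not HBase's implementation.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - cell_timestamp_ms > ttl_seconds * 1000

# Fixed "now" so the example is reproducible.
now = 1_700_000_000_000
two_days_ms = 2 * 24 * 3600 * 1000

# A cell written two days ago, checked against a one-day TTL.
print(is_expired(now - two_days_ms, ttl_seconds=86400, now_ms=now))  # True
# A cell written one second ago is still live.
print(is_expired(now - 1000, ttl_seconds=86400, now_ms=now))         # False
```

Because expiry is evaluated per cell from its timestamp, no delete request is needed, which is why TTL is usually cheaper than scheduled deletes.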

Troubleshooting & Configuration Tips

Common errors and solutions include:

Connection errors: make sure the ZooKeeper hosts and client port are correct, and set zookeeper.znode.parent if the cluster uses a custom root znode.

JAR not found: package required HBase jars into the job or set HADOOP_CLASSPATH appropriately.

JDK version mismatch: compile with the same JDK version the cluster runs (for example, a job built with 1.8 will not load on a 1.7 cluster).

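HMaster fails to start: run hbase hbck -fixVersionFile; as a last resort, delete and recreate hbase.rootdir in HDFS, which discards all HBase data. Note that the hbck -fix options apply to HBase 1.x; on HBase 2.x the built-in hbck is read-only and repairs go through the separate HBCK2 tool.

RegionServer offline after disk failure: after fixing the disk, run hbase hbck -fixAssignments (HBase 1.x).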

Firewall issues on CentOS: on CentOS 6 use service iptables stop and chkconfig iptables off; on CentOS 7 use systemctl stop firewalld.service.

LeaseExpiredException during MapReduce: avoid multiple tasks writing to the same file or clean up corrupted .gz files.
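
For the connection and firewall items above, a quick first check is whether the ZooKeeper client port is reachable from the client machine at all. The sketch below is a generic TCP probe, not an HBase or ZooKeeper API; port_open is a hypothetical helper name, and 2181 is assumed because it is ZooKeeper's default client port. Replace the address with one of your own quorum hosts.

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False

# 127.0.0.1 is a placeholder; 2181 is ZooKeeper's default client port.
print(port_open("127.0.0.1", 2181, timeout=1.0))
```

A False result points to a wrong host or port, a stopped service, or a firewall rule; a True result means the problem is more likely in the HBase client configuration, such as zookeeper.znode.parent.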

HBase 2.0 New Features

HBase 2.x introduces major enhancements:

Procedure v2: a redesigned framework that makes the AssignmentManager and other core Master workflows atomic.

In‑Memory Compaction (BASIC, EAGER) to reduce write amplification and GC pressure.

MOB (Medium-sized Objects) support for storing 100 KB to 10 MB binary data efficiently.

End-to-end off-heap read/write paths that move data handling off the Java heap.

Asynchronous client and RPC implementation using Netty.

Procedure

Procedure v2 provides a distributed task‑flow framework guaranteeing that all sub‑tasks either succeed together or roll back, ensuring consistency across ZooKeeper, Meta table, and HDFS.
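
The all-or-nothing behaviour can be sketched as a list of steps, each paired with a compensating rollback. This is a toy model under stated assumptions, not HBase's Procedure v2 API: Step, run_procedure, and the step names are all hypothetical, and the real framework additionally persists every step so a procedure can resume after a Master restart, which this sketch omits.

```python
class Step:
    """One sub-task with an action and a compensating rollback."""
    def __init__(self, name, execute, rollback):
        self.name = name
        self.execute = execute
        self.rollback = rollback

def run_procedure(steps):
    """Run steps in order; on failure, undo completed steps in reverse order.

    Returns True if every step succeeded, False if the procedure rolled back.
    """
    done = []
    for step in steps:
        try:
            step.execute()
            done.append(step)
        except Exception:
            for finished in reversed(done):
                finished.rollback()
            return False
    return True

# Hypothetical "create table" flow touching HDFS and the Meta table.
state = {"hdfs": False, "meta": False}

def fail():
    raise RuntimeError("assignment failed")

steps = [
    Step("write_fs_layout", lambda: state.update(hdfs=True), lambda: state.update(hdfs=False)),
    Step("add_to_meta", lambda: state.update(meta=True), lambda: state.update(meta=False)),
    Step("assign_regions", fail, lambda: None),
]
print(run_procedure(steps), state)  # False {'hdfs': False, 'meta': False}
```

Because the third step fails, the two earlier steps are undone in reverse order, so HDFS and Meta never disagree about whether the table exists.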

In‑Memory Compaction

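The MemStore is split into a mutable active segment and a pipeline of immutable segments. BASIC compaction merges the segment indexes without copying cell data, while EAGER also eliminates redundant cells (overwritten or expired versions) and rewrites the data in memory.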

Cluster-wide default (hbase-site.xml property):

hbase.hregion.compacting.memstore.type=BASIC  # NONE/BASIC/EAGER

Table‑level configuration example:

create 'test', {NAME => 'cf', IN_MEMORY_COMPACTION => 'BASIC'}
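
The difference between the two policies can be illustrated with a toy model. It assumes cells are (row, timestamp, value) tuples and that only the newest version per row is kept by default; basic_merge and eager_compact are hypothetical names, and the sketch ignores memory layout, so it does not show the index-only merge that makes BASIC cheap.

```python
def basic_merge(segments):
    """BASIC-style: merge segments into one sorted view, keeping every cell."""
    merged = [cell for seg in segments for cell in seg]
    # Sort by row ascending, then timestamp descending (newest first).
    return sorted(merged, key=lambda c: (c[0], -c[1]))

def eager_compact(segments, max_versions=1):
    """EAGER-style: merge and also drop redundant older versions per row."""
    kept, seen = [], {}
    for row, ts, value in basic_merge(segments):
        seen[row] = seen.get(row, 0) + 1
        if seen[row] <= max_versions:
            kept.append((row, ts, value))
    return kept

# Three immutable segments in the pipeline; row1 is overwritten twice.
pipeline = [
    [("row1", 1, "a"), ("row2", 1, "x")],
    [("row1", 2, "b")],
    [("row1", 3, "c"), ("row2", 2, "y")],
]
print(len(basic_merge(pipeline)))  # 5
print(eager_compact(pipeline))     # [('row1', 3, 'c'), ('row2', 2, 'y')]
```

EAGER pays extra CPU to shrink the in-memory data, so it tends to suit workloads that overwrite the same rows often; BASIC is the lighter choice when most writes are to distinct cells.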

MOB

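MOB stores large values in separate HFiles so that regular compactions do not keep rewriting them, which reduces pressure on regular regions. Enable it per column family by setting IS_MOB => true and MOB_THRESHOLD in bytes (default 100 KB, i.e. 102400).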

hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400}

The MOB compaction partition policy can be set via MOB_COMPACT_PARTITION_POLICY (daily, weekly, monthly).
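
How the threshold routes a write can be sketched as follows. This is a simplified model, not HBase code: store_cell and the two dictionaries are hypothetical stand-ins for the regular store and the MOB files, and the reference tuple only approximates the reference cell that HBase keeps in the regular store to point at the MOB file.

```python
MOB_THRESHOLD = 102400  # bytes; the 100 KB default

def store_cell(row, value, regular_store, mob_store, threshold=MOB_THRESHOLD):
    """Route a value by size: values above the threshold go to the MOB store,
    and the regular store keeps only a small reference to them."""
    if len(value) > threshold:
        mob_store[row] = value
        regular_store[row] = ("MOB_REF", row, len(value))
    else:
        regular_store[row] = value

regular, mob = {}, {}
store_cell("thumb", b"x" * 2048, regular, mob)          # 2 KB: stays inline
store_cell("scan", b"x" * (512 * 1024), regular, mob)   # 512 KB: goes to MOB
print(sorted(mob))       # ['scan']
print(regular["scan"])   # ('MOB_REF', 'scan', 524288)
```

Choosing the threshold is a trade-off: set it too low and many small values take the extra hop through a reference, set it too high and large values remain in the regular compaction path.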

Testing Tools

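Use org.apache.hadoop.hbase.IntegrationTestIngestWithMOB to test the MOB feature under load. In the command, -threshold is the size in bytes above which a cell is treated as MOB, and -minMobDataSize and -maxMobDataSize bound the size in bytes of the generated MOB values: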

$ hbase org.apache.hadoop.hbase.IntegrationTestIngestWithMOB -threshold 1024 -minMobDataSize 512 -maxMobDataSize 5120

Overall, the article provides a comprehensive guide to HBase troubleshooting, performance tuning, and the powerful new capabilities introduced in version 2.0.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.