
HBase FAQ: Performance Optimization, Bulk Load, Single‑Node Mode, Transactions, and Best Practices

This article compiles a series of HBase questions and answers covering write performance, bulk loading, single‑node configuration, column scalability, transaction isolation, fast deletion methods, off‑heap optimizations, bulkload modes, Hive integration, direct HFile reads, and region planning.

Big Data Technology & Architecture

Mar 12, 2020


Q: Massive writes to HBase are slow (200+ columns per column family, 30,000 rows/sec) when using mutate with a 10 MB client-side write buffer on four 128 GB machines. How can this be optimized?

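A: Use bulkload instead: generate HFiles with a MapReduce job and load them directly into the table. Because bulkload bypasses the normal write path (WAL and MemStore), it delivers much higher throughput than sending mutations through the client.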
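The generated HFiles can be adopted cheaply because of how a bulkload job lays out its output: rows are sorted by rowkey and split so that every output file falls inside exactly one region's key range. In HBase's MapReduce support, HFileOutputFormat2.configureIncrementalLoad sets up this partitioning from the table's region start keys, and the completebulkload tool then hands the files to the RegionServers. The following is a minimal, JDK-only sketch of just the partitioning step, assuming the row keys fit in memory; BulkloadPartitionSketch and its methods are hypothetical names, not part of HBase.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not HBase code: models how a bulkload job assigns rows to
// regions and sorts them, so each output file covers exactly one region's key range.
final class BulkloadPartitionSketch {
    // Row keys are ordered as unsigned bytes, the same order HBase uses.
    static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    // Region start keys in ascending order; the first region starts at the empty key.
    private final List<byte[]> startKeys;

    BulkloadPartitionSketch(List<byte[]> startKeys) {
        this.startKeys = startKeys;
    }

    // Binary search for the last start key <= rowKey: that region owns the row.
    int regionFor(byte[] rowKey) {
        int lo = 0;
        int hi = startKeys.size() - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (UNSIGNED.compare(startKeys.get(mid), rowKey) <= 0) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    // One sorted list of keys per region, standing in for one HFile per region.
    List<List<byte[]>> partition(List<byte[]> rowKeys) {
        List<List<byte[]>> perRegion = new ArrayList<>();
        for (int i = 0; i < startKeys.size(); i++) {
            perRegion.add(new ArrayList<>());
        }
        for (byte[] key : rowKeys) {
            perRegion.get(regionFor(key)).add(key);
        }
        for (List<byte[]> keys : perRegion) {
            keys.sort(UNSIGNED);
        }
        return perRegion;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String text(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }
}
```

With start keys "", "g" and "p", the key "apple" lands in region 0, "kiwi" in region 1 and "zebra" in region 2, and each region's keys come out already sorted, which is what lets the RegionServers take the files as-is.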

Q: Large-scale data loss crashed an entire self-built HBase cluster, with errors reporting missing files on HDFS such as hbase.version. Has anyone run into this?

A: Check whether the cluster's service ports are exposed to the public internet, since exposed clusters can be attacked and their data deleted. Also review the HBase configuration and make sure proper data backups are in place.

Q: In start-hbase.sh, when distMode is false only the master seems to be started. Does single-node mode omit ZooKeeper and the RegionServer, or do they run in the same JVM?

A: They run in the same JVM. In single-node mode, ZooKeeper, the HMaster, and the RegionServer all run within one JVM process and use the local filesystem; the other modes run each service as a separate process.

Q: Is HBase suitable for large‑scale user profile tagging with around a hundred columns?

A: Yes. HBase handles hundreds to thousands of columns well and can even support millions, though it is recommended to keep the number of frequently used columns below 100,000.

Q: How mature are transactions in HBase 2? What isolation level is supported, and what do distributed transactions rely on?

A: Transaction support is currently limited to the region level: a single-row operation is always atomic, and atomic cross-row transactions are possible only when all the rows are in the same region.
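Because atomicity stops at the region boundary, it can help to check whether a group of rows that must change together could be covered by one atomic mutation at all. The JDK-only sketch below assumes the table's current region start keys are known and that the first one is the empty key; RegionScopeCheck is a hypothetical helper, not an HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical helper, not an HBase API: checks whether a group of rows sits in a
// single region, the scope within which HBase can apply them atomically.
final class RegionScopeCheck {
    private static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    // Region start key -> region index; assumes the first region's start key is empty.
    private final TreeMap<byte[], Integer> regionsByStartKey = new TreeMap<>(UNSIGNED);

    RegionScopeCheck(List<byte[]> startKeys) {
        for (int i = 0; i < startKeys.size(); i++) {
            regionsByStartKey.put(startKeys.get(i), i);
        }
    }

    // A row belongs to the region with the greatest start key <= the row key.
    int regionOf(byte[] row) {
        return regionsByStartKey.floorEntry(row).getValue();
    }

    // True when all rows share one region, so a single atomic multi-row mutation could cover them.
    boolean fitsInOneRegion(List<byte[]> rows) {
        return rows.stream().map(this::regionOf).distinct().count() <= 1;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

Regions split over time, so a check like this only reflects the current layout. Related rows are therefore usually given a shared rowkey prefix, and HBase provides prefix-aware split policies such as KeyPrefixRegionSplitPolicy that can keep such rows from being split apart.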

Q: What is the fastest way to bulk delete data in HBase?

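A: Set a TTL (time-to-live) on the column family, so expired data is hidden from reads and physically removed during compaction. If that is not feasible, run scheduled jobs that issue Delete API calls, which also perform well.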
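TTL is configured per column family in seconds (for example, alter 't1', {NAME => 'cf', TTL => 86400} in the HBase shell for one day), while cell timestamps are epoch milliseconds. Reads skip cells older than "now minus TTL", and compactions later drop them, so no delete markers have to be written. The following JDK-only sketch models that visibility rule, assuming expiry is judged purely on each cell's timestamp; TtlSketch is a hypothetical name, not HBase code.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

// Hypothetical sketch, not HBase code: a cell counts as expired once its
// timestamp falls before (now - TTL).
final class TtlSketch {

    // TTL is given in seconds, cell timestamps in epoch milliseconds.
    static boolean isExpired(long cellTimestampMillis, long ttlSeconds, long nowMillis) {
        long oldestLiveMillis = nowMillis - TimeUnit.SECONDS.toMillis(ttlSeconds);
        return cellTimestampMillis < oldestLiveMillis;
    }

    // Keeps only the cell timestamps that a read would still return.
    static List<Long> visible(List<Long> cellTimestamps, long ttlSeconds, long nowMillis) {
        return cellTimestamps.stream()
                .filter(ts -> !isExpired(ts, ttlSeconds, nowMillis))
                .collect(Collectors.toList());
    }
}
```

One consequence worth keeping in mind: if the application writes cells with its own timestamps, expiry follows those timestamps rather than the time the write happened.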

Q: How does HBase 2.0 improve query performance?

A: HBase 2.0 introduces end-to-end off-heap memory management to reduce garbage-collection pressure on the Java heap. On the write path, the RPC layer reads KeyValues directly into off-heap ByteBuffers, the MemStore uses an off-heap MSLAB pool, and Protobuf 3.0+, which can work with off-heap buffers, is used. On the read path, BucketCache blocks are reference-counted so they can be served without copying, KeyValue implementations are backed by ByteBuffers, and BucketCache received further performance improvements.
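Reference counting is what lets a cached block be read in place: each reader holds a reference to the off-heap memory instead of copying it onto the heap, and the cache may only evict or reuse that memory after every reference has been released. The class below is a simplified, JDK-only illustration of that idea, not HBase's BucketCache implementation; RefCountedBlock and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified illustration, not HBase's BucketCache: an off-heap block is lent to
// readers by reference and may only be freed once the last reference is released.
final class RefCountedBlock {
    private final ByteBuffer offHeap;
    // Starts at 1: the cache itself holds a reference while the block is cached.
    private final AtomicInteger refs = new AtomicInteger(1);

    RefCountedBlock(byte[] data) {
        offHeap = ByteBuffer.allocateDirect(data.length); // memory outside the Java heap
        offHeap.put(data);
        offHeap.flip();
    }

    // A reader takes a reference and gets a read-only view of the same memory; nothing is copied.
    ByteBuffer retain() {
        refs.updateAndGet(n -> {
            if (n <= 0) {
                throw new IllegalStateException("block already freed");
            }
            return n + 1;
        });
        return offHeap.asReadOnlyBuffer();
    }

    // Drops a reference; returns true when nobody holds the block and its memory can be reused.
    boolean release() {
        int remaining = refs.decrementAndGet();
        if (remaining < 0) {
            throw new IllegalStateException("block released too many times");
        }
        return remaining == 0;
    }

    int references() {
        return refs.get();
    }
}
```

A reader calls retain(), reads through the returned view, and calls release() when done; the cache's own release() on eviction returns true only once no reader still holds the block.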

Q: Does HBase bulkload support full and incremental loads?

A: Yes. A snapshot can be used for the full load, and bulkload for incremental loads.

Q: What performance issues arise when using Hive on HBase for analyzing over a billion rows?

A: Hive can query HBase data with SQL-like syntax, but it is generally slower than accessing HBase natively.

Q: How much performance gain is achieved by reading HFiles directly versus using the HBase client?

A: For a full-table scan, reading the HFiles directly with Spark can be more than twice as fast as going through the HBase client, and because it bypasses the RegionServers, it does not affect other HBase reads and writes.

Q: How should HBase regions be planned, and how many should a table have?

A: Ideally, make the number of regions a multiple of the number of RegionServers so that regions are evenly distributed across servers, and design rowkeys so that data spreads evenly across regions. See the HBase documentation for details.
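One common way to follow this advice is to pre-split the table into a multiple of the RegionServer count and salt the rowkeys so writes spread across all regions. The JDK-only sketch below assumes a two-digit salt prefix derived from the key's hash and produces one split key per salt bucket; the resulting byte[][] has the shape that Admin.createTable(TableDescriptor, byte[][]) accepts, but PreSplitSketch and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical sketch, not an HBase API: salted rowkeys plus matching split keys,
// so every pre-created region receives a roughly equal share of writes.
final class PreSplitSketch {

    // Region count as a multiple of the RegionServer count, so each server gets the same number.
    static int regionCount(int regionServers, int regionsPerServer) {
        return regionServers * regionsPerServer;
    }

    // Prefixes the key with a two-digit salt from its hash; a key in bucket 7 becomes "07|<key>".
    // Assumes buckets <= 100 so the prefix always fits in two digits.
    static String saltedKey(String rowKey, int buckets) {
        int bucket = Math.floorMod(rowKey.hashCode(), buckets);
        return String.format(Locale.ROOT, "%02d|%s", bucket, rowKey);
    }

    // Split keys "01" up to (buckets - 1): region i then holds exactly the keys salted into bucket i.
    static byte[][] splitKeys(int buckets) {
        byte[][] splits = new byte[buckets - 1][];
        for (int i = 1; i < buckets; i++) {
            splits[i - 1] = String.format(Locale.ROOT, "%02d", i).getBytes(StandardCharsets.UTF_8);
        }
        return splits;
    }
}
```

Salting trades scan convenience for write balance: a point lookup can recompute the salt from the key, but a range scan over the original keys has to be issued once per bucket.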

We hope this information helps readers.




Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
