
HBase 2015 Technical Developments

An overview of HBase’s 2015 milestones: the stable 1.0 release, a clearer client API with BufferedMutator, region replicas for high availability, per‑column‑family flush, RPC call‑queue separation, and online configuration changes, followed by Q&A on performance, replication, and schema‑design best practices.

High Availability Architecture

In 2015 HBase reached a milestone with the release of version 1.0, marking its transition to a stable, production‑ready platform.

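New Interface – Before 1.0, the HBase client API (built around HTable and HConnectionManager) differed considerably from JDBC. The 1.0 API introduces a clearer connection‑management model, in which a heavyweight, thread‑safe Connection is created once and lightweight Table handles are obtained from it, plus BufferedMutator for efficient buffered writes. The result is closer to JDBC’s pattern of working through handles obtained from a shared connection.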

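In the old API, writes went through an HTable instance. The new workflow creates a Connection via ConnectionFactory, then obtains either a Table handle for synchronous writes (Table.put) or a BufferedMutator for asynchronous writes, which are buffered on the client and sent to the server in batches.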
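The batching effect of BufferedMutator can be illustrated with a self‑contained toy model. This is not HBase code: ToyWriteBuffer and its byte threshold are hypothetical stand‑ins. In the real 1.0 client, a BufferedMutator is obtained with Connection.getBufferedMutator(TableName), fed with mutate(...), and drained with flush() or close(), and its buffer size is governed by the hbase.client.write.buffer setting.

```java
import java.util.ArrayList;
import java.util.List;

public class WriteBufferSketch {
    // Toy model of client-side write buffering (hypothetical; not HBase's BufferedMutator).
    static final class ToyWriteBuffer {
        private final long flushThresholdBytes;
        private final List<String> pendingRows = new ArrayList<>();
        private long pendingBytes = 0;
        int batchRpcs = 0; // number of simulated batched RPCs sent to the server

        ToyWriteBuffer(long flushThresholdBytes) {
            this.flushThresholdBytes = flushThresholdBytes;
        }

        // Buffer one mutation; send the batch once the buffered size reaches the threshold.
        void mutate(String row, byte[] value) {
            pendingRows.add(row);
            pendingBytes += value.length;
            if (pendingBytes >= flushThresholdBytes) {
                flush();
            }
        }

        // Send all buffered mutations as one simulated RPC and reset the buffer.
        void flush() {
            if (pendingRows.isEmpty()) {
                return;
            }
            batchRpcs++;
            pendingRows.clear();
            pendingBytes = 0;
        }
    }

    public static void main(String[] args) {
        ToyWriteBuffer buffer = new ToyWriteBuffer(1024);
        for (int i = 0; i < 100; i++) {
            buffer.mutate("row" + i, new byte[100]);
        }
        buffer.flush(); // push whatever is left, as flush()/close() does on a real mutator
        // One synchronous put per row would have cost 100 RPCs.
        System.out.println("batched RPCs: " + buffer.batchRpcs);
    }
}
```

With a 1,024‑byte threshold and 100‑byte values, the 100 mutations go out in 10 batches rather than 100 individual calls. The trade‑off is that buffered writes have not reached the server until the buffer is flushed, so they are lost if the client dies first.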

Region Replicas (HBASE‑10070) – A Region can now have replicas hosted on several RegionServers, so reads remain available when the RegionServer hosting the primary replica is down. Only the primary replica accepts writes (persisted to HDFS); secondary replicas serve reads and may therefore return stale data. Ongoing work aims to propagate updates from the primary to the secondary replicas more promptly.
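The read path this enables can be modeled in a few lines. The sketch below is a simplification under stated assumptions: Replica, ReadResult, and timelineGet are hypothetical names, and the up/down flag stands in for failure detection. To our understanding, the real client requests this behavior with Consistency.TIMELINE, reports staleness through Result.isStale(), and contacts secondaries only after the primary fails to answer within a short timeout rather than checking a flag.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TimelineReadSketch {
    // A read result plus whether it came from a secondary (possibly stale) replica.
    record ReadResult(String value, boolean stale) {}

    // Toy replica: an in-memory key-value map that may be unavailable (hypothetical).
    static final class Replica {
        final Map<String, String> data = new HashMap<>();
        boolean up = true;
    }

    // Prefer the primary; if it is down, fall back to the first live secondary
    // and flag the result as possibly stale.
    static ReadResult timelineGet(Replica primary, List<Replica> secondaries, String key) {
        if (primary.up) {
            return new ReadResult(primary.data.get(key), false);
        }
        for (Replica secondary : secondaries) {
            if (secondary.up) {
                return new ReadResult(secondary.data.get(key), true);
            }
        }
        throw new IllegalStateException("no replica available for " + key);
    }

    public static void main(String[] args) {
        Replica primary = new Replica();
        Replica secondary = new Replica();
        primary.data.put("row1", "v2");   // latest write, accepted only by the primary
        secondary.data.put("row1", "v1"); // secondary has not caught up yet

        System.out.println(timelineGet(primary, List.of(secondary), "row1")); // served by primary
        primary.up = false;
        System.out.println(timelineGet(primary, List.of(secondary), "row1")); // served by secondary, stale
    }
}
```

The point of the model is the contract, not the mechanics: a caller that opts into timeline reads trades freshness for availability and must check the stale flag.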

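Family‑Level Flush (HBASE‑10201) – Previously, flushes happened at the Region level: the memstores of every column family in a Region were written out together, so families holding little data produced many small files. Flushing at column‑family granularity writes out only the families that need it, which reduces disk I/O, uses memory more effectively, and improves read performance.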
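To see why the granularity matters, consider a toy model of the flush decision. The family names, sizes, and the lowerBound threshold below are hypothetical, and the fallback rule (flush every family when none is large enough) is a simplification for illustration rather than a statement of HBase’s exact logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FlushPolicySketch {
    // Region-level flush: every column family's memstore is written out together.
    static List<String> regionLevel(Map<String, Long> memstoreBytes) {
        return new ArrayList<>(memstoreBytes.keySet());
    }

    // Simplified per-family flush: only families above lowerBound are written out;
    // if none qualifies, fall back to flushing every family.
    static List<String> perFamily(Map<String, Long> memstoreBytes, long lowerBound) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Long> e : memstoreBytes.entrySet()) {
            if (e.getValue() > lowerBound) {
                selected.add(e.getKey());
            }
        }
        return selected.isEmpty() ? regionLevel(memstoreBytes) : selected;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        Map<String, Long> memstores = new TreeMap<>(); // sorted for deterministic output
        memstores.put("events", 120 * mb); // hot family
        memstores.put("meta", 1 * mb);      // cold family
        memstores.put("tags", 64 * 1024L);  // nearly empty family

        // Each flushed family produces one new store file.
        System.out.println("region-level flush writes: " + regionLevel(memstores));
        System.out.println("per-family flush writes:   " + perFamily(memstores, 16 * mb));
    }
}
```

Under the region‑level policy the hot family drags the two cold ones along, creating two tiny files per flush; the per‑family policy writes one file, which is the small‑file reduction the paragraph describes.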

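RPC Call‑Queue Separation (HBASE‑11355) – RPC calls can now be placed in separate call queues for writes (e.g., Put), short reads (Get), and long reads (Scan), so large scans no longer block point reads, which improves latency. Three parameters control the layout: hbase.ipc.server.callqueue.handler.factor sets the number of queues relative to the number of handlers, hbase.ipc.server.callqueue.read.ratio splits the queues between reads and writes, and hbase.ipc.server.callqueue.scan.ratio splits the read queues between gets and scans.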
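The arithmetic behind the two ratio parameters can be sketched as follows. This is an illustrative model that reproduces the worked examples in the HBase reference guide (10 queues with read.ratio 0.3 give 3 read and 7 write queues; 8 read queues with scan.ratio 0.3 give 2 scan and 6 get queues). The exact rounding is our assumption and may differ from HBase’s own implementation, and the total queue count, which HBase derives from the handler count and handler.factor, is simply an input here.

```java
public class CallQueueSketch {
    // shared: queues serving all requests; write: write-only queues;
    // read: short-read queues (also serve scans when scans are not split off);
    // scan: long-read queues.
    record QueueLayout(int shared, int write, int read, int scan) {}

    static QueueLayout layout(int totalQueues, double readRatio, double scanRatio) {
        if (readRatio <= 0) {
            // No split: reads and writes share every queue.
            return new QueueLayout(totalQueues, 0, 0, 0);
        }
        // Give round(total * readRatio) queues to reads, but keep at least one write queue.
        int readWanted = Math.max(1, (int) Math.round(totalQueues * readRatio));
        int write = Math.max(1, totalQueues - readWanted);
        int read = totalQueues - write;
        if (scanRatio <= 0 || scanRatio >= 1) {
            // Gets and scans share the read queues.
            return new QueueLayout(0, write, read, 0);
        }
        // Split the read queues: floor(read * scanRatio) go to long-running scans.
        int scan = (int) Math.floor(read * scanRatio);
        return new QueueLayout(0, write, read - scan, scan);
    }

    public static void main(String[] args) {
        System.out.println(layout(10, 0.0, 0.0)); // everything shared
        System.out.println(layout(10, 0.3, 0.0)); // read/write split only
        System.out.println(layout(10, 0.8, 0.3)); // read/write split, then get/scan split
    }
}
```

A read.ratio of 0 keeps reads and writes in the same queues, matching the behavior before this change; raising scan.ratio isolates more queues for long scans so they cannot starve gets.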

Online Configuration Adjustment (HBASE‑12147) – Certain settings (e.g., load‑balancing and compaction parameters) can now be changed without restarting the cluster, building on Hadoop’s ability to reload configuration dynamically.
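As we understand it, this works through an observer pattern: components that support live updates register to be notified when the configuration is reloaded (HBase’s interface for this is ConfigurationObserver). The plain‑Java sketch below mimics that pattern; ConfigListener, ConfigRegistry, CompactionLimiter, and the configuration key are all hypothetical. It also shows why only some settings are online‑changeable: a value changes without a restart only if the component that owns it reacts to the notification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OnlineConfigSketch {
    // Hypothetical listener, analogous in spirit to HBase's ConfigurationObserver.
    interface ConfigListener {
        void onConfigurationChange(Map<String, String> conf);
    }

    // Hypothetical registry that pushes a freshly loaded configuration to its listeners.
    static final class ConfigRegistry {
        private final List<ConfigListener> listeners = new ArrayList<>();

        void register(ConfigListener listener) {
            listeners.add(listener);
        }

        void reload(Map<String, String> newConf) {
            for (ConfigListener listener : listeners) {
                listener.onConfigurationChange(newConf);
            }
        }
    }

    // A component that applies a new thread limit in place instead of needing a restart.
    static final class CompactionLimiter implements ConfigListener {
        volatile int maxThreads;

        CompactionLimiter(int initialMaxThreads) {
            this.maxThreads = initialMaxThreads;
        }

        @Override
        public void onConfigurationChange(Map<String, String> conf) {
            // "example.compaction.max.threads" is a made-up key for illustration.
            String value = conf.get("example.compaction.max.threads");
            if (value != null) {
                maxThreads = Integer.parseInt(value);
            }
        }
    }

    public static void main(String[] args) {
        ConfigRegistry registry = new ConfigRegistry();
        CompactionLimiter limiter = new CompactionLimiter(1);
        registry.register(limiter);

        Map<String, String> edited = new HashMap<>();
        edited.put("example.compaction.max.threads", "4");
        registry.reload(edited); // the running component picks up the new value
        System.out.println("max compaction threads: " + limiter.maxThreads);
    }
}
```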

The community’s current focus includes even higher availability (e.g., Facebook’s HydraBase, which uses the Raft consensus protocol to target 99.999% uptime), better use of HDFS storage tiers (memory, SSD), less dependence on ZooKeeper, and use of off‑heap memory to reduce GC pressure.

Q&A Highlights – Topics covered include read/write separation, stale reads from secondary replicas, dynamic configuration without restart, TTL for historical data, MapReduce export optimizations, read/write isolation via snapshots, multitenancy plans, cross‑cluster replication, caching strategies, Phoenix usage, off‑heap bucket cache, and RowKey design to avoid hotspots.
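One common RowKey technique for avoiding hotspots is salting: a short hash‑derived prefix spreads otherwise sequential keys across regions. The sketch below is a generic illustration rather than the specific design discussed in the Q&A; the bucket count, the "|" separator, and the helper name are assumptions. The cost of salting is that a scan over the original key order must be issued once per bucket and the results merged.

```java
import java.util.HashSet;
import java.util.Set;

public class SaltedRowKeySketch {
    // Prefix a row key with a hash-derived bucket so that monotonically increasing keys
    // (timestamps, sequence ids) land in different regions instead of one hot region.
    static String salt(String rowKey, int buckets) {
        if (buckets <= 0 || buckets > 100) {
            throw new IllegalArgumentException("two-digit prefix supports 1..100 buckets");
        }
        int bucket = Math.floorMod(rowKey.hashCode(), buckets); // deterministic, non-negative
        return String.format("%02d|%s", bucket, rowKey);
    }

    public static void main(String[] args) {
        int buckets = 16;
        Set<String> prefixes = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            String key = String.format("20150101-%03d", i); // sequential keys
            prefixes.add(salt(key, buckets).substring(0, 2));
        }
        System.out.println("distinct buckets used: " + prefixes.size());
        System.out.println("example salted key: " + salt("20150101-042", buckets));
    }
}
```

Because the prefix is computed from the key itself, a point read can recompute it; pre‑splitting the table on the bucket prefixes lets each bucket start in its own region.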

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

database, HBase, API
Written by High Availability Architecture

Official account for High Availability Architecture.