HBase 2015 Technical Developments
An overview of HBase's 2015 milestones: the stable 1.0 release, a clearer client API with BufferedMutator, region replicas for high availability, family‑level flushing, RPC call‑queue separation, online configuration changes, and a Q&A on performance, replication, and design best practices.
In 2015 HBase reached a milestone with the release of version 1.0, marking its transition to a stable, production‑ready platform.
New Interface – The older HBase client API differed from JDBC; the new API introduces a clearer Connection management model and the BufferedMutator for efficient buffered writes, aligning more closely with JDBC semantics.
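The original article showed examples of the old and new write APIs as images, which are not reproduced here. In the new workflow the client establishes a Connection, obtains a Table handle from it, and then either writes synchronously through the Table or writes asynchronously in buffered batches through a BufferedMutator.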
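To make the benefit of buffered writes concrete, here is a minimal sketch of client-side write buffering in plain Java. It is a toy model, not the HBase BufferedMutator class: the class name, the count-based threshold, and the counters are all hypothetical (the real client sizes its buffer in bytes, not in number of writes).

```java
// Toy model of client-side write buffering; this is NOT the HBase BufferedMutator class.
class ToyBufferedWriter {
    private final java.util.List<String> buffer = new java.util.ArrayList<>();
    private final int flushThreshold; // hypothetical: flush after this many buffered writes
    private int batchesSent = 0;
    private int writesSent = 0;

    ToyBufferedWriter(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    // Queue a write locally; send a batch only once the buffer is full.
    void mutate(String row) {
        buffer.add(row);
        if (buffer.size() >= flushThreshold) {
            flush();
        }
    }

    // Send everything buffered so far as one batch (stands in for one round trip).
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        batchesSent++;
        writesSent += buffer.size();
        buffer.clear();
    }

    int batchesSent() { return batchesSent; }
    int writesSent() { return writesSent; }

    public static void main(String[] args) {
        ToyBufferedWriter writer = new ToyBufferedWriter(100);
        for (int i = 0; i < 250; i++) {
            writer.mutate("row-" + i);
        }
        writer.flush(); // push the final partial batch, as closing the writer would
        // prints "250 writes in 3 batches"
        System.out.println(writer.writesSent() + " writes in " + writer.batchesSent() + " batches");
    }
}
```

The point of the sketch is the ratio: 250 writes cost three round trips instead of 250, at the price of writes sitting in client memory until the next flush.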
Multi‑Region Replicas (HBASE‑10070) – A region can now be hosted as several replicas on different RegionServers, so reads remain available when the primary replica is down. Only the primary replica accepts writes, which are persisted to HDFS; the secondary replicas are read‑only and may return stale data. Work is ongoing to synchronize data from the primary to the secondaries more promptly.
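The read semantics described above can be sketched with a small toy model. Everything in it (class, method, and field names) is hypothetical and only mimics the behavior; in the real client, a read opts in to possibly stale results by requesting timeline consistency, and the result reports whether it is stale.

```java
// Toy model of a region with one primary and one read-only secondary replica.
// Class and method names are hypothetical, not HBase APIs.
class ToyReplicatedRegion {
    // A read result that records whether it came from a possibly lagging secondary.
    static final class ReadResult {
        final String value;
        final boolean stale;

        ReadResult(String value, boolean stale) {
            this.value = value;
            this.stale = stale;
        }
    }

    private final java.util.Map<String, String> primary = new java.util.HashMap<>();
    private final java.util.Map<String, String> secondary = new java.util.HashMap<>();
    private boolean primaryUp = true;

    // Only the primary accepts writes.
    void put(String row, String value) {
        if (!primaryUp) {
            throw new IllegalStateException("primary is down; writes are unavailable");
        }
        primary.put(row, value);
    }

    // Stands in for the asynchronous primary-to-secondary synchronization.
    void syncSecondary() {
        secondary.clear();
        secondary.putAll(primary);
    }

    void setPrimaryUp(boolean up) {
        primaryUp = up;
    }

    // Strong reads need the primary; relaxed reads may fall back to the secondary.
    ReadResult get(String row, boolean allowStale) {
        if (primaryUp) {
            return new ReadResult(primary.get(row), false);
        }
        if (!allowStale) {
            throw new IllegalStateException("primary is down; strong read failed");
        }
        return new ReadResult(secondary.get(row), true);
    }
}
```

If a row is written, synchronized, and then overwritten just before the primary fails, a relaxed read in this model returns the older value flagged as stale, which is the trade-off the feature makes for read availability.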
Family‑Level Flush (HBASE‑10201) – Previously a flush operated on the whole region, writing out every column family at once, so families holding little data produced many small files. Flushing at column‑family granularity reduces disk I/O, makes better use of memory, and improves read performance.
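A toy comparison shows why the granularity matters. The sketch below is a simplified model under stated assumptions: sizes are in MB, the lower bound is an illustrative parameter, and the fallback to a full flush when no family qualifies is an assumption about reasonable behavior rather than a statement of HBase's exact policy.

```java
// Toy comparison of region-level and family-level flushing.
// Sizes are in MB; thresholds and names are illustrative, not HBase defaults.
class ToyFlushPolicy {
    // Region-level flush: every non-empty family is written out, however small.
    static java.util.List<String> regionLevelFlush(java.util.Map<String, Integer> memstoreMb) {
        java.util.List<String> flushed = new java.util.ArrayList<>();
        for (java.util.Map.Entry<String, Integer> e : memstoreMb.entrySet()) {
            if (e.getValue() > 0) {
                flushed.add(e.getKey());
            }
        }
        return flushed;
    }

    // Family-level flush: only families at or above the lower bound are written.
    // Assumption: if no family qualifies, flush everything so memory is still freed.
    static java.util.List<String> familyLevelFlush(java.util.Map<String, Integer> memstoreMb,
                                                   int lowerBoundMb) {
        java.util.List<String> flushed = new java.util.ArrayList<>();
        for (java.util.Map.Entry<String, Integer> e : memstoreMb.entrySet()) {
            if (e.getValue() >= lowerBoundMb) {
                flushed.add(e.getKey());
            }
        }
        return flushed.isEmpty() ? regionLevelFlush(memstoreMb) : flushed;
    }
}
```

With a 120 MB "data" family and a 2 MB "meta" family, the region-level flush writes two files (one of them tiny), while the family-level flush with a 16 MB lower bound writes only the "data" file and leaves "meta" in memory to keep accumulating.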
RPC Call‑Queue Separation (HBASE‑11355) – Separate call queues for writes (Put), short reads (Get), and long reads (Scan) keep large scans from blocking other requests, which improves latency. Three parameters control the queues: hbase.ipc.server.callqueue.handler.factor sets how many queues the handlers share, hbase.ipc.server.callqueue.read.ratio sets the share of queues reserved for reads, and hbase.ipc.server.callqueue.scan.ratio sets the share of the read queues reserved for scans.
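How the three ratios interact is easiest to see with numbers. The following sketch partitions a handler pool into write, get, and scan queues; the method name, the rounding rules, and the clamping are assumptions made for illustration and may differ from HBase's exact arithmetic.

```java
// Sketch of how the three ratios could partition RPC call queues.
// Rounding and clamping here are assumptions for illustration, not HBase's exact logic.
class ToyCallQueueSplit {
    // handlerFactor ~ hbase.ipc.server.callqueue.handler.factor
    // readRatio     ~ hbase.ipc.server.callqueue.read.ratio
    // scanRatio     ~ hbase.ipc.server.callqueue.scan.ratio
    // Returns {writeQueues, getQueues, scanQueues}.
    static int[] split(int handlerCount, double handlerFactor, double readRatio, double scanRatio) {
        // The factor scales the handler count down to a number of queues (at least one).
        int total = Math.max(1, (int) Math.round(handlerCount * handlerFactor));
        // Reserve a share for reads, but always keep at least one queue for writes.
        int read = Math.min(total - 1, (int) Math.round(total * readRatio));
        // Carve the scan queues out of the read queues.
        int scan = (int) Math.floor(read * scanRatio);
        int write = total - read;
        return new int[] { write, read - scan, scan };
    }

    public static void main(String[] args) {
        int[] q = split(30, 0.2, 0.5, 0.5);
        // prints "write=3 get=2 scan=1"
        System.out.println("write=" + q[0] + " get=" + q[1] + " scan=" + q[2]);
    }
}
```

In this model, 30 handlers with a factor of 0.2 give six queues; half go to reads, and half of those (rounded down) go to scans, so a burst of long scans can saturate only one queue while gets and puts keep their own.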
Online Configuration Adjustment (HBASE‑12147) – Certain settings (e.g., load balancing, compaction) can now be changed without restarting the cluster, thanks to Hadoop’s dynamic configuration loading.
The community's current focus includes higher availability (e.g., Facebook's HydraBase, which uses Raft to target 99.999% uptime), better utilization of HDFS storage tiers (memory, SSD), reducing the dependency on ZooKeeper, and using off‑heap memory to lower GC pressure.
Q&A Highlights – Topics covered include read/write separation, stale reads from secondary replicas, dynamic configuration without restart, TTL for historical data, MapReduce export optimizations, read/write isolation via snapshots, multitenancy plans, cross‑cluster replication, caching strategies, Phoenix usage, off‑heap bucket cache, and RowKey design to avoid hotspots.
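On the last Q&A topic, one common way to avoid RowKey hotspots is salting: prefixing each key with a bucket number derived from a hash, so that sequential keys spread across regions. The sketch below is a generic illustration, not a technique attributed to the speakers; the class name, the two-digit prefix format, and the bucket count are hypothetical.

```java
// Generic row-key salting sketch; names and prefix format are illustrative.
class ToySaltedKey {
    // Prefix the key with a stable bucket number in [0, buckets).
    // Assumes buckets <= 100 so the prefix fits in two digits.
    static String salt(String rowKey, int buckets) {
        int bucket = Math.floorMod(rowKey.hashCode(), buckets);
        String prefix = (bucket < 10 ? "0" : "") + bucket;
        return prefix + "-" + rowKey;
    }
}
```

The trade-off is on the read side: a range scan over the original key order now has to fan out across every bucket, so salting suits write-heavy workloads with point lookups better than scan-heavy ones.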
High Availability Architecture
Official account for High Availability Architecture.
