HBase Architecture Components and Operational Overview

This article provides a comprehensive overview of HBase's architecture, detailing the roles of RegionServers, HMaster, ZooKeeper, Regions, the META table, write and read paths, compaction processes, region splitting, load balancing, HDFS replication, crash recovery, and the system's advantages and challenges.

Architecture Digest

Apr 28, 2018

HBase Architecture Components

From a physical standpoint, HBase follows a master‑slave architecture consisting of three server types: RegionServers handle data reads and writes and communicate directly with clients; the HBase Master (HMaster) manages region assignment and DDL operations such as creating and deleting tables; ZooKeeper, an independent distributed coordination service, maintains live cluster state and coordinates servers. The underlying data files are stored in HDFS.

Regions

HBase tables are horizontally split into Regions based on rowkey ranges; each Region contains rows between a start key and an end key. RegionServers host these Regions, typically serving about 1,000 Regions each.
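
The rowkey-to-Region mapping can be illustrated with a small lookup sketch. This is not HBase code: the region boundaries and server names are made up, and the only assumption carried over from HBase is that a Region's start key is inclusive and its end key exclusive.

```python
import bisect

# Hypothetical Regions of one table, sorted by start key.
# Each Region covers [start_key, next Region's start_key); "" means "from the first row".
REGION_START_KEYS = ["", "g", "n", "t"]
REGION_SERVERS = ["rs1", "rs2", "rs3", "rs1"]  # made-up RegionServer names


def find_region(rowkey):
    """Return (region start key, hosting server) for the given rowkey."""
    # bisect_right returns the position of the first start key greater than rowkey,
    # so the Region just before that position is the one whose range contains it.
    idx = bisect.bisect_right(REGION_START_KEYS, rowkey) - 1
    return REGION_START_KEYS[idx], REGION_SERVERS[idx]
```

For example, `find_region("m")` falls into the Region starting at `"g"`, because `"g" <= "m" < "n"`.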

HBase HMaster

The HMaster coordinates RegionServers, assigns Regions at startup, re‑balances load, monitors RegionServer health via ZooKeeper, and provides administrative APIs for creating, deleting, and updating tables.

ZooKeeper: The Coordinator

ZooKeeper acts as a distributed coordination service: it maintains the list of live servers, provides failure notifications, and uses consensus to guarantee a common shared state. Because consensus requires a majority, a ZooKeeper ensemble typically runs on three or five machines.

How the Components Work Together

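RegionServers and the active HMaster maintain sessions with ZooKeeper, kept alive by heartbeats, and each registers an ephemeral node tied to its session. The HMaster watches the RegionServer nodes to discover available servers and to detect failures. HMasters compete to create the master's ephemeral node; the first one to succeed becomes the sole active master, while any standby HMaster waits for that node to disappear. When a server stops sending heartbeats, its session expires and ZooKeeper deletes its ephemeral node.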
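
The election rule described above ("first to register wins, and registration disappears with the session") can be modeled in a few lines. The sketch below is a toy in-memory stand-in, not the ZooKeeper client API; the class, the method names, and the `/master` path are all hypothetical.

```python
class ToyZooKeeper:
    """In-memory stand-in for the ephemeral-node behavior, for illustration only."""

    def __init__(self):
        self.ephemeral = {}  # node path -> session that owns it

    def create_ephemeral(self, path, session):
        """Create the node if it does not exist; return True on success."""
        if path in self.ephemeral:
            return False  # someone else already holds this node
        self.ephemeral[path] = session
        return True

    def expire_session(self, session):
        """Simulate missed heartbeats: drop every node owned by the session."""
        owned = [p for p, s in self.ephemeral.items() if s == session]
        for path in owned:
            del self.ephemeral[path]


zk = ToyZooKeeper()
first = zk.create_ephemeral("/master", "hmaster-1")   # becomes the active master
second = zk.create_ephemeral("/master", "hmaster-2")  # stays on standby
zk.expire_session("hmaster-1")                        # active master fails
takeover = zk.create_ephemeral("/master", "hmaster-2")
```

In this model the standby can only take over after the failed master's session has expired, which is why failure detection time depends on the session timeout.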

HBase First Read or Write

HBase stores region location metadata in a special META table; ZooKeeper holds the META table’s location. On the first client read/write, the client obtains the META location from ZooKeeper, queries META to find the target RegionServer, caches this information, and then accesses the RegionServer for the actual data.
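
The lookup sequence can be sketched as a client that consults ZooKeeper and META only on a cache miss. Everything here is a simplified assumption: the dictionaries stand in for ZooKeeper and the META table, the key `"meta-location"` is invented, and the cache is keyed by rowkey, whereas a real client caches Region ranges.

```python
class LocationClient:
    """Toy model of the lookup chain: ZooKeeper -> META -> RegionServer."""

    def __init__(self, zookeeper, meta_tables):
        self.zookeeper = zookeeper      # hypothetical: {"meta-location": server name}
        self.meta_tables = meta_tables  # hypothetical: {server name: {rowkey: RegionServer}}
        self.cache = {}                 # rowkey -> RegionServer, filled on first access
        self.meta_lookups = 0           # counts how often META had to be queried

    def locate(self, rowkey):
        if rowkey in self.cache:
            return self.cache[rowkey]   # later requests skip ZooKeeper and META
        meta_server = self.zookeeper["meta-location"]    # step 1: ask ZooKeeper
        self.meta_lookups += 1
        server = self.meta_tables[meta_server][rowkey]   # step 2: query META
        self.cache[rowkey] = server                      # step 3: cache the location
        return server


client = LocationClient({"meta-location": "rs-meta"}, {"rs-meta": {"row1": "rs2"}})
client.locate("row1")  # goes through ZooKeeper and META
client.locate("row1")  # answered from the client cache
```

After both calls `client.meta_lookups` is still 1, which is the point of the cache: META is only consulted again when a cached location turns out to be stale.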

HBase META Table

The META table records every Region in the cluster, acting like a B‑tree where the key is the region start key and region ID, and the value points to the hosting RegionServer.

Region Server Components

Each RegionServer runs on an HDFS DataNode and includes: (1) WAL – a write‑ahead log stored on the distributed file system for crash recovery; (2) BlockCache – an in‑memory read cache that evicts least‑recently‑used blocks when full; (3) MemStore – an in‑memory write buffer per column family that sorts data before flushing; (4) HFiles – sorted on‑disk files storing Key/Value pairs.

HBase Write Steps (1)

When a client issues a Put, the data is first appended to the WAL on disk, ensuring durability in case of server failure.

HBase Write Steps (2)

After WAL persistence, the data is written to MemStore; the client receives an acknowledgment once the MemStore entry is created.
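
The ordering of the two write steps can be shown with a toy model. The class below is a sketch under simplifying assumptions: a Python list stands in for the WAL file on HDFS, a dictionary stands in for the MemStore, and all names are hypothetical.

```python
class ToyRegionWriter:
    """Models the write order only: WAL append first, MemStore second, then ack."""

    def __init__(self):
        self.wal = []       # stands in for the append-only log on HDFS
        self.memstore = {}  # stands in for the in-memory write buffer
        self.next_seq = 1   # monotonically increasing sequence number

    def put(self, rowkey, column, value):
        seq = self.next_seq
        self.next_seq += 1
        # Step 1: append the edit to the end of the WAL.
        self.wal.append((seq, rowkey, column, value))
        # Step 2: only after the WAL append, apply the edit to the MemStore.
        self.memstore[(rowkey, column)] = (seq, value)
        # Returning here plays the role of the acknowledgment to the client.
        return seq


writer = ToyRegionWriter()
ack = writer.put("row1", "cf:name", "alice")
```

Because the WAL entry exists before the MemStore entry, a crash after the acknowledgment can lose the in-memory copy but never the logged edit.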

HBase MemStore

MemStore sorts updates into KeyValues in memory, mirroring the on‑disk representation used by HFiles; each column family has a single MemStore.

HBase Region Flush

When a MemStore accumulates enough data, its sorted KeyValue collection is flushed to HDFS as a new HFile. Each column family may have multiple HFiles, and the flush process also records the highest sequence number for recovery purposes.
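
A flush can be sketched as "sort the buffer, write it out as an immutable file, remember the highest sequence number". The model below makes assumptions for brevity: the threshold counts entries rather than bytes, an HFile is a plain dictionary, and the names are invented.

```python
class ToyStore:
    """One column family: a MemStore plus the HFiles produced by flushes."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical: entry count, not bytes
        self.memstore = {}   # key -> (sequence number, value)
        self.hfiles = []     # each: {"cells": sorted [(key, value)], "max_seq": int}
        self.seq = 0

    def put(self, key, value):
        self.seq += 1
        self.memstore[key] = (self.seq, value)
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if not self.memstore:
            return
        # Write the cells in key order, as they would be laid out in the file.
        cells = [(k, v) for k, (_, v) in sorted(self.memstore.items())]
        # Record the highest sequence number persisted so far.
        max_seq = max(s for s, _ in self.memstore.values())
        self.hfiles.append({"cells": cells, "max_seq": max_seq})
        self.memstore = {}


store = ToyStore(flush_threshold=3)
for key, value in [("c", "3"), ("a", "1"), ("b", "2")]:
    store.put(key, value)  # the third put triggers a flush
```

Each flush adds one more file, which is why a column family accumulates several HFiles over time.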

HBase HFile

HFiles store sorted Key/Value pairs; when MemStore flushes, the entire sorted set is written sequentially to a new HFile, which provides high write throughput by avoiding random disk seeks.

HBase HFile Structure

HFiles contain a multi‑level index similar to a B+‑tree, so a read can locate data without scanning the whole file: KeyValue pairs are stored in ascending order; index entries point by row key to the KeyValue data held in 64 KB blocks; each block has its own leaf index; the last key of each block is placed in an intermediate index; the root index points to the intermediate index. A trailer at the end of the file points to the meta blocks and also carries Bloom filters and time‑range information, which let a read skip files that cannot contain the requested row or time range.
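
The benefit of a block index is that a lookup touches one block instead of the whole file. The sketch below uses a single-level index keyed by the first key of each block, which is a simplification chosen for brevity and not the multi-level HFile layout; the function names and the tiny block size are assumptions.

```python
import bisect


def build_blocks(sorted_cells, block_size=2):
    """Cut sorted (key, value) cells into fixed-size blocks and index them."""
    blocks = [sorted_cells[i:i + block_size]
              for i in range(0, len(sorted_cells), block_size)]
    index = [block[0][0] for block in blocks]  # first key of every block
    return blocks, index


def lookup(blocks, index, key):
    """Find the one block that could hold the key, then scan only that block."""
    pos = bisect.bisect_right(index, key) - 1
    if pos < 0:
        return None  # key sorts before the first block
    for k, v in blocks[pos]:
        if k == key:
            return v
    return None


cells = [("a", 1), ("c", 2), ("e", 3), ("g", 4), ("i", 5)]
blocks, index = build_blocks(cells)
```

With these five cells the index is `["a", "e", "i"]`, and `lookup(blocks, index, "g")` inspects only the second block.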

HBase Read Merge

Reading a row merges data from three sources in order: BlockCache (recently read data), MemStore (recent writes), and HFiles (persisted data). If the row is not found in cache or MemStore, the HFile index and Bloom filter are used to locate the appropriate HFile.
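
The merge can be sketched as "collect every version of the cell, return the newest". The function below is an illustrative assumption, not HBase's scanner: the MemStore and each HFile are modeled as dictionaries mapping a key to `(timestamp, value)`, and the BlockCache is left out because it only caches blocks that originate from HFiles.

```python
def read_cell(key, memstore, hfiles):
    """Return the newest value for key across the MemStore and all HFiles."""
    candidates = []
    if key in memstore:
        candidates.append(memstore[key])
    for hfile in hfiles:            # every file may hold a version of the cell
        if key in hfile:
            candidates.append(hfile[key])
    if not candidates:
        return None
    # The version with the highest timestamp wins.
    newest = max(candidates, key=lambda version: version[0])
    return newest[1]


memstore = {"row1": (30, "new")}
hfiles = [{"row1": (10, "old")}, {"row2": (20, "b")}]
```

Here `read_cell("row1", memstore, hfiles)` returns `"new"` even though an older version sits in an HFile. Note that the loop probes every file in the list, so the cost of a read grows with the number of HFiles.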

Read Amplification

Because a region may contain many HFiles, a read may need to consult multiple files, leading to read amplification.

HBase Minor Compaction

Minor compaction selects a few smaller HFiles and rewrites them into fewer, larger HFiles using a merge sort, which reduces the number of storage files.

HBase Major Compaction

Major compaction rewrites all HFiles of a region into a single HFile per column family, discarding deleted or expired cells, improving read performance but generating heavy I/O and network traffic.
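
Both compaction types are at heart a merge of already-sorted files; the major variant additionally drops deleted cells. The sketch below assumes a simplified cell format `(key, sequence number, value)`, uses `None` as a made-up delete marker, and keeps only the newest version of each key, which ignores HBase's configurable version retention.

```python
import heapq

TOMBSTONE = None  # hypothetical delete marker used only in this sketch


def compact(hfiles, drop_deletes=False):
    """Merge sorted HFiles into one sorted list, newest version per key.

    hfiles: lists of (key, seq, value), each sorted by key with unique keys.
    drop_deletes=True mimics a major compaction that discards deleted cells.
    """
    # Order by key ascending, then sequence number descending (newest first).
    merged = heapq.merge(*hfiles, key=lambda cell: (cell[0], -cell[1]))
    out = []
    last_key = None
    for key, seq, value in merged:
        if key == last_key:
            continue  # an older version of a key we have already handled
        last_key = key
        if drop_deletes and value is TOMBSTONE:
            continue  # major compaction: the deleted cell disappears for good
        out.append((key, seq, value))
    return out


older = [("a", 1, "x"), ("b", 2, "y")]
newer = [("a", 3, "z"), ("b", 4, TOMBSTONE), ("c", 5, "w")]
minor = compact([older, newer])
major = compact([older, newer], drop_deletes=True)
```

In the minor result the delete marker for `"b"` is kept so that it continues to mask older versions in files that were not part of the compaction; in the major result the key is gone entirely.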

Region = Contiguous Keys

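Each table is split into one or more regions, each covering a contiguous, sorted range of rows between a start key and an end key. The default maximum region size is 1 GB in the HBase version this overview describes (later releases default to 10 GB via hbase.hregion.max.filesize), and a RegionServer can host roughly 1,000 regions.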

Region Split

When a region grows too large, it splits into two child regions of roughly equal size; the split is reported to the HMaster, which may relocate one child to another server for load balancing.
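
A split can be pictured as choosing a middle row key and dividing the key range there. The function below is a minimal sketch under the assumption that the split point is the median of the region's row keys; how HBase actually picks the split point is not covered by this article, and the function name is invented.

```python
def split_region(start_key, end_key, sorted_rowkeys):
    """Split [start_key, end_key) into two child ranges at the median row key."""
    mid = sorted_rowkeys[len(sorted_rowkeys) // 2]
    # The first child ends where the second child begins, so no key is lost
    # and no key belongs to both children.
    return (start_key, mid), (mid, end_key)


left, right = split_region("a", "z", ["b", "f", "k", "p", "t"])
```

With these keys the children are `("a", "k")` and `("k", "z")`: together they cover exactly the parent's range.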

Read Load Balancing

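A split initially happens on the same RegionServer, but for load balancing the HMaster may move the new regions to other servers. A relocated region is then served by a RegionServer whose data files still reside on a remote HDFS node; a subsequent major compaction rewrites those files onto the server's local node, restoring data locality.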

HDFS Data Replication

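All writes and reads go to the primary node, the RegionServer that hosts the region. HDFS automatically replicates WAL and HFile blocks (three copies by default) across different DataNodes, so HBase relies on HDFS for data durability. The HDFS NameNode tracks file metadata and block locations; the data itself does not flow through it.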

HBase Crash Recovery

If a RegionServer crashes, ZooKeeper detects the missing heartbeat and notifies the HMaster, which reassigns the failed server's regions to active RegionServers. To recover edits that were in the MemStore but not yet flushed, the HMaster splits the crashed server's WAL into separate per‑region files and stores them on the DataNodes of the new RegionServers. Each of those RegionServers then replays its share of the WAL to rebuild the lost MemStore data.

WAL entries are written sequentially; during recovery, the WAL is read, edits are applied to a new MemStore, and finally flushed to HFiles.
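
Recovery can be sketched in two steps: group the crashed server's WAL by region, then replay only the edits that were not yet flushed. The code is an illustrative model with assumed field names (`seq`, `region`, `row`, `value`); it relies on the highest flushed sequence number being known for each region.

```python
from collections import defaultdict


def split_wal(wal_entries):
    """Group WAL entries by the region they belong to."""
    per_region = defaultdict(list)
    for entry in wal_entries:
        per_region[entry["region"]].append(entry)
    return dict(per_region)


def replay(entries, max_flushed_seq):
    """Rebuild a MemStore from edits newer than the last flushed sequence number."""
    memstore = {}
    for entry in sorted(entries, key=lambda e: e["seq"]):
        if entry["seq"] > max_flushed_seq:
            memstore[entry["row"]] = entry["value"]  # later edits overwrite earlier ones
    return memstore


wal = [
    {"seq": 1, "region": "r1", "row": "a", "value": "v1"},
    {"seq": 2, "region": "r2", "row": "x", "value": "v2"},
    {"seq": 3, "region": "r1", "row": "a", "value": "v3"},
    {"seq": 4, "region": "r1", "row": "b", "value": "v4"},
]
by_region = split_wal(wal)
recovered_r1 = replay(by_region["r1"], max_flushed_seq=1)
```

For region `r1` the edit with sequence number 1 is skipped because it was already flushed, so the rebuilt MemStore contains `a -> v3` and `b -> v4`.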

Apache HBase Architecture Benefits

Strong consistency model – once a write returns, all readers see the same value.

Automatic scaling – regions split as data grows, leveraging HDFS for distribution and replication.

Built‑in recovery – write‑ahead logging enables crash recovery.

Integration with Hadoop – straightforward MapReduce processing on HBase data.

Apache HBase Has Problems Too…

WAL replay can be slow.

Crash recovery is slow and complex.

Major compaction can cause I/O storms.

Source: http://www.uml.org.cn/bigdata/201804131.asp
