HBase Architecture, Components, and Operations Overview
This article provides a comprehensive overview of Apache HBase’s architecture, detailing its core components such as RegionServer, HMaster, ZooKeeper, WAL, MemStore, and HFiles, and explains key processes including read/write paths, compaction, region splitting, load balancing, and recovery mechanisms.
HBase Architecture Overview
Physically, HBase consists of three types of servers in a master‑slave mode: RegionServer, HBase HMaster, and ZooKeeper. RegionServers handle data read/write, HMaster manages region assignment and table creation/deletion, and ZooKeeper maintains cluster state and master election.
Hadoop DataNode stores the actual data files; RegionServers are placed on DataNodes to keep data local. NameNode maintains metadata for HDFS blocks.
Regions
Tables are horizontally split into regions based on row keys; each region contains all rows between its start key and its end key. A single RegionServer can serve up to about 1,000 regions.
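To make the row-key ranges concrete, here is a minimal sketch of how a key maps to a region. The region names and boundary keys are made up for illustration, and plain Python string comparison stands in for HBase's byte-wise ordering of row keys.

```python
from bisect import bisect_right

# Hypothetical region boundaries: each region covers [start_key, next_start_key).
# The empty string stands for "the beginning of the table".
REGION_START_KEYS = ["", "g", "n", "t"]
REGION_NAMES = ["region-1", "region-2", "region-3", "region-4"]


def find_region(row_key: str) -> str:
    """Return the name of the region whose key range contains row_key."""
    # bisect_right returns the index of the first start key greater than row_key,
    # so the region just before that index owns the key.
    index = bisect_right(REGION_START_KEYS, row_key) - 1
    return REGION_NAMES[index]


print(find_region("apple"))  # region-1
print(find_region("g"))      # region-2 (the start key is inclusive)
print(find_region("zebra"))  # region-4
```

Because the start keys are kept sorted, the lookup is a binary search rather than a scan over all regions.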
HBase HMaster
HMaster responsibilities include controlling RegionServer work (assigning regions at startup, rebalancing, monitoring via ZooKeeper) and managing tables (create, delete, update).
ZooKeeper
ZooKeeper coordinates distributed state, monitors server liveness, provides notifications for failures, and conducts master election. The cluster should have an odd number of servers for reliable election.
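The preference for an odd number of servers follows from majority voting. The arithmetic below is a small sketch of that reasoning; the function names are illustrative and not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5, 6):
    print(n, quorum_size(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
# 6 4 2
```

Going from 3 to 4 servers (or from 5 to 6) raises the quorum size without tolerating any additional failure, which is why the even sizes add cost but no extra fault tolerance.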
Interaction Between Components
Each RegionServer creates an ephemeral node in ZooKeeper; HMaster watches these nodes to detect failures and to trigger recovery or re‑assignment. The active HMaster sends heartbeats to ZooKeeper, and a standby HMaster watches for notification that the active one has failed so that it can take over.
First Read/Write Operations
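On the first read or write, a client asks ZooKeeper which RegionServer hosts the META table, queries META on that server to find the RegionServer responsible for the target row key, and then sends the read or write to that RegionServer. The client caches both the META location and the region locations, so subsequent operations skip these lookups unless a region has moved or a server has become unavailable, in which case the client queries META again and refreshes its cache.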
META Table
The META table stores region address information in a B‑tree‑like structure (key: region start key and ID; value: RegionServer).
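The lookup and caching behaviour can be sketched with a toy model. Everything here (`ToyCluster`, `ToyClient`, the server names, the two META rows) is hypothetical and only mirrors the sequence described above; it is not the HBase client API.

```python
from bisect import bisect_right


class ToyCluster:
    """Hypothetical stand-in for ZooKeeper and the META table."""

    def __init__(self):
        self.meta_server = "rs-meta"  # what ZooKeeper would hand back
        # META rows sorted by region start key: (start_key, region_server).
        self.meta_rows = [("", "rs-1"), ("m", "rs-2")]
        self.zookeeper_lookups = 0
        self.meta_lookups = 0

    def ask_zookeeper_for_meta(self):
        self.zookeeper_lookups += 1
        return self.meta_server

    def ask_meta_for_region(self, row_key):
        """Return (start_key, end_key, server); end_key None means end of table."""
        self.meta_lookups += 1
        start_keys = [start for start, _ in self.meta_rows]
        index = bisect_right(start_keys, row_key) - 1
        end_key = start_keys[index + 1] if index + 1 < len(start_keys) else None
        return start_keys[index], end_key, self.meta_rows[index][1]


class ToyClient:
    def __init__(self, cluster):
        self.cluster = cluster
        self.meta_location = None  # cached after the first ZooKeeper lookup
        self.region_cache = []     # cached (start_key, end_key, server) entries

    def locate(self, row_key):
        # 1. Try the local cache first.
        for start_key, end_key, server in self.region_cache:
            if start_key <= row_key and (end_key is None or row_key < end_key):
                return server
        # 2. On a miss, find META via ZooKeeper (only once), then ask META.
        if self.meta_location is None:
            self.meta_location = self.cluster.ask_zookeeper_for_meta()
        entry = self.cluster.ask_meta_for_region(row_key)
        self.region_cache.append(entry)
        return entry[2]

    def forget(self, server):
        """Drop cached entries for a server that has become unavailable."""
        self.region_cache = [e for e in self.region_cache if e[2] != server]


cluster = ToyCluster()
client = ToyClient(cluster)
print(client.locate("apple"))   # rs-1 (ZooKeeper lookup + META lookup)
print(client.locate("banana"))  # rs-1 (served from the client cache)
print(client.locate("pear"))    # rs-2 (META lookup only)
print(cluster.zookeeper_lookups, cluster.meta_lookups)  # 1 2
```

Calling `client.forget("rs-1")` models the cache-miss case: the next `locate` for a key in that range goes back to META, but not to ZooKeeper.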
RegionServer Components
WAL (Write‑Ahead Log) for durability and recovery.
Block Cache for read caching.
MemStore for write buffering; one MemStore per column family.
HFiles stored on HDFS, containing sorted key‑value pairs.
Write Path
Step 1: Client PUT is written to WAL. Step 2: Data is stored in MemStore and the client receives acknowledgment. When MemStore reaches a threshold, its contents are flushed to a new HFile on HDFS.
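The two steps and the flush can be modelled in a few lines. This is a toy sketch, not HBase's implementation: the class name is invented, the threshold counts cells rather than bytes, and the flush runs inline instead of in the background.

```python
class ToyRegionStore:
    """Toy model of the write path for one column family."""

    def __init__(self, flush_threshold: int = 3):
        self.flush_threshold = flush_threshold  # hypothetical: number of buffered rows
        self.wal = []        # append-only log of every edit
        self.memstore = {}   # in-memory write buffer: row key -> value
        self.hfiles = []     # each flush produces one immutable, sorted file

    def put(self, row_key: str, value: str) -> str:
        self.wal.append((row_key, value))  # step 1: record the edit in the WAL
        self.memstore[row_key] = value     # step 2: buffer it in the MemStore
        if len(self.memstore) >= self.flush_threshold:
            self.flush()
        return "ack"                       # the client is acknowledged

    def flush(self) -> None:
        # Write the buffered cells out in sorted key order, then start a new buffer.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}


store = ToyRegionStore(flush_threshold=3)
for key, value in [("c", "3"), ("a", "1"), ("b", "2")]:
    store.put(key, value)

print(store.hfiles)    # [[('a', '1'), ('b', '2'), ('c', '3')]]
print(store.memstore)  # {}
print(len(store.wal))  # 3
```

The edits arrive in arbitrary order but leave the MemStore sorted, which is what lets each flushed file be written sequentially.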
HFile Structure and Indexing
HFiles contain a multi‑level index (root, intermediate, leaf) and a trailer carrying information such as bloom filters and time‑range metadata. Together these let a read locate the relevant block, or skip the file altogether, without scanning the entire file.
Read Path and Read Amplification
Reads first check Block Cache, then MemStore, and finally HFiles (using indexes and bloom filters). Because the cells of one row may be spread across the MemStore and several HFiles, a single read may have to examine multiple files; this is known as read amplification.
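The sketch below illustrates read amplification with a deliberately simplified read. It consults every source and keeps the version with the newest timestamp; the dictionaries and the `(timestamp, value)` layout are assumptions made for the example, not HBase data structures.

```python
def read_latest(row_key, block_cache, memstore, hfiles):
    """Return (value, hfiles_checked) for the newest version of row_key.

    Each source maps row key -> (timestamp, value).
    """
    candidates = []
    if row_key in block_cache:       # read cache
        candidates.append(block_cache[row_key])
    if row_key in memstore:          # recent, unflushed writes
        candidates.append(memstore[row_key])
    hfiles_checked = 0
    for hfile in hfiles:             # every file may hold a version of the row
        hfiles_checked += 1
        if row_key in hfile:
            candidates.append(hfile[row_key])
    if not candidates:
        return None, hfiles_checked
    newest = max(candidates, key=lambda cell: cell[0])
    return newest[1], hfiles_checked


block_cache = {}
memstore = {"row1": (30, "v3")}
hfiles = [
    {"row1": (10, "v1")},
    {"row1": (20, "v2")},
    {"row2": (5, "x")},
]

print(read_latest("row1", block_cache, memstore, hfiles))  # ('v3', 3)
print(read_latest("row2", block_cache, memstore, hfiles))  # ('x', 3)
```

Even though the answer for `row1` sits in the MemStore, the toy read still touches three files; indexes and bloom filters exist to cut down exactly this kind of work.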
Compaction
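Minor compaction merges a few small HFiles into fewer, larger ones. Major compaction rewrites all HFiles of a column family in a region into a single file and discards deleted and expired cells. This improves read performance, but because every file is rewritten it can generate heavy disk I/O and network traffic (write amplification), so it is usually scheduled for off‑peak hours.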
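As a rough illustration of what a major compaction produces, the sketch below merges several sorted files into one and drops deleted cells. Using `None` as the delete marker and a dictionary for the merge are simplifications chosen for the example; a real implementation would stream a merge sort over the files.

```python
TOMBSTONE = None  # hypothetical delete marker used only in this sketch


def major_compact(hfiles):
    """Merge files (ordered oldest to newest) into one sorted file without deletes."""
    merged = {}
    for hfile in hfiles:
        for row_key, value in hfile:
            merged[row_key] = value  # a newer file overwrites an older one
    # Keep only live cells and emit them in sorted key order.
    return sorted((key, value) for key, value in merged.items()
                  if value is not TOMBSTONE)


hfiles = [
    [("a", "1"), ("b", "2")],        # oldest
    [("b", TOMBSTONE), ("c", "3")],  # deletes row b
    [("a", "9")],                    # newest: overwrites row a
]

print(major_compact(hfiles))  # [('a', '9'), ('c', '3')]
```

Three input files become one, the overwritten value of `a` and the deleted row `b` disappear, and a later read needs to consult only a single file.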
Region Splitting and Load Balancing
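When a region grows beyond its configured maximum size (about 1 GB in the setup described here; recent HBase releases default to 10 GB), it is split into two sub‑regions. Both initially stay on the same RegionServer, and HMaster may later reassign them to different RegionServers for load balancing. After such a move the new RegionServer reads the region's HFiles from remote HDFS nodes until a subsequent major compaction rewrites the data locally.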
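A split can be pictured as cutting a sorted run of cells at a middle key. The sketch below assumes a region is just a sorted list of `(row_key, value)` pairs and that the split point is the middle row; both are simplifications for illustration.

```python
def split_region(region):
    """Split a sorted list of (row_key, value) cells into two child regions."""
    middle = len(region) // 2
    split_key = region[middle][0]
    # The lower child keeps keys before split_key; the upper child starts at it.
    lower = [cell for cell in region if cell[0] < split_key]
    upper = [cell for cell in region if cell[0] >= split_key]
    return split_key, lower, upper


region = [("a", 1), ("c", 2), ("f", 3), ("k", 4)]
split_key, lower, upper = split_region(region)

print(split_key)  # f
print(lower)      # [('a', 1), ('c', 2)]
print(upper)      # [('f', 3), ('k', 4)]
```

The split key becomes the end key of the lower child and the start key of the upper child, so the two children together cover exactly the parent's key range.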
Data Replication and Recovery
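HDFS replicates WAL and HFiles across three nodes for reliability. When a RegionServer fails, ZooKeeper notifies HMaster, which reassigns the failed server's regions to live RegionServers. The failed server's WAL is then replayed so that each region's MemStore is rebuilt with the edits that had not yet been flushed to HFiles.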
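WAL replay can be sketched as re-applying, in order, only the edits that never reached an HFile. The sequence-number bookkeeping below is an assumption made for this toy example, and the function is not an HBase API.

```python
def replay_wal(wal_entries, last_flushed_sequence_id):
    """Rebuild a MemStore from WAL edits that were not flushed before the crash.

    wal_entries is a list of (sequence_id, row_key, value) in write order.
    """
    memstore = {}
    for sequence_id, row_key, value in wal_entries:
        # Edits at or below the flushed sequence id are already in an HFile.
        if sequence_id > last_flushed_sequence_id:
            memstore[row_key] = value
    return memstore


wal = [
    (1, "a", "1"),  # already flushed
    (2, "b", "2"),
    (3, "a", "3"),
]

print(replay_wal(wal, last_flushed_sequence_id=1))  # {'b': '2', 'a': '3'}
```

Replaying in write order matters: the later edit to `a` must win over any earlier one, just as it would have in the original MemStore.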
Advantages and Disadvantages of HBase
Strong consistency model.
Automatic scaling via region splitting.
Built‑in recovery using WAL.
Good integration with Hadoop/MapReduce.
Drawbacks: slower WAL recovery, complex crash recovery, resource‑intensive major compaction.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
