Introduction to HBase: Architecture, Data Model, and Operations
This article provides a comprehensive overview of HBase, covering its distributed column‑oriented architecture, data model components, storage mechanisms, read/write processes, WAL lifecycle, MemStore flushing, region splitting and merging, and failure recovery within the Hadoop ecosystem.
HBase Overview
HBase is a distributed, column‑oriented open‑source database that implements Google’s BigTable design, offering high reliability, high performance, and scalability for petabyte‑scale data across thousands of commodity servers, using HDFS as the underlying storage layer.
Unlike traditional relational databases, HBase is optimized for semi-structured and sparse data, supports a flexible schema, and can scale horizontally as business needs grow.
HBase Features
Massive storage : Handles PB‑level data while keeping access latency in the millisecond range.
Column‑oriented storage : Data is stored by column families, allowing many columns per family.
Easy scalability : Scales both compute (RegionServer) and storage (HDFS) independently.
High concurrency : Supports many simultaneous I/O operations with modest latency.
Sparse storage : Empty columns consume no space.
HBase vs. Relational Databases
Data model : Sparse, versioned key‑value cells grouped by column family, versus fixed‑schema rows and columns.
Schema : Columns can be added per row at write time, versus a schema defined up front.
Scaling : Horizontal, by adding RegionServers, versus mostly vertical scaling.
Query capability : Key‑based reads and range scans, without SQL joins or multi‑row transactions, versus full SQL support.
HBase Data Model
Namespace : Optional grouping of tables.
Table : Consists of one or more column families.
Row : Contains multiple columns organized by column family.
Column Family : A collection of related columns.
Column Qualifier : A column is addressed as Column Family:Column Qualifier ; qualifiers need not be declared in advance, so each row can have any number of columns.
Cell : Stores a versioned value for a column.
Timestamp : Version identifier for cells; defaults to the current system time.
Rowkey : Unique row identifier stored as a byte array and ordered lexicographically.
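Because rowkeys are raw bytes compared lexicographically, numeric identifiers must be fixed‑width encoded (e.g., zero‑padded) or their byte order will not match their numeric order. A minimal sketch of this pitfall:

```python
# Rowkeys are byte arrays compared lexicographically, so numeric IDs must be
# zero-padded (or big-endian encoded) to keep byte order equal to numeric order.
ids = [2, 10, 1]

naive = sorted(str(i).encode() for i in ids)        # lexicographic on digits
padded = sorted(f"{i:08d}".encode() for i in ids)   # fixed-width, order preserved

print(naive)   # [b'1', b'10', b'2']   -- 10 sorts before 2
print(padded)  # [b'00000001', b'00000002', b'00000010']
```

This is why time‑series rowkeys in HBase are commonly built from fixed‑width, big‑endian encodings rather than plain decimal strings.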
Physical View
Data is persisted as key‑value pairs, with each column family stored in its own set of files. Cells are versioned by timestamp, and a write‑ahead log (WAL) guarantees durability.
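The physical layout above can be sketched as a sorted map of (rowkey, family, qualifier, timestamp) keys, where timestamps sort newest‑first so the latest version of a cell is encountered before older ones. The rows and values below are illustrative:

```python
# Each cell is a key-value pair keyed by (rowkey, family, qualifier, timestamp).
# Keys sort ascending, except timestamps, which sort newest-first. Absent
# columns simply have no entry, which is why sparse data costs no space.
cells = {
    (b"row1", b"info", b"name", 1700000002): b"alice-v2",
    (b"row1", b"info", b"name", 1700000001): b"alice-v1",
    (b"row2", b"info", b"age",  1700000003): b"30",
}

def sort_key(key):
    row, family, qualifier, ts = key
    return (row, family, qualifier, -ts)  # negate ts: newest version first

for key in sorted(cells, key=sort_key):
    print(key, cells[key])
```

Reading the first matching entry in this order yields the newest version of each cell, which is exactly what a default HBase get returns.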
HBase Architecture
The system follows a master‑slave model. A Master node (usually HA with a standby) maintains table metadata, while one or more RegionServer nodes host the actual data, which is stored in HDFS.
Master responsibilities : assign regions to RegionServers, balance load, detect failed RegionServers, handle schema changes.
RegionServer responsibilities : serve assigned regions, process client I/O, perform region split and compaction.
Zookeeper : provides Master HA, monitors RegionServers, stores meta‑data pointers (e.g., hbase:meta ), and manages cluster configuration.
HDFS : underlying distributed file system offering reliable, replicated storage for HBase data.
Client : communicates with the cluster via HBase RPC and caches region locations obtained from the hbase:meta table.
RegionServer Internal Structure
WAL : Write‑Ahead Log for durability and recovery.
BlockCache : In‑memory cache for frequently read data (LRU eviction).
Region : Logical data shard defined by start and end rowkeys.
Store : Stores data for a single column family; contains a MemStore and one or more HFiles.
MemStore : Sorted in‑memory buffer for writes before flushing.
HFile : Immutable on‑disk file containing sorted key‑value pairs.
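The BlockCache's LRU eviction can be sketched with an ordered dictionary; the real BlockCache is more elaborate (block priorities, heap accounting), so treat this only as the core idea:

```python
from collections import OrderedDict

# Minimal LRU sketch of the BlockCache idea: recently read blocks stay in
# memory; the least recently used block is evicted when capacity is exceeded.
class LruBlockCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def get(self, block_id):
        if block_id not in self.blocks:
            return None
        self.blocks.move_to_end(block_id)  # mark as most recently used
        return self.blocks[block_id]

    def put(self, block_id, data):
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used

cache = LruBlockCache(capacity=2)
cache.put("blk1", b"...")
cache.put("blk2", b"...")
cache.get("blk1")          # touch blk1 so blk2 becomes the LRU entry
cache.put("blk3", b"...")  # evicts blk2
print(cache.get("blk2"))   # None
```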
Region Addressing
Clients first query Zookeeper for the hbase:meta table to locate the RegionServer hosting the target region, then cache this information for subsequent accesses.
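Conceptually, hbase:meta maps each region's start rowkey to the RegionServer hosting it, and the client picks the last region whose start key is less than or equal to the target rowkey, caching the result. A sketch with made‑up region boundaries and server names:

```python
import bisect

# Sketch of the client-side lookup against hbase:meta. Region start keys and
# server names below are invented for illustration.
meta = [(b"", "rs1"), (b"g", "rs2"), (b"p", "rs3")]  # sorted by region start key
starts = [start for start, _ in meta]

_location_cache = {}  # per-client cache of rowkey -> server

def locate(rowkey):
    if rowkey not in _location_cache:
        # Last region whose start key <= rowkey owns this row.
        idx = bisect.bisect_right(starts, rowkey) - 1
        _location_cache[rowkey] = meta[idx][1]
    return _location_cache[rowkey]

print(locate(b"apple"))   # rs1
print(locate(b"melon"))   # rs2
print(locate(b"zebra"))   # rs3
```

In the real client the cache is invalidated when a region moves or splits, at which point the lookup is repeated.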
Read/Write Process
Write Process
Client locates the target RegionServer via region addressing.
Data is written to the WAL and MemStore; an ACK is returned to the client.
When MemStore reaches a size threshold, it is flushed to a StoreFile.
Read Process
Client locates the RegionServer.
Data is first looked up in BlockCache; if missing, MemStore and StoreFile are consulted.
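The read order can be sketched as a tiered lookup; a real read actually merges MemStore and StoreFile results to select the newest cell version, which this simplified version glosses over:

```python
# Sketch of the read path: check the BlockCache first, then the MemStore, then
# scan StoreFiles, populating the cache on a disk hit. Data is illustrative.
block_cache = {b"row1": b"cached"}
memstore = {b"row2": b"in-memory"}
storefiles = [{b"row3": b"on-disk"}]

def get(rowkey):
    if rowkey in block_cache:
        return block_cache[rowkey]
    if rowkey in memstore:
        return memstore[rowkey]
    for sf in storefiles:                    # newest StoreFile first in practice
        if rowkey in sf:
            block_cache[rowkey] = sf[rowkey]  # cache for subsequent reads
            return sf[rowkey]
    return None

print(get(b"row3"))            # b'on-disk'
print(b"row3" in block_cache)  # True -- the disk read warmed the cache
```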
WAL Mechanism
The Write‑Ahead Log ensures that data can be recovered after a crash. Its lifecycle includes creation, rolling (controlled by hbase.regionserver.logroll.period and hbase.regionserver.maxlogs ), expiration (a WAL whose edits have all been flushed, as tracked by sequence IDs, is moved to /hbase/oldWALs ), and deletion (governed by hbase.master.logcleaner.ttl and hbase.master.cleaner.interval ).
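The expiration rule can be sketched in a few lines: a WAL file becomes archivable once every edit it contains has been flushed to a StoreFile, which is tracked by comparing sequence IDs. File names and IDs below are illustrative:

```python
# Sketch of WAL expiration: a WAL can be archived once its highest sequence ID
# is <= the highest sequence ID already persisted in StoreFiles.
wals = [
    {"file": "wal-1", "max_seq_id": 100},
    {"file": "wal-2", "max_seq_id": 250},
]
last_flushed_seq_id = 180  # highest sequence ID persisted in StoreFiles

expired = [w["file"] for w in wals if w["max_seq_id"] <= last_flushed_seq_id]
print(expired)  # ['wal-1']  -> moved to /hbase/oldWALs, deleted after the TTL
```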
MemStore Flush
Flushes are triggered by:
Global memory pressure.
MemStore size exceeding hbase.hregion.memstore.flush.size (default 128 MB).
Number of WALs reaching hbase.regionserver.maxlogs (default 32).
Flush interval hbase.regionserver.optionalcacheflushinterval (default 1 h).
Manual invocation via HBase shell or Java API.
RegionServer shutdown.
Post‑recovery after a region crash.
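The triggers above can be combined into a single decision function; the thresholds mirror the configuration names in the text, with the defaults quoted there expressed in bytes and seconds:

```python
# Sketch combining the MemStore flush triggers into one decision function.
FLUSH_SIZE = 128 * 1024 * 1024  # hbase.hregion.memstore.flush.size (128 MB)
MAX_WAL_FILES = 32              # hbase.regionserver.maxlogs
FLUSH_INTERVAL = 3600           # hbase.regionserver.optionalcacheflushinterval (1 h)

def should_flush(memstore_bytes, wal_file_count, seconds_since_flush,
                 global_pressure=False, manual=False, shutting_down=False):
    return bool(global_pressure
                or manual
                or shutting_down
                or memstore_bytes >= FLUSH_SIZE
                or wal_file_count >= MAX_WAL_FILES
                or seconds_since_flush >= FLUSH_INTERVAL)

print(should_flush(200 * 1024 * 1024, 3, 60))  # True  (size threshold hit)
print(should_flush(1024, 3, 60))               # False (no trigger fired)
```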
Region Split
When a region grows too large, HBase splits it into two child regions (automatic or manual). The process involves creating transition znodes in Zookeeper, preparing split directories, marking the parent region offline, updating the hbase:meta table, and finally opening the child regions.
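The key‑range bookkeeping of a split can be sketched as dividing the parent's [start, end) range at a split point (in practice, the midpoint rowkey of the largest StoreFile); the two children together cover exactly the parent's range:

```python
# Sketch of a region split: the parent's rowkey range is divided at a split
# point into two child regions. An empty end key means "unbounded".
def split_region(start, end, split_point):
    assert start < split_point and (end == b"" or split_point < end)
    return [
        {"start": start, "end": split_point},  # first child
        {"start": split_point, "end": end},    # second child
    ]

children = split_region(b"a", b"z", b"m")
print(children)
```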
Compaction (StoreFile Merge)
Two types of compaction exist:
Minor compaction : Merges a few small StoreFiles into a larger one to improve read performance.
Major compaction : Merges all StoreFiles of a store into a single file, discarding deleted cells and expired versions. Controlled by hbase.hregion.majorcompaction and its jitter parameter.
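Because StoreFiles are sorted, compaction is essentially a merge of sorted sequences. A minimal sketch, with deletion markers simplified to a sentinel value and TTL expiry omitted:

```python
import heapq

# Sketch of compaction: StoreFiles are sorted sequences of (key, timestamp,
# value) entries, so merging preserves order. A major compaction additionally
# drops tombstoned cells. Data below is illustrative.
sf1 = [(b"row1", 2, b"v2"), (b"row3", 1, b"x")]
sf2 = [(b"row1", 1, b"v1"), (b"row2", 1, b"DELETE")]

def minor_compact(*storefiles):
    # Merge a few sorted files into one larger sorted file; nothing is dropped.
    return list(heapq.merge(*storefiles))

def major_compact(*storefiles):
    # Merge everything, then discard deleted cells.
    return [kv for kv in heapq.merge(*storefiles) if kv[2] != b"DELETE"]

print(len(minor_compact(sf1, sf2)))  # 4 -- all entries retained
print(len(major_compact(sf1, sf2)))  # 3 -- the tombstoned cell is gone
```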
RegionServer Failure Recovery
Zookeeper maintains an ephemeral node for each RegionServer, kept alive by periodic heartbeats. If a RegionServer's session times out, Zookeeper removes its node and notifies the Master, which then reassigns the failed server's regions and replays its WAL to recover any unflushed data.
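The detection step can be sketched as a timeout check over last‑heartbeat timestamps; the timeout value and server names are illustrative:

```python
# Sketch of failure detection: a RegionServer whose last heartbeat is older
# than the session timeout loses its ephemeral node, and the Master is notified.
SESSION_TIMEOUT = 30  # seconds; illustrative value

def detect_failures(last_heartbeat, now):
    # Return the servers whose Zookeeper session would expire.
    return [rs for rs, t in last_heartbeat.items() if now - t > SESSION_TIMEOUT]

heartbeats = {"rs1": 100, "rs2": 60}
print(detect_failures(heartbeats, now=105))  # ['rs2']
```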
For beginners, a follow‑up article will provide a gentler introduction to HBase concepts.
System Architect Go