HBase Overview, Architecture, Installation, and Basic Shell Operations

This article provides a comprehensive introduction to HBase, covering its origins, key characteristics, architecture components, installation steps, basic shell commands for table management, data structures, read/write processes, and high‑availability configuration within the Hadoop ecosystem.

Architecture Digest

May 4, 2020

1. HBase Overview

HBase, modeled on Google's Bigtable paper, began as a Hadoop sub-project and is now a top-level Apache project. It is a highly reliable, high-performance, column-oriented, non-relational distributed storage system built on HDFS, designed to hold tables with very large numbers of rows and columns on commodity hardware.

Key Features

A. Massive storage.

B. Column-family storage.

C. Easy horizontal scalability via RegionServers and HDFS.

D. Sparse storage that does not allocate space for empty columns.

2. HBase Architecture

An HBase cluster consists of an HMaster, whose high availability is coordinated by ZooKeeper, and multiple HRegionServers, which play a worker role analogous to HDFS DataNodes. Core components include:

ZooKeeper : ensures a single active master, monitors RegionServers, stores the location of the meta table and other cluster state, and triggers failover.

HMaster : assigns Regions, balances load, maintains metadata, and coordinates recovery.

HRegionServer : processes client read/write requests, manages assigned Regions, interacts with HDFS, and handles StoreFile operations.

HDFS : provides underlying distributed storage with replication.

HLog : write‑ahead log stored on HDFS for durability.

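Region : a contiguous range of row keys within one table. It is a horizontal partition (shard) of the table, not a whole table; a table starts with one Region and splits into more as it grows.

Store : holds the data of one column family within a Region; each Store has one MemStore and zero or more StoreFiles.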

MemStore : in‑memory buffer for writes.

StoreFile and HFile : on‑disk files storing flushed MemStore data.
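Because each Region covers a contiguous row-key range, finding the Region for a row amounts to a search over sorted start keys. The sketch below is a toy model of that idea, not HBase's own lookup code; the start keys and the server names `rs1` to `rs3` are made up for illustration.

```python
from bisect import bisect_right

# Hypothetical region boundaries: each region covers [start_key, next_start_key).
# The first region starts at the empty key, so every row key falls somewhere.
REGION_START_KEYS = [b"", b"1000", b"2000", b"3000"]
REGION_SERVERS = ["rs1", "rs2", "rs3", "rs1"]  # assumed host names


def locate_region(row_key):
    """Return (region index, server) for the region whose range contains row_key."""
    # bisect_right finds the first start key greater than row_key;
    # the region just before that position owns the key.
    index = bisect_right(REGION_START_KEYS, row_key) - 1
    return index, REGION_SERVERS[index]
```

Keys compare as raw bytes, which is also how HBase orders row keys, so `b"1001"` lands in the region that starts at `b"1000"`.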

3. HBase Installation

Installation (often performed via Ambari) means installing ZooKeeper, HDFS, and HBase, then editing configuration files such as hbase-env.sh, hbase-site.xml, and the regionservers file that lists the RegionServer hosts. Example snippets:

export JAVA_HOME=/usr/lib/jvm/java

export HBASE_MANAGES_ZK=false

<property>
  <name>hbase.rootdir</name>
  <value>/apps/hbase/data</value>
</property>

Additional properties set the cluster mode, master ports, ZooKeeper quorum, and namespace configuration.
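As a rough illustration of what those additional properties can look like, the sketch below renders an hbase-site.xml document from a Python mapping. The property names are standard HBase settings, but the ZooKeeper host names are placeholders and the port values are simply the usual defaults, so treat every value as an assumption to adapt to your cluster.

```python
import xml.etree.ElementTree as ET

# Assumed example values; the ZooKeeper host names are placeholders.
PROPERTIES = {
    "hbase.rootdir": "/apps/hbase/data",
    "hbase.cluster.distributed": "true",
    "hbase.zookeeper.quorum": "zk1.example.com,zk2.example.com,zk3.example.com",
    "hbase.master.port": "16000",
    "hbase.master.info.port": "16010",
}


def render_hbase_site(properties):
    """Build an hbase-site.xml document string from a name -> value mapping."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-print where available (Python 3.9+)
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")
```

Calling `render_hbase_site(PROPERTIES)` returns the XML text, which could then be written to the configuration directory by whatever deployment tooling is in use.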

4. Basic HBase Shell Operations

Typical commands:

/usr/hdp/current/hbase-client/bin/hbase shell

list

create "student","info"

put "student","1001","info:name","laowang"

scan "student"

Other operations include row‑range scans, retrieving specific columns, updating data, viewing table schema, modifying column‑family versions, deleting rows or tables, counting rows, and managing namespaces.

5. HBase Data Model

Key elements:

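RowKey : the primary key of a row. Data can be reached by a single row key lookup, a row key range scan, or a full-table scan. Rows are stored sorted by row key in byte order, so key design determines data locality.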

Column Family : group of columns defined at table creation; each column name is prefixed by its family.

Cell : the value identified by the combination of RowKey, column family, column qualifier, and version; it is stored as uninterpreted bytes.

Timestamp : version identifier for a cell, allowing multiple historical values.
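The elements above can be pictured as a nested map: row key, then column, then timestamp, then value. The class below is a minimal in-memory sketch of that shape, assuming string keys for readability; `SparseTable` and its methods are hypothetical names, not part of any HBase client.

```python
from collections import defaultdict


class SparseTable:
    """Toy model: row key -> "family:qualifier" -> {timestamp: value}."""

    def __init__(self, families):
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        # Column families are fixed up front; qualifiers are free-form.
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row][column][ts] = value

    def get(self, row, column, versions=1):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        cells = self.rows.get(row, {}).get(column, {})
        return sorted(cells.items(), reverse=True)[:versions]

    def scan(self, start=None, stop=None):
        """Yield (row, columns) in row-key order for start <= row < stop."""
        for row in sorted(self.rows):
            if (start is None or row >= start) and (stop is None or row < stop):
                yield row, self.rows[row]


table = SparseTable(["info"])
table.put("1001", "info:name", b"laowang", ts=1)
table.put("1001", "info:name", b"wang", ts=2)   # a newer version of the same cell
table.put("1002", "info:age", b"20", ts=1)      # row 1002 has no info:name at all
```

Row 1002 stores nothing for `info:name`, which is the sparse-storage property: an absent column costs no space.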

6. Read/Write Process

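Read flow: the client asks ZooKeeper where the meta table is hosted, reads the meta table to find the RegionServer that serves the target row key, and sends the request to that server. The RegionServer looks in the MemStore and the block cache, reads StoreFiles from HDFS for any blocks that are not cached, and returns the newest matching version.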
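The server-side part of the read can be sketched as a merge over the in-memory buffer and the on-disk files. This is a simplified model under stated assumptions: `read_cell` is a hypothetical helper, plain dictionaries stand in for the MemStore and StoreFiles, and the cache is keyed per cell rather than per HFile block.

```python
def read_cell(key, memstore, block_cache, store_files):
    """Return the newest (timestamp, value) for key, or None if absent.

    memstore and each store file map key -> (timestamp, value).
    block_cache maps (file index, key) -> (timestamp, value) and stands in
    for cached HFile blocks, so repeated reads skip the simulated disk read.
    """
    candidates = []
    if key in memstore:
        candidates.append(memstore[key])
    for index, store_file in enumerate(store_files):
        cached = block_cache.get((index, key))
        if cached is None and key in store_file:
            cached = store_file[key]            # simulated HDFS read
            block_cache[(index, key)] = cached  # remember it for next time
        if cached is not None:
            candidates.append(cached)
    # The newest timestamp wins, wherever that version was found.
    return max(candidates, key=lambda cell: cell[0]) if candidates else None
```

The point of the merge is that a recent write still sitting in the MemStore must shadow an older value already flushed to a StoreFile.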

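Write flow: the client finds the meta table through ZooKeeper, looks up the RegionServer for the target row, and sends the write there. The RegionServer appends the edit to the HLog and then applies it to the MemStore. With default settings, a flush to disk is triggered when the combined MemStores on a RegionServer reach 40% of its heap, when a single Region's MemStore exceeds 128 MB, or when edits have sat in a MemStore for about one hour. StoreFiles are merged by compaction: by default a minor compaction is considered once a Store has accumulated 3 or more StoreFiles, and a major compaction runs roughly every 7 days.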
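The ordering in the write path (log first, then memory, then flush and compact) can be sketched with a small in-memory model. `ToyRegionStore` is a hypothetical class for illustration only: it models just the per-Region size trigger, counts key and value bytes as the MemStore size, and never truncates its log, all of which are simplifications.

```python
class ToyRegionStore:
    """Toy write path: append to a WAL, buffer in a MemStore, flush at a limit."""

    def __init__(self, flush_size_bytes=128 * 1024 * 1024):
        self.flush_size_bytes = flush_size_bytes
        self.wal = []          # stands in for the HLog on HDFS
        self.memstore = {}     # key -> value, sorted only at flush time here
        self.memstore_bytes = 0
        self.store_files = []  # each flush produces one immutable sorted file

    def put(self, key, value):
        self.wal.append((key, value))       # 1. log first, for durability
        if key in self.memstore:            # overwriting: drop the old size
            self.memstore_bytes -= len(key) + len(self.memstore[key])
        self.memstore[key] = value          # 2. then buffer in memory
        self.memstore_bytes += len(key) + len(value)
        if self.memstore_bytes >= self.flush_size_bytes:
            self.flush()                    # 3. spill to a new store file

    def flush(self):
        if not self.memstore:
            return
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore = {}
        self.memstore_bytes = 0

    def compact(self, min_files=3):
        """Merge all store files into one once at least min_files exist."""
        if len(self.store_files) < min_files:
            return False
        merged = {}
        for store_file in self.store_files:  # oldest first, newer overwrites
            merged.update(store_file)
        self.store_files = [sorted(merged.items())]
        return True
```

Passing a small `flush_size_bytes` makes the behavior easy to observe: each flush adds one file, and `compact()` only does work once the file count reaches `min_files`.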

7. High Availability

HMaster runs in active‑standby mode coordinated by ZooKeeper, ensuring continuous service even if the active master fails.
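The active-standby arrangement can be illustrated with a toy election. This is only a sketch of the idea: `ToyMasterElection` is a hypothetical class, and promoting standbys in registration order is an assumption made for simplicity, whereas real backup masters compete through ZooKeeper when the active master's session is lost.

```python
class ToyMasterElection:
    """Toy stand-in for ZooKeeper's role: one active master, others on standby."""

    def __init__(self):
        self.active = None
        self.standbys = []

    def register(self, master):
        # The first master to register becomes active; later ones wait as backups.
        if self.active is None:
            self.active = master
        else:
            self.standbys.append(master)

    def fail(self, master):
        # A failed active master is replaced by a standby, if one exists.
        if master == self.active:
            self.active = self.standbys.pop(0) if self.standbys else None
        elif master in self.standbys:
            self.standbys.remove(master)
```

With two registered masters, failing the active one leaves the other in charge, so clients keep a working master as long as at least one standby remains.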
