HBase Overview and Step‑by‑Step Installation Guide

This article introduces HBase’s column‑oriented architecture, explains the roles of Master, RegionServer, and Zookeeper, and provides detailed environment preparation and installation commands for setting up an HBase cluster on Hadoop.

Practical DevOps Architecture

Nov 6, 2020

1. Introduction

HBase is an open‑source, distributed, column‑oriented database. Unlike a traditional relational database, it is well suited to storing unstructured and semi‑structured data. Its data model follows Google Bigtable: each row has a row key and an arbitrary number of columns, and the columns are grouped into column families. Each column family is stored in its own set of HFiles, so columns that are read together are kept together on disk. Rows are sorted by row key, and a table is split by row‑key range into multiple Regions.
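To make the row‑key ordering concrete, here is a minimal sketch in plain bash. It is an illustration of the idea only, not HBase code: the function name find_region and the split keys g, n, and t are hypothetical. Given a sorted list of split keys, it returns the index of the Region whose key range contains a row key.

```shell
#!/usr/bin/env bash
# Illustration only (hypothetical helper, not part of HBase).
# Region 0 holds keys below the first split key, Region i holds
# [split[i-1], split[i]), and the last Region has no upper bound.
find_region() {
  local LC_ALL=C          # request byte-wise comparison, as HBase orders row keys by bytes
  local key=$1; shift
  local idx=0 split
  for split in "$@"; do
    # Stop at the first split key that is greater than the row key.
    if [[ "$key" < "$split" ]]; then
      break
    fi
    idx=$((idx + 1))
  done
  echo "$idx"
}
```

With split keys g, n, and t, calling find_region peach g n t selects the third Region (index 2), because peach sorts between n and t.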

In production, HBase runs on top of HDFS, which provides the underlying storage. Applications access the stored data through HBase's Java API. A cluster consists of a Master, RegionServers, and Zookeeper.

Brief description of the main HBase components:

Master

The HBase Master coordinates RegionServers, monitors their status, balances load, and assigns Regions to servers. Multiple Masters can exist for high availability, but only one is active at a time; Zookeeper helps manage failover.

RegionServer

A RegionServer hosts multiple Regions and handles the read/write operations for the Regions it hosts. Clients connect directly to RegionServers to retrieve data. Each Region stores the actual HBase data and is the fundamental unit of distribution and availability.

Zookeeper

Zookeeper is critical for HBase: it provides high availability for the Master by deciding which Master is active, and it tracks the registration of RegionServers and Regions. It has become the standard coordination service for many distributed big‑data projects.

Because an HBase cluster depends on a Hadoop cluster, a Hadoop environment of a compatible version must be set up before HBase is deployed.

2. Environment Preparation

(1) Modify hostnames, update /etc/hosts, and disable firewalls on each server.

[root@c7001 ~]# cat >> /etc/hosts << EOF
192.168.16.135  c7001
192.168.16.80   c7002
192.168.16.95   c7003
192.168.16.97   c7004
192.168.16.101  c7005
EOF
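Before moving on, it can help to confirm that every node has all five entries. The following is a small sketch, assuming the host names used in this article; the function name missing_hosts is hypothetical. It prints each host name that does not appear on a non‑comment line of the given hosts file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every host name that is missing from a hosts file.
# Usage: missing_hosts /etc/hosts c7001 c7002 c7003 c7004 c7005
missing_hosts() {
  local hosts_file=$1; shift
  local h
  for h in "$@"; do
    # A host counts as present if it is a name field on a non-comment line.
    if ! awk -v h="$h" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
        END { exit (found ? 0 : 1) }' "$hosts_file"; then
      echo "$h"
    fi
  done
}
```

Empty output means all listed hosts are present.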

(2) Configure password‑less SSH from c7001 to the other nodes.

ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub c7001
ssh-copy-id -i ~/.ssh/id_rsa.pub c7002
ssh-copy-id -i ~/.ssh/id_rsa.pub c7003
ssh-copy-id -i ~/.ssh/id_rsa.pub c7004
ssh-copy-id -i ~/.ssh/id_rsa.pub c7005
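The five ssh-copy-id calls above can also be written as a loop. This is a sketch under the article's assumptions (the same five host names and the default RSA key path); the function name copy_key_to_nodes and its optional dry‑run argument are hypothetical.

```shell
#!/usr/bin/env bash
# Host names taken from the /etc/hosts entries in this article.
NODES=(c7001 c7002 c7003 c7004 c7005)

# Hypothetical helper: copy the public key to every node.
# Pass "echo" as the first argument to print the commands instead of running them.
copy_key_to_nodes() {
  local runner=${1:-}
  local node
  for node in "${NODES[@]}"; do
    $runner ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "$node"
  done
}
```

Running copy_key_to_nodes echo first shows exactly which commands would be executed; running it without an argument performs the copy and prompts for each node's password once.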

(3) Install JDK 1.8+ on each server.

[root@c7001 ~]# tar zxf jdk-8u171-linux-x64.tar.gz -C /opt/
[root@c7001 ~]# cd /opt
[root@c7001 opt]# mv jdk1.8.0_171/ jdk1.8
[root@c7001 opt]# vim /etc/profile
# Append the following two lines to /etc/profile:
export JAVA_HOME=/opt/jdk1.8
export PATH=$PATH:$JAVA_HOME/bin
[root@c7001 opt]# source /etc/profile
[root@c7001 opt]# java -version
java version "1.8.0_171"
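To check the JDK on every node without reading the output by eye, the version line can be parsed. The sketch below is one possible approach; the function name java_major_version is hypothetical. Note that java -version writes to standard error, so the usage line redirects it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `java -version` output on stdin and print the
# major version, e.g. "1.8.0_171" -> 8 and "11.0.2" -> 11.
# Usage: java -version 2>&1 | java_major_version
java_major_version() {
  sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1 |
    awk -F. '{ if ($1 == "1") print $2; else print $1 }'
}
```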

3. Install HBase

[root@c7003 opt]# tar zxf /usr/src/hbase-1.3.0-bin.tar.gz -C /opt/

Modify configuration files.

[root@c7003 hbase-1.3.0]# vim conf/hbase-env.sh
# Set JDK path
export JAVA_HOME=/opt/jdk1.8
# Disable the embedded Zookeeper and use an external cluster
export HBASE_MANAGES_ZK=false
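If the same two settings have to be applied on several machines, editing the file by hand gets repetitive. The following sketch shows one way to script it; the function name set_env_var is hypothetical, and the JDK path in the usage lines assumes the /opt/jdk1.8 location from section 2. It removes any existing export of the variable, including a commented‑out one, and appends the desired value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set "export NAME=VALUE" in a shell config file exactly once.
# Usage: set_env_var conf/hbase-env.sh JAVA_HOME /opt/jdk1.8
#        set_env_var conf/hbase-env.sh HBASE_MANAGES_ZK false
set_env_var() {
  local file=$1 name=$2 value=$3 tmp
  tmp=$(mktemp)
  # Drop any existing (possibly commented-out) export of this variable ...
  grep -Ev "^[[:space:]]*#?[[:space:]]*export[[:space:]]+${name}=" "$file" > "$tmp" || true
  # ... then append the desired setting.
  echo "export ${name}=${value}" >> "$tmp"
  # Write back through the original file so its permissions are kept.
  cat "$tmp" > "$file" && rm -f "$tmp"
}
```

Because the old line is removed before the new one is appended, running the function twice leaves a single entry.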

Edit hbase-site.xml to add the following settings:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://c7001:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>c7003,c7004,c7005</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/opt/hbase-1.3.0/tmp/zk/data</value>
  </property>
</configuration>

Then list the RegionServer hosts in conf/regionservers, one per line:

[root@c7003 hbase-1.3.0]# vim conf/regionservers
c7004
c7005
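Once hbase-site.xml is in place, it is worth reading the values back to catch typos before starting the cluster. The sketch below is a simple line‑based reader, not a real XML parser; the function name get_property is hypothetical, and it assumes each name and value tag sits on its own line, as in the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one property from a Hadoop-style
# configuration file. Assumes <name> and <value> are each on their own line.
# Usage: get_property conf/hbase-site.xml hbase.zookeeper.quorum
get_property() {
  local file=$1 name=$2
  awk -v want="$name" '
    /<name>/  { n = $0; sub(/.*<name>/, "", n);  sub(/<\/name>.*/, "", n) }
    /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                if (n == want) { print v; exit } }
  ' "$file"
}
```

For the file shown above, get_property conf/hbase-site.xml hbase.rootdir should print the HDFS URL, which has to match the NameNode address configured in Hadoop.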

Copy the HBase installation to the other machines.

[root@c7003 opt]# scp -r hbase-1.3.0 root@c7004:/opt/
[root@c7003 opt]# scp -r hbase-1.3.0 root@c7005:/opt/
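With more RegionServers, the copy step can be driven by the regionservers file instead of being typed per host. This is a sketch only; the function name distribute_hbase and its dry‑run argument are hypothetical, and it assumes the root login and /opt/ target directory used in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy an HBase directory to every host listed in its
# conf/regionservers file. Pass "echo" as the second argument for a dry run.
# Usage (from /opt): distribute_hbase hbase-1.3.0 echo
distribute_hbase() {
  local hbase_dir=$1 runner=${2:-}
  local host
  # The "|| [ -n ... ]" part handles a last line without a trailing newline.
  while read -r host || [ -n "$host" ]; do
    # Skip blank lines and comments.
    case $host in ''|'#'*) continue ;; esac
    # Redirect stdin so the command cannot consume the rest of the host list.
    $runner scp -r "$hbase_dir" "root@${host}:/opt/" < /dev/null
  done < "$hbase_dir/conf/regionservers"
}
```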

Start the cluster.

[root@c7003 hbase-1.3.0]# bin/start-hbase.sh

The web UI is accessible at http://<IP>:16010, where <IP> is the address of the Master node.

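Check the running processes on each node with jps: c7003, where start-hbase.sh was run, should show HMaster, and c7004 and c7005 should show HRegionServer.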
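The process check can be scripted by filtering jps output, which lists one JVM per line as a process ID followed by the main class name. The sketch below assumes that output format; the function name has_process is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if jps output on stdin contains the given
# main class name in its second column.
# Usage: ssh c7004 jps | has_process HRegionServer && echo "c7004 OK"
has_process() {
  awk -v want="$1" '$2 == want { found = 1 } END { exit (found ? 0 : 1) }'
}
```

The function only sets an exit status, so it can be combined with && and || or used in an if statement.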
