
Understanding Hadoop and HBase: Installation, Configuration, and Basic Operations

This guide introduces Hadoop and HBase fundamentals, explains their architectures and advantages, and provides step‑by‑step instructions for setting up a multi‑node Hadoop cluster, configuring core services, installing HBase, and performing basic HBase shell operations.

Architecture Digest

1. Hadoop Overview

Hadoop is an open‑source Java framework for distributed storage and processing of large data sets across clusters of commodity servers. It consists of four core modules: Hadoop Common, YARN (resource management), HDFS (distributed file system), and MapReduce (parallel processing).

1.2 Hadoop Architecture

The modules work together to provide scalable, fault‑tolerant computation, allowing clusters to grow from a single machine to thousands.

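1.3 How Hadoop Works

A job is submitted with its input and output paths, a JAR containing the Map and Reduce classes, and configuration parameters. In Hadoop 1.x the JobTracker scheduled tasks and TaskTrackers executed them. In Hadoop 2.x and later, including the 3.2.0 release installed in this guide, YARN's ResourceManager allocates resources and NodeManagers run the task containers. Results are written back to HDFS.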
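
The map, shuffle, and reduce phases can be imitated locally with a plain shell pipeline. This is only an analogy, not Hadoop code: the function names `map_phase` and `reduce_phase` and the sample input are made up for illustration, and everything runs in one process rather than across a cluster.

```shell
#!/usr/bin/env bash
# Local analogy of a word-count MapReduce job (no Hadoop involved).
set -euo pipefail

# Map: emit one "word<TAB>1" pair per input word.
map_phase() {
  tr -s '[:space:]' '\n' | awk 'NF { print $1 "\t" 1 }'
}

# Reduce: input arrives sorted by key, so equal keys are adjacent; sum their counts.
reduce_phase() {
  awk -F '\t' '
    $1 != key { if (key != "") print key "\t" sum; key = $1; sum = 0 }
    { sum += $2 }
    END { if (key != "") print key "\t" sum }
  '
}

# "sort" stands in for the shuffle step that groups pairs by key between the phases.
wordcount_output=$(printf 'hbase hadoop\nhadoop hdfs hadoop\n' | map_phase | sort | reduce_phase)
printf '%s\n' "$wordcount_output"
```

For the sample input this prints one tab-separated line per word: hadoop 3, hbase 1, hdfs 1. In a real job, the map and reduce functions live in the submitted JAR and the sorting and grouping is done by the framework between the two phases.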

1.4 Advantages of Hadoop

Hadoop lets developers build distributed systems quickly, distributes data and work across the cluster automatically, tolerates node failures, allows nodes to be added or removed while the cluster is running, and runs on any platform with a Java runtime.

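1.5 HBase Introduction

HBase is a NoSQL database in the Hadoop ecosystem that stores its data on HDFS and uses ZooKeeper for coordination. MapReduce jobs can read from and write to HBase tables, but HBase does not require MapReduce to operate. Tables are split into regions, and each region is served by one RegionServer at a time.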

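1.6 HBase Architecture

Key components are the Client, ZooKeeper, Master, RegionServer, HLog (the write‑ahead log, WAL), MemStore, and StoreFile. A write is first appended to the WAL and then placed in the MemStore. When the MemStore fills, it is flushed to an immutable StoreFile on HDFS, and StoreFiles are merged by compaction over time.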
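
The write path can be illustrated with a toy model built from plain files. This is a simplified sketch, not HBase code: it appends each edit to a log before buffering it (the order HBase uses for durability), and the directory name, the flush threshold of three edits, and the `put` and `compact` helpers are all made up for illustration. A real MemStore is an in-memory sorted structure, and a real compaction also resolves versions and deletes.

```shell
#!/usr/bin/env bash
# Toy model of the HBase write path using plain files (not HBase code).
set -euo pipefail

sketch_dir=./hbase-write-path-sketch   # hypothetical scratch directory
flush_threshold=3                      # hypothetical: flush after 3 buffered edits
rm -rf "$sketch_dir"
mkdir -p "$sketch_dir"
wal="$sketch_dir/wal.log"
memstore="$sketch_dir/memstore.buf"
: > "$wal"
: > "$memstore"
flush_count=0

put() {
  # 1. Append the edit to the write-ahead log so it survives a crash.
  printf '%s\t%s\n' "$1" "$2" >> "$wal"
  # 2. Buffer the edit in the MemStore stand-in.
  printf '%s\t%s\n' "$1" "$2" >> "$memstore"
  # 3. When the buffer is full, flush it as a sorted StoreFile and empty the buffer.
  local buffered
  buffered=$(( $(wc -l < "$memstore") ))
  if [ "$buffered" -ge "$flush_threshold" ]; then
    flush_count=$((flush_count + 1))
    sort "$memstore" > "$sketch_dir/storefile-$flush_count"
    : > "$memstore"
  fi
}

compact() {
  # Merge the already-sorted StoreFiles into a single sorted file.
  sort -m "$sketch_dir"/storefile-* > "$sketch_dir/compacted.tmp"
  rm -f "$sketch_dir"/storefile-*
  mv "$sketch_dir/compacted.tmp" "$sketch_dir/storefile-compacted"
}

for row in row5 row2 row9 row1 row7 row3 row4; do
  put "$row" "value-of-$row"
done
compact
wc -l "$wal" "$memstore" "$sketch_dir/storefile-compacted"
```

After the seven sample writes, the log holds all seven edits, two flushes have been merged into one six-row file, and one edit is still waiting in the buffer.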

2. Hadoop Installation

A three‑node cluster is defined (hadoop01, hadoop02, hadoop03) with roles for NameNode, DataNode, ResourceManager, etc. Preparation steps include disabling SELinux/firewall, creating a dedicated user, configuring /etc/hosts, synchronizing time, and setting up SSH key‑based trust.
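
The preparation steps might look roughly like the following on each node. This is a dry-run sketch under several assumptions that the guide does not state: a CentOS/RHEL-style system with firewalld and systemd, time synchronization through `timedatectl`, and placeholder IP addresses. The user name `along` is taken from the `chown` commands in this guide. By default the script only prints the commands; set `DRY_RUN=0` and run it as root to execute them.

```shell
#!/usr/bin/env bash
# Dry-run sketch of node preparation; prints commands unless DRY_RUN=0.
set -euo pipefail

DRY_RUN=${DRY_RUN:-1}
HADOOP_USER=along                      # user name used by the chown commands in this guide
NODES="hadoop01 hadoop02 hadoop03"

run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical addresses: replace them with the real IPs of your nodes.
hosts_entries=$(printf '%s\n' \
  "192.168.10.101 hadoop01" \
  "192.168.10.102 hadoop02" \
  "192.168.10.103 hadoop03")

run setenforce 0                       # SELinux permissive until the next reboot
run systemctl stop firewalld           # assumes firewalld is the active firewall
run systemctl disable firewalld
run useradd -m "$HADOOP_USER"          # the user still needs a password for ssh-copy-id
run timedatectl set-ntp true           # assumes systemd-managed time synchronization

printf '# Lines to append to /etc/hosts on every node:\n%s\n' "$hosts_entries"

# Generate a key as the Hadoop user and distribute it to every node, including this one.
run sudo -u "$HADOOP_USER" ssh-keygen -t rsa -N '' -f "/home/$HADOOP_USER/.ssh/id_rsa"
for node in $NODES; do
  run sudo -u "$HADOOP_USER" ssh-copy-id "$HADOOP_USER@$node"
done
```

Printing the commands first makes it easy to review them and adapt them to a different distribution before anything is changed on the machine.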

2.4 Install Hadoop

<span>[root@hadoop01 ~]# wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.2.0/hadoop-3.2.0.tar.gz</span>
<span>[root@hadoop01 ~]# tar -xvf hadoop-3.2.0.tar.gz -C /usr/local/</span>
<span>[root@hadoop01 ~]# chown -R along:along /usr/local/hadoop-3.2.0/</span>
<span>[root@hadoop01 ~]# ln -s /usr/local/hadoop-3.2.0/ /usr/local/hadoop</span>

3. Configure and Start Hadoop

Edit hadoop-env.sh to set JAVA_HOME, then configure core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml with the appropriate directories, replication factor, and ResourceManager hostname. List the worker hostnames in the workers file (Hadoop 3.x renamed the slaves file of Hadoop 2.x to workers), then format the NameNode once and start the daemons.
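
As a rough illustration of these files, the sketch below writes a minimal set into a scratch directory for review. The property names are standard Hadoop 3.x keys, but the values are assumptions rather than the guide's actual settings: port 9000, the /data/hadoop paths, the JAVA_HOME path, a replication factor of 3, and all three nodes acting as workers are placeholders. The `write_site_xml` helper is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: write minimal Hadoop 3.x config files into a scratch directory for review.
set -euo pipefail

CONF_OUT=./hadoop-conf-sketch   # copy into /usr/local/hadoop/etc/hadoop after review
mkdir -p "$CONF_OUT"

# Helper: wrap "name=value" pairs in a Hadoop <configuration> document.
write_site_xml() {
  local file=$1
  shift
  local pair
  {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    for pair in "$@"; do
      printf '  <property><name>%s</name><value>%s</value></property>\n' \
        "${pair%%=*}" "${pair#*=}"
    done
    echo '</configuration>'
  } > "$CONF_OUT/$file"
}

# Port 9000 and the /data/hadoop paths are assumptions, not values from the guide.
write_site_xml core-site.xml \
  "fs.defaultFS=hdfs://hadoop01:9000" \
  "hadoop.tmp.dir=/data/hadoop/tmp"
write_site_xml hdfs-site.xml \
  "dfs.replication=3" \
  "dfs.namenode.name.dir=/data/hadoop/name" \
  "dfs.datanode.data.dir=/data/hadoop/data"
write_site_xml mapred-site.xml \
  "mapreduce.framework.name=yarn"
write_site_xml yarn-site.xml \
  "yarn.resourcemanager.hostname=hadoop01" \
  "yarn.nodemanager.aux-services=mapreduce_shuffle"

# Hadoop 3.x reads worker hostnames from "workers" (named "slaves" in 2.x).
printf '%s\n' hadoop01 hadoop02 hadoop03 > "$CONF_OUT/workers"

# Line to add to hadoop-env.sh; the JDK path is a placeholder.
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0' > "$CONF_OUT/hadoop-env.sh.snippet"

ls "$CONF_OUT"
```

Once the reviewed files are in place on every node, the usual sequence is to run `hdfs namenode -format` a single time on the NameNode host and then `start-dfs.sh` and `start-yarn.sh` from /usr/local/hadoop/sbin. Reformatting an existing NameNode erases the HDFS metadata, so the format step should not be repeated.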

4. HBase Installation and Configuration

<span>[root@hadoop01 ~]# wget https://mirrors.tuna.tsinghua.edu.cn/apache/hbase/1.4.9/hbase-1.4.9-bin.tar.gz</span>
<span>[root@hadoop01 ~]# tar -xvf hbase-1.4.9-bin.tar.gz -C /usr/local/</span>
<span>[root@hadoop01 ~]# chown -R along:along /usr/local/hbase-1.4.9/</span>
<span>[root@hadoop01 ~]# ln -s /usr/local/hbase-1.4.9/ /usr/local/hbase</span>

Configure hbase-env.sh for Java, and edit hbase-site.xml to set hbase.rootdir, enable distributed mode, define ZooKeeper quorum, and specify master and web UI ports.
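
A minimal version of these files could look like the sketch below, which writes them to a scratch directory for review. The property names are standard HBase keys and 16000/16010 are the HBase 1.x default master ports, but the rest is assumed for illustration: the HDFS port 9000 (it must match fs.defaultFS in core-site.xml), the JAVA_HOME path, running ZooKeeper and a RegionServer on all three nodes, and letting HBase manage its own ZooKeeper.

```shell
#!/usr/bin/env bash
# Sketch: write minimal HBase config files into a scratch directory for review.
set -euo pipefail

HBASE_CONF_OUT=./hbase-conf-sketch   # copy into /usr/local/hbase/conf after review
mkdir -p "$HBASE_CONF_OUT"

cat > "$HBASE_CONF_OUT/hbase-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Must match fs.defaultFS in core-site.xml; port 9000 is an assumption. -->
  <property><name>hbase.rootdir</name><value>hdfs://hadoop01:9000/hbase</value></property>
  <property><name>hbase.cluster.distributed</name><value>true</value></property>
  <property><name>hbase.zookeeper.quorum</name><value>hadoop01,hadoop02,hadoop03</value></property>
  <!-- 16000 and 16010 are the HBase 1.x defaults for the master RPC and web UI ports. -->
  <property><name>hbase.master.port</name><value>16000</value></property>
  <property><name>hbase.master.info.port</name><value>16010</value></property>
</configuration>
EOF

# One RegionServer hostname per line; using every node is an assumption.
printf '%s\n' hadoop01 hadoop02 hadoop03 > "$HBASE_CONF_OUT/regionservers"

# Lines to add to hbase-env.sh.
cat > "$HBASE_CONF_OUT/hbase-env.sh.snippet" <<'EOF'
export JAVA_HOME=/usr/lib/jvm/java-1.8.0   # placeholder path
export HBASE_MANAGES_ZK=true               # assumption: HBase starts its own ZooKeeper
EOF

ls "$HBASE_CONF_OUT"
```

If an external ZooKeeper ensemble is already running, `HBASE_MANAGES_ZK` would be set to `false` and the quorum would list that ensemble's hosts instead.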

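5. Start HBase Cluster

Use the scripts in /usr/local/hadoop/sbin to start HDFS and YARN, then launch HBase with start-hbase.sh from /usr/local/hbase/bin. Verify the cluster via the ResourceManager UI (http://hadoop01:8088) and the NameNode UI. In Hadoop 3.x the NameNode UI listens on port 9870 by default (http://hadoop01:9870); port 50070 was the Hadoop 2.x default and applies here only if dfs.namenode.http-address is set to it.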
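
The start-up order can be sketched as a small script. It is a dry run by default and only prints the commands; set `DRY_RUN=0` on hadoop01 to execute them. The install paths come from the symlinks created earlier in this guide. The probed ports are assumptions based on defaults: 9870 for the NameNode UI in Hadoop 3.x (50070 in Hadoop 2.x, or whatever dfs.namenode.http-address specifies) and 16010 for the HBase master UI.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the start-up sequence; prints commands unless DRY_RUN=0.
set -euo pipefail

DRY_RUN=${DRY_RUN:-1}
HADOOP_HOME=/usr/local/hadoop
HBASE_HOME=/usr/local/hbase

run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

run "$HADOOP_HOME/sbin/start-dfs.sh"    # NameNode and DataNodes
run "$HADOOP_HOME/sbin/start-yarn.sh"   # ResourceManager and NodeManagers
run "$HBASE_HOME/bin/start-hbase.sh"    # HMaster and RegionServers
run jps                                 # list the Java daemons running on this node

# Probe the web UIs; adjust the ports if your configuration overrides the defaults.
for url in http://hadoop01:8088 http://hadoop01:9870 http://hadoop01:16010; do
  run curl -fsS -o /dev/null "$url"
done
```

HDFS has to be up before HBase starts because the HBase root directory lives on HDFS. With `DRY_RUN=0`, the script stops at the first command or UI probe that fails.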

6. Basic HBase Shell Operations

Common commands include creating tables, inserting rows, scanning, counting, disabling/enabling tables, and dropping tables. Example commands:

<span>hbase(main):001:0> create 'demo','id','info'</span>
<span>hbase(main):002:0> put 'demo','example','id:name','along'</span>
<span>hbase(main):003:0> get 'demo','example'</span>
<span>hbase(main):004:0> disable 'demo'</span>
<span>hbase(main):005:0> drop 'demo'</span>
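
The same operations, together with the scan, count, and enable commands mentioned above, can be collected into a script file and passed to the HBase shell. This is a sketch under assumptions: the file name `hbase-demo-ops.rb` and the `info:age` value are made up, and the wrapper only runs the script when an `hbase` command is found on the PATH.

```shell
#!/usr/bin/env bash
# Sketch: write a batch of HBase shell commands to a file and run it if HBase is available.
set -euo pipefail

script=./hbase-demo-ops.rb    # hypothetical file name
cat > "$script" <<'EOF'
create 'demo', 'id', 'info'                  # table with two column families
put 'demo', 'example', 'id:name', 'along'
put 'demo', 'example', 'info:age', '24'      # made-up sample value
get 'demo', 'example'
scan 'demo'
count 'demo'
disable 'demo'
is_disabled 'demo'
enable 'demo'
disable 'demo'                               # a table must be disabled before it is dropped
drop 'demo'
exit
EOF

if command -v hbase >/dev/null 2>&1; then
  hbase shell "$script"
else
  echo "hbase not found on PATH; wrote $script for later use"
fi
```

The arguments to `create` after the table name are column families, so `id:name` and `info:age` address the columns `name` and `age` inside the `id` and `info` families.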

After these steps, a working Hadoop and HBase environment is available for big‑data storage and processing.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
