Step-by-Step Guide to Installing and Configuring a Hadoop Cluster on Three Virtual Machines

This article provides a hands-on tutorial for preparing three VMs, installing the JDK and Hadoop, configuring core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml, setting environment variables, distributing the package, starting HDFS and YARN, and verifying the cluster via the web UI and jps.

Big Data Technology & Architecture

May 6, 2020

This guide walks through the complete process of building a Hadoop cluster on three virtual machines (node01, node02, node03) with IPs 192.168.100.100, 192.168.100.110, and 192.168.100.120.

Preparation

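On every node, disable the firewall and SELinux, set the hostname, map the hostnames in /etc/hosts, and install JDK 1.8. Then enable password-less SSH from the master (node01) to all three nodes.

# Stop the firewall now and keep it off after a reboot
service iptables stop
chkconfig iptables off
# Generate a key pair on node01, then copy the public key to every node
ssh-keygen
ssh-copy-id 192.168.100.100
ssh-copy-id 192.168.100.110
ssh-copy-id 192.168.100.120
# Map node01, node02, and node03 to their IPs
vi /etc/hosts
# Point JAVA_HOME at the JDK 1.8 install directory
export JAVA_HOME=/export/servers/jdk1.8.0_141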
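
The /etc/hosts entries themselves are not spelled out above. As a minimal sketch, assuming node01, node02, and node03 map to the three IPs in the order listed earlier, the mappings could be appended with a small helper. The function name add_cluster_hosts is hypothetical, and it takes the file path as an argument so it can be tried on a scratch file first:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the cluster's name mappings to a hosts file,
# skipping entries that are already present (safe to run more than once).
add_cluster_hosts() {
  local hosts_file="$1"
  local entry
  for entry in \
    "192.168.100.100 node01" \
    "192.168.100.110 node02" \
    "192.168.100.120 node03"
  do
    # -x matches the whole line, -F treats the entry as a literal string
    grep -qxF "$entry" "$hosts_file" 2>/dev/null || echo "$entry" >> "$hosts_file"
  done
}

# Example (as root, on every node): add_cluster_hosts /etc/hosts
```

Running the helper twice leaves the file unchanged the second time, which makes it convenient to repeat on each node.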

Upload and Extract Hadoop Package

Create directories for software and extracted files, upload the Hadoop tarball, rename it, and extract.

mkdir -p /export/softwares
mkdir -p /export/servers
cd /export/softwares/
# The uploaded file name carries a Chinese suffix meaning "self-compiled version"; rename it
mv hadoop-2.6.0-cdh5.14.0-自己编译后的版本.tar.gz hadoop-2.6.0-cdh5.14.0.tar.gz
tar -zxvf hadoop-2.6.0-cdh5.14.0.tar.gz -C ../servers/

Check Native Compression Support

On the first node run:

cd /export/servers/hadoop-2.6.0-cdh5.14.0
bin/hadoop checknative

If OpenSSL is reported as false, install it:

yum -y install openssl-devel
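
When several libraries are listed, it can help to filter the report down to the ones that failed. The sketch below assumes checknative prints one line per library in the form "name: true|false details"; the helper name missing_native_libs is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `hadoop checknative` output on stdin and print
# the name of every library reported as false.
# Assumes lines of the form "name:  true|false details...".
missing_native_libs() {
  awk '$2 == "false" { sub(/:$/, "", $1); print $1 }'
}

# Example: bin/hadoop checknative 2>/dev/null | missing_native_libs
```

An empty result would suggest that every native library was found.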

Modify Configuration Files

All edits are performed on the first node under /export/servers/hadoop-2.6.0-cdh5.14.0/etc/hadoop.

core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node01:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/tempDatas</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>10080</value>
  </property>
</configuration>

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node01:50090</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>node01:50070</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/namenodeDatas</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/datanodeDatas</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
</configuration>
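
Two of the values above are easier to review once converted to familiar units: fs.trash.interval is expressed in minutes and dfs.blocksize in bytes. A quick arithmetic check, with variable names chosen here only for illustration:

```shell
#!/usr/bin/env bash
# Values copied from core-site.xml and hdfs-site.xml above.
trash_interval_minutes=10080
block_size_bytes=134217728

# fs.trash.interval is expressed in minutes.
echo "trash retention: $(( trash_interval_minutes / 60 / 24 )) days"
# dfs.blocksize is expressed in bytes.
echo "block size: $(( block_size_bytes / 1024 / 1024 )) MiB"
```

With these settings, deleted files stay in the trash for 7 days and each HDFS block is 128 MiB.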

hadoop-env.sh

export JAVA_HOME=/export/servers/jdk1.8.0_141

mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.job.ubertask.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node01:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node01:19888</value>
  </property>
</configuration>

yarn-site.xml

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node01</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

slaves

node01
node02
node03

Create Data Directories

mkdir -p /export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/tempDatas
mkdir -p /export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/namenodeDatas
mkdir -p /export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/datanodeDatas
mkdir -p /export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/dfs/nn/edits
mkdir -p /export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/dfs/snn/name
mkdir -p /export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/dfs/nn/snn/edits

Distribute the Hadoop Package

cd /export/servers/
scp -r hadoop-2.6.0-cdh5.14.0/ node02:$PWD
scp -r hadoop-2.6.0-cdh5.14.0/ node03:$PWD
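
With more worker nodes, the copy is easier to manage as a loop. This is only a sketch: the helper name distribute_hadoop and the DRY_RUN switch are assumptions made for illustration, and the node list matches the two workers used in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the Hadoop directory to each worker node.
# With DRY_RUN=1 the scp commands are printed instead of executed.
distribute_hadoop() {
  local src="/export/servers/hadoop-2.6.0-cdh5.14.0"
  local node
  for node in node02 node03; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "scp -r $src $node:/export/servers/"
    else
      scp -r "$src" "$node:/export/servers/" || return 1
    fi
  done
}

# Example: DRY_RUN=1 distribute_hadoop
```

Printing the commands first is a cheap way to confirm the target paths before any data moves.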

Configure Environment Variables on All Nodes

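Open a new profile script on each node:

vim /etc/profile.d/hadoop.sh

Add these two lines:

export HADOOP_HOME=/export/servers/hadoop-2.6.0-cdh5.14.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

Then reload the profile in the current shell:

source /etc/profile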
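
A quick sanity check after reloading the profile can catch a mistyped path before the cluster is started. The following is a minimal sketch; the function name check_hadoop_env is hypothetical, and it only inspects the shell's variables rather than running Hadoop:

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm HADOOP_HOME is set and that its bin and sbin
# directories are on PATH. Prints a message and returns non-zero otherwise.
check_hadoop_env() {
  if [ -z "${HADOOP_HOME:-}" ]; then
    echo "HADOOP_HOME is not set"
    return 1
  fi
  local dir
  for dir in "$HADOOP_HOME/bin" "$HADOOP_HOME/sbin"; do
    case ":$PATH:" in
      *":$dir:"*) ;;   # found on PATH
      *) echo "$dir is missing from PATH"; return 1 ;;
    esac
  done
  echo "Hadoop environment looks consistent"
}

# Example: check_hadoop_env
```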

Start the Cluster

Format HDFS once, on the first node and only during the initial setup:

bin/hdfs namenode -format

Then start the services individually or via scripts:

1. Start nodes one by one

hadoop-daemon.sh start namenode        # on node01
hadoop-daemon.sh start datanode        # on every node listed in slaves
yarn-daemon.sh start resourcemanager   # on node01
yarn-daemon.sh start nodemanager       # on every node listed in slaves

2. One-click start of HDFS and YARN

sbin/start-dfs.sh   # start HDFS
sbin/start-yarn.sh  # start YARN

3. One-click start of all services (in Hadoop 2.x, start-all.sh is deprecated and simply calls start-dfs.sh and start-yarn.sh)

sbin/start-all.sh
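
Because formatting should happen only once, a small guard can help avoid repeating it by accident: reformatting gives the NameNode a new cluster ID, and DataNodes that still hold the old one will fail to register. The sketch below relies on the fact that a formatted NameNode directory contains current/VERSION; the helper name namenode_needs_format is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the NameNode directory has not been
# formatted yet. A formatted directory contains current/VERSION.
namenode_needs_format() {
  local name_dir="$1"
  [ ! -f "$name_dir/current/VERSION" ]
}

# Example, run from the Hadoop directory on node01:
# NAME_DIR=/export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/namenodeDatas
# if namenode_needs_format "$NAME_DIR"; then bin/hdfs namenode -format; fi
```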

Verify via Browser

HDFS UI: http://192.168.100.100:50070/dfshealth.html#tab-overview

YARN UI: http://192.168.100.100:8088/cluster

Validate Cluster Health

jps   # check Java processes on each node
hadoop fs -mkdir /abc
hadoop fs -put /opt/a.txt /abc
hadoop fs -ls /abc
hadoop fs -get /abc/a.txt /opt
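
Reading jps output by eye on three nodes is error-prone, so the check can be scripted. This is a sketch under two assumptions: jps prints one "pid name" pair per line, and with the configuration above node01 hosts every role while node02 and node03 run only DataNode and NodeManager. The helper name missing_daemons is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `jps` output on stdin and report any expected
# daemon that is not running. Expected daemon names are passed as arguments.
missing_daemons() {
  local running daemon status=0
  # jps prints "<pid> <main class>"; keep only the class name column.
  running=$(awk '{ print $2 }')
  for daemon in "$@"; do
    # -x requires a whole-line match, so NameNode does not match SecondaryNameNode
    if ! printf '%s\n' "$running" | grep -qxF "$daemon"; then
      echo "missing: $daemon"
      status=1
    fi
  done
  return "$status"
}

# Example on node01:
# jps | missing_daemons NameNode SecondaryNameNode DataNode ResourceManager NodeManager
# Example on node02 and node03:
# jps | missing_daemons DataNode NodeManager
```

The function returns a non-zero status when anything is missing, so it can also be used inside a larger health-check script.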

If the web pages load correctly and the above commands succeed, the Hadoop cluster is operational.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
