Step-by-Step Guide to Installing and Configuring a Hadoop Cluster on Three Virtual Machines
This article provides a comprehensive, hands‑on tutorial for preparing three VMs, installing JDK and Hadoop, configuring core‑site.xml, hdfs‑site.xml, mapred‑site.xml, yarn‑site.xml, setting environment variables, distributing the package, starting HDFS and YARN, and verifying the cluster via web UI and jps commands.
This guide walks through the complete process of building a Hadoop cluster on three virtual machines (node01, node02, node03) with IPs 192.168.100.100, 192.168.100.110, and 192.168.100.120.
Preparation
Disable firewalls and SELinux, set hostnames, enable password‑less SSH between the master and slaves, edit /etc/hosts, and install JDK 1.8.
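The /etc/hosts step only needs the three name-to-address mappings given at the top of this guide. As a minimal sketch, the helper below prints those entries so they can be appended on every node; the function name `print_cluster_hosts` is hypothetical and not part of any tool.

```shell
#!/bin/sh
# Print /etc/hosts entries for the three nodes named in this guide.
# (print_cluster_hosts is an illustrative helper, not a standard command.)
print_cluster_hosts() {
  cat <<'EOF'
192.168.100.100 node01
192.168.100.110 node02
192.168.100.120 node03
EOF
}

# Usage (as root, on every node); skips the append if node01 is already listed:
#   grep -q 'node01' /etc/hosts || print_cluster_hosts >> /etc/hosts
```

Using host names in /etc/hosts lets the Hadoop configuration files refer to node01, node02, and node03 instead of raw IP addresses.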
service iptables stop
chkconfig iptables off
ssh-keygen
ssh-copy-id 192.168.100.100
ssh-copy-id 192.168.100.110
ssh-copy-id 192.168.100.120
vi /etc/hosts
export JAVA_HOME=/export/servers/jdk1.8.0_141
Upload and Extract Hadoop Package
Create directories for software and extracted files, upload the Hadoop tarball, rename it, and extract.
mkdir -p /export/softwares
mkdir -p /export/servers
cd /export/softwares/
# The suffix 自己编译后的版本 in the uploaded file name means "self-compiled version"
mv hadoop-2.6.0-cdh5.14.0-自己编译后的版本.tar.gz hadoop-2.6.0-cdh5.14.0.tar.gz
tar -zxvf hadoop-2.6.0-cdh5.14.0.tar.gz -C ../servers/
Check Native Compression Support
On the first node run:
cd /export/servers/hadoop-2.6.0-cdh5.14.0
bin/hadoop checknative
If OpenSSL is reported as false, install it:
yum -y install openssl-devel
Modify Configuration Files
All edits are performed on the first node under /export/servers/hadoop-2.6.0-cdh5.14.0/etc/hadoop.
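After editing, it can help to read a property back from a file to catch typos before the package is copied to the other nodes. The sketch below is a plain awk lookup that assumes each `<name>` is followed by its `<value>` (on the same or a later line), as in the files in this guide; `get_hadoop_prop` is a hypothetical helper name. Once the environment is set up, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop itself resolves.

```shell
#!/bin/sh
# Print the value of one property from a Hadoop *-site.xml file.
# $1 = path to the XML file, $2 = property name.
# (get_hadoop_prop is an illustrative helper; it is not a full XML parser.)
get_hadoop_prop() {
  awk -v want="$2" '
    /<name>/ {
      # Remember the most recent property name.
      name = $0
      sub(/.*<name>/, "", name); sub(/<\/name>.*/, "", name)
    }
    /<value>/ && name == want {
      # First value following the wanted name: print it and stop.
      value = $0
      sub(/.*<value>/, "", value); sub(/<\/value>.*/, "", value)
      print value
      exit
    }
  ' "$1"
}

# Usage, from /export/servers/hadoop-2.6.0-cdh5.14.0/etc/hadoop:
#   get_hadoop_prop core-site.xml fs.defaultFS
#   get_hadoop_prop hdfs-site.xml dfs.replication
```

An empty result means the property was not found, which usually points to a misspelled name or a file saved in the wrong directory.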
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://node01:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/tempDatas</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>10080</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node01:50090</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>node01:50070</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/namenodeDatas</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/datanodeDatas</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
</configuration>
hadoop-env.sh
export JAVA_HOME=/export/servers/jdk1.8.0_141
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.job.ubertask.enable</name>
<value>true</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>node01:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>node01:19888</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>node01</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
slaves
node01
node02
node03
Create Data Directories
mkdir -p /export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/tempDatas
mkdir -p /export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/namenodeDatas
mkdir -p /export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/datanodeDatas
mkdir -p /export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/dfs/nn/edits
mkdir -p /export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/dfs/snn/name
mkdir -p /export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/dfs/nn/snn/edits
Distribute the Hadoop Package
cd /export/servers/
scp -r hadoop-2.6.0-cdh5.14.0/ node02:$PWD
scp -r hadoop-2.6.0-cdh5.14.0/ node03:$PWD
Configure Environment Variables on All Nodes
vim /etc/profile.d/hadoop.sh
export HADOOP_HOME=/export/servers/hadoop-2.6.0-cdh5.14.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
source /etc/profile
Start the Cluster
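The format step in this section should run only once, because re-formatting creates a new, empty namespace and leaves existing HDFS data unreachable. A small guard can make that harder to do by accident. The sketch below relies on the fact that a formatted NameNode directory contains `current/VERSION`; the function name `namenode_is_formatted` is hypothetical.

```shell
#!/bin/sh
# Return 0 if the given NameNode metadata directory already holds a formatted
# namespace (a formatted directory contains current/VERSION), else return 1.
# (namenode_is_formatted is an illustrative helper, not a Hadoop command.)
namenode_is_formatted() {
  [ -f "$1/current/VERSION" ]
}

# Usage on node01, from /export/servers/hadoop-2.6.0-cdh5.14.0:
#   NN_DIR=/export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/namenodeDatas
#   if namenode_is_formatted "$NN_DIR"; then
#     echo "NameNode already formatted; skipping"
#   else
#     bin/hdfs namenode -format
#   fi
```

The directory in the usage comment is the `dfs.namenode.name.dir` value from hdfs-site.xml with the `file://` prefix removed.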
Format HDFS once, on node01 and only during first setup (run from /export/servers/hadoop-2.6.0-cdh5.14.0):
bin/hdfs namenode -format
Start services individually or via scripts:
1. Start nodes one by one
hadoop-daemon.sh start namenode # on node01
hadoop-daemon.sh start datanode # on every node
yarn-daemon.sh start resourcemanager # on node01
yarn-daemon.sh start nodemanager # on every node
2. One‑click start HDFS and YARN
sbin/start-dfs.sh # start HDFS
sbin/start-yarn.sh # start YARN
3. One‑click start all services
sbin/start-all.sh
Verify via Browser
HDFS UI: http://192.168.100.100:50070/dfshealth.html#tab-overview
YARN UI: http://192.168.100.100:8088/cluster
Validate Cluster Health
jps # check Java processes on each node
hadoop fs -mkdir /abc
hadoop fs -put /opt/a.txt /abc
hadoop fs -ls /abc
hadoop fs -get /abc/a.txt /tmp # download to a different directory; -get will not overwrite the existing /opt/a.txt
If the web pages load correctly and the above commands succeed, the Hadoop cluster is operational.
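Reading jps output by eye on three machines is easy to get wrong, so the check can be scripted. The sketch below assumes the daemon layout implied by the configuration in this guide (NameNode, SecondaryNameNode, and ResourceManager on node01; DataNode and NodeManager on every host in slaves) and that `hostname` returns node01, node02, or node03. Both function names are hypothetical.

```shell
#!/bin/sh
# List the daemons this guide's configuration implies for a given host:
# node01 runs the master daemons plus a worker; other nodes are workers only.
expected_daemons() {
  case "$1" in
    node01) echo "NameNode SecondaryNameNode DataNode ResourceManager NodeManager" ;;
    *)      echo "DataNode NodeManager" ;;
  esac
}

# Read `jps` output on stdin and print every expected daemon that is absent.
# Prints nothing and returns 0 when all expected daemons are present.
missing_daemons() {
  host=$1
  jps_out=$(cat)
  rc=0
  for d in $(expected_daemons "$host"); do
    # jps prints "<pid> <MainClass>", so compare the second field exactly;
    # this keeps SecondaryNameNode from being counted as NameNode.
    if ! printf '%s\n' "$jps_out" | awk -v d="$d" '$2 == d { found = 1 } END { exit !found }'; then
      echo "$d"
      rc=1
    fi
  done
  return "$rc"
}

# Usage on each node:
#   jps | missing_daemons "$(hostname)"
```

Any name printed by `missing_daemons` is a daemon to look for in the logs directory under the Hadoop installation.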
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
