
Step-by-Step Guide to Building a Hadoop 2.7.3 Cluster on Three Servers

This tutorial walks you through preparing three Linux servers, configuring password-less SSH, installing Hadoop 2.7.3, editing the core XML configuration files, distributing the installation, starting the services, and verifying HDFS and MapReduce with practical commands.

Java High-Performance Architecture

Sep 24, 2016


Goal

Set up a Hadoop 2.7.3 cluster on three servers, upload files to HDFS, and successfully run a MapReduce example program.

Setup Approach

(1) Prepare infrastructure: three servers named master, slave1, and slave2; configure password‑less SSH and install Java on each.

(2) Install and configure Hadoop on the master, editing the following configuration files:

core-site.xml – core Hadoop settings.

hdfs-site.xml – HDFS settings.

mapred-site.xml – MapReduce settings.

yarn-site.xml – YARN settings.

slaves – list of worker hostnames, one per line.

After configuration, copy the Hadoop directory from the master to slave1 and slave2.

Setup Process

Prepare infrastructure

(1) Add the hostname mappings to /etc/hosts on every server (replace the IPs with your own):

192.168.31.164 master
192.168.31.242 slave1
192.168.31.140 slave2
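Editing /etc/hosts by hand on three machines is easy to get wrong. A minimal sketch, assuming bash and root access, that appends each mapping only when the hostname is not already present; add_host_entry and add_cluster_hosts are hypothetical helpers, and the IPs are the example addresses above.

```shell
#!/usr/bin/env bash
# Sketch: append "IP hostname" lines to a hosts file, skipping any hostname
# that is already mapped. The IPs are this guide's example addresses.

add_host_entry() {
  local file="$1" ip="$2" name="$3"
  # A non-comment line with the hostname as a whitespace-separated field
  # (after the IP) counts as already mapped.
  if grep -qE "^[^#]*[[:space:]]${name}([[:space:]]|$)" "$file" 2>/dev/null; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$name" >> "$file"
}

add_cluster_hosts() {
  local file="${1:-/etc/hosts}"   # defaults to the real hosts file (needs root)
  add_host_entry "$file" 192.168.31.164 master
  add_host_entry "$file" 192.168.31.242 slave1
  add_host_entry "$file" 192.168.31.140 slave2
}
```

Run add_cluster_hosts as root on each server; passing a scratch file path instead lets you preview the result first.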

(2) Enable password‑less SSH on each server:

ssh-keygen

ssh-copy-id -i /root/.ssh/id_rsa -p 22 root@master

ssh-copy-id -i /root/.ssh/id_rsa -p 22 root@slave1

ssh-copy-id -i /root/.ssh/id_rsa -p 22 root@slave2
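The three ssh-copy-id calls can also be written as a loop that checks each login afterwards. A sketch under the same assumptions as the commands above (root user, key at /root/.ssh/id_rsa, SSH on port 22); run and distribute_ssh_key are hypothetical helpers, and DRY_RUN=1 prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch: copy root's public key to each host, then confirm key-based login.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

distribute_ssh_key() {
  local host
  for host in "$@"; do
    run ssh-copy-id -i /root/.ssh/id_rsa -p 22 "root@${host}"
    # BatchMode=yes makes ssh fail rather than prompt for a password,
    # so this check only succeeds once key-based login works.
    run ssh -o BatchMode=yes -p 22 "root@${host}" hostname
  done
}
```

For example, `distribute_ssh_key master slave1 slave2` asks for each host's password once; afterwards every BatchMode check should print the remote hostname without a prompt.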

Note: Java must be installed beforehand.

Install and configure Hadoop

(1) Download and unpack Hadoop on master:

cd /home
wget http://apache.fayea.com/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
tar -xzf hadoop-2.7.3.tar.gz
mv hadoop-2.7.3 hadoop
cd hadoop
mkdir tmp hdfs
mkdir hdfs/data hdfs/name
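If the mirror above no longer carries 2.7.3, the Apache archive keeps old releases. A small sketch that builds the archive URL for a given version; hadoop_archive_url is a hypothetical helper, and the archive is an alternative download source, not the one the original guide used.

```shell
#!/usr/bin/env bash
# Sketch: build the Apache archive download URL for a Hadoop release.
# archive.apache.org keeps releases that mirrors have since removed.

hadoop_archive_url() {
  local version="$1"
  printf 'https://archive.apache.org/dist/hadoop/common/hadoop-%s/hadoop-%s.tar.gz\n' \
    "$version" "$version"
}
```

Usage: `cd /home && wget "$(hadoop_archive_url 2.7.3)"`, then continue with the tar and mv steps as above.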

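(2) Edit the configuration files in /home/hadoop/etc/hadoop (examples shown below). Each <property> element goes inside the file's existing <configuration> element.

core-site.xml: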

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>file:/home/hadoop/tmp</value>
</property>
<property>
  <name>io.file.buffer.size</name>
  <value>131072</value>
</property>
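The snippets in this guide show only <property> elements; in the real file they sit inside a single <configuration> root. A minimal sketch, assuming bash, that writes a complete *-site.xml from name=value pairs; write_hadoop_conf is a hypothetical helper, and because it overwrites the target file you may prefer to edit the shipped file by hand.

```shell
#!/usr/bin/env bash
# Sketch: write a complete Hadoop *-site.xml from name=value arguments.
# The target file is overwritten. Values are not XML-escaped, so avoid
# '<', '>' and '&' in them.

write_hadoop_conf() {
  local file="$1"; shift
  local pair
  {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    for pair in "$@"; do
      # Split on the first '=' only, so a value may itself contain '='.
      printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' \
        "${pair%%=*}" "${pair#*=}"
    done
    echo '</configuration>'
  } > "$file"
}
```

For instance, `write_hadoop_conf /home/hadoop/etc/hadoop/core-site.xml fs.defaultFS=hdfs://master:9000 hadoop.tmp.dir=file:/home/hadoop/tmp` produces the first two properties shown above inside a well-formed file.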

hdfs-site.xml:

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/home/hadoop/hdfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/home/hadoop/hdfs/data</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
... (additional properties omitted for brevity)
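Before copying the installation to the workers, it helps to double-check the values you set. A sketch that prints the <value> following a given <name>, assuming the one-tag-per-line layout used in this guide; get_hadoop_prop is a hypothetical helper. Once Hadoop is on the PATH, `hdfs getconf -confKey dfs.replication` reports the value Hadoop itself resolves.

```shell
#!/usr/bin/env bash
# Sketch: print the value of one property from a Hadoop XML config file.
# Assumes <name> and <value> each sit on their own line.

get_hadoop_prop() {
  local file="$1" key="$2"
  awk -v key="$key" '
    # Note when the <name> line for the requested key has been seen...
    index($0, "<name>" key "</name>") { found = 1; next }
    # ...then print the first <value> after it, stripped of its tags.
    found && /<value>/ {
      sub(/.*<value>/, ""); sub(/<\/value>.*/, ""); print; exit
    }
  ' "$file"
}
```

For example, `get_hadoop_prop /home/hadoop/etc/hadoop/hdfs-site.xml dfs.replication` should print 2 with the settings above; an empty result means the key is not in that file.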

mapred-site.xml (Hadoop 2.7.3 ships only mapred-site.xml.template, so create the file from it first):

cd /home/hadoop/etc/hadoop
cp mapred-site.xml.template mapred-site.xml

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
... (additional properties)

yarn-site.xml:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
... (additional properties)

Replace the contents of the slaves file (/home/hadoop/etc/hadoop/slaves, which contains localhost by default) with:

slave1
slave2

In /home/hadoop/etc/hadoop/hadoop-env.sh, set JAVA_HOME to the absolute path of your Java installation.
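A sketch for setting JAVA_HOME in hadoop-env.sh, assuming GNU sed and a JDK laid out as <jdk>/bin/java or <jdk>/jre/bin/java; java_home_from_bin and set_hadoop_java_home are hypothetical helpers. A fixed path matters because the daemons are launched over non-interactive SSH, which may not pick up the JAVA_HOME from your interactive shell.

```shell
#!/usr/bin/env bash
# Sketch: derive a JDK directory from a java binary path and write it
# into hadoop-env.sh. Requires GNU sed for in-place editing.

java_home_from_bin() {
  local p="$1"
  p="${p%/bin/java}"   # <jdk>/bin/java      -> <jdk>
  p="${p%/jre}"        # <jdk>/jre/bin/java  -> <jdk>
  echo "$p"
}

set_hadoop_java_home() {
  local env_file="$1" java_home="$2"
  # Replace the existing "export JAVA_HOME=..." line in place.
  sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=${java_home}|" "$env_file"
}
```

Usage on a node where java is on the PATH: `set_hadoop_java_home /home/hadoop/etc/hadoop/hadoop-env.sh "$(java_home_from_bin "$(readlink -f "$(command -v java)")")"`.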

Distribute Hadoop

Copy the configured Hadoop directory from master to the workers:

scp -r /home/hadoop slave1:/home
scp -r /home/hadoop slave2:/home

If Java is installed at a different path on a worker, edit JAVA_HOME in that worker's copy of hadoop-env.sh.
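The scp commands can also be driven from the slaves file, so the worker list is maintained in one place. A sketch assuming bash and the default slaves location; read_workers and distribute_hadoop are hypothetical helpers, blank lines and # comments are skipped, and DRY_RUN=1 only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: scp the Hadoop directory to every worker named in a slaves file.
# Blank lines and '#' comments are ignored. Set DRY_RUN=1 to print only.

read_workers() {
  # Strip comments and surrounding whitespace, then drop empty lines.
  sed -e 's/#.*//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" | grep -v '^$'
}

distribute_hadoop() {
  local slaves_file="$1" src="${2:-/home/hadoop}" host dest
  dest=$(dirname "$src")   # /home/hadoop is copied into /home on each worker
  for host in $(read_workers "$slaves_file"); do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "scp -r $src $host:$dest"
    else
      scp -r "$src" "$host:$dest"
    fi
  done
}
```

Usage: `distribute_hadoop /home/hadoop/etc/hadoop/slaves`. Adding a worker later then only requires a new line in the slaves file.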

Configure environment variables

On each node, add this line to ~/.bashrc:

export PATH=$PATH:/home/hadoop/bin:/home/hadoop/sbin

Then reload it:

source ~/.bashrc

Start Hadoop

On the master:

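hdfs namenode -format

start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver

Format the NameNode only once, before the very first start: formatting again creates a new namespace, and DataNodes that already hold data will then fail to register because their cluster ID no longer matches. start-dfs.sh starts the NameNode and SecondaryNameNode on master and a DataNode on every host listed in slaves, so a separate hadoop-daemon.sh start namenode is not needed; run hadoop-daemon.sh start datanode on master only if master should also store HDFS blocks.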

Run jps on all three servers to confirm that the expected daemons are running.
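A sketch that turns the jps check into a list of missing daemons; missing_daemons is a hypothetical helper, and the expected names are an assumption based on this guide's layout (master: NameNode, SecondaryNameNode, ResourceManager, JobHistoryServer; each slave: DataNode, NodeManager).

```shell
#!/usr/bin/env bash
# Sketch: read jps output on stdin ("<pid> <ClassName>" per line) and print
# each expected daemon name (given as arguments) that is not running.

missing_daemons() {
  local running name
  running=$(awk '{print $2}')
  for name in "$@"; do
    grep -qxF "$name" <<< "$running" || echo "$name"
  done
}
```

On master, `jps | missing_daemons NameNode SecondaryNameNode ResourceManager JobHistoryServer` prints nothing when everything is up; on a slave, use `jps | missing_daemons DataNode NodeManager`.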

Access the web UIs:

http://master_ip:50070/ (HDFS NameNode)

http://master_ip:8088/ (YARN ResourceManager)

Testing and Verification

(1) HDFS operations

hdfs dfs -mkdir -p /user/hadoop/input

hdfs dfs -put /home/hadoop/etc/hadoop/kms*.xml /user/hadoop/input

Check the uploaded files in the file browser at http://master_ip:50070/.

(2) MapReduce test

hadoop jar /home/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep /user/hadoop/input /user/hadoop/output 'dfs[a-z.]+'
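Running the example a second time fails because MapReduce will not write into an existing output directory. A sketch that removes the old output, reruns the job and prints the result; run and run_grep_example are hypothetical helpers, the paths are the ones used above, and DRY_RUN=1 only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: rerun the grep example safely and show its output.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

run_grep_example() {
  local jar=/home/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar
  local in=/user/hadoop/input out=/user/hadoop/output
  # -f: do not fail if the output directory does not exist yet.
  run hdfs dfs -rm -f -r "$out"
  run hadoop jar "$jar" grep "$in" "$out" 'dfs[a-z.]+' || return 1
  # The quoted glob is left for HDFS to expand, not the local shell.
  run hdfs dfs -cat "$out/*"
}
```

Each line of the printed result is a count followed by a matching string found in the uploaded kms*.xml files.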

Note: If a job stays in the running state for a long time without making progress, check the daemon logs (by default under /home/hadoop/logs on each node) for errors.

The environment is now set up; you should be able to upload files to HDFS and run MapReduce jobs.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
