
Step-by-Step Guide to Build a Hadoop 2.9.2 Cluster on CentOS 7.5

This tutorial walks you through setting up a three‑node Hadoop 2.9.2 cluster on CentOS 7.5, covering environment preparation, password‑less SSH, user creation, JDK installation, Hadoop extraction, configuration file edits, directory setup, ownership changes, service startup, and verification via web UIs.

Open Source Linux

Experiment Environment

Three machines are used:

qll251 – 192.168.1.251 – NameNode

qll252 – 192.168.1.252 – DataNode1

qll253 – 192.168.1.253 – DataNode2

Required packages:

hadoop-2.9.2.tar.gz

jdk-8u241-linux-x64.tar.gz

Download links:

Hadoop: https://hadoop.apache.org/releases.html

JDK: https://www.oracle.com/java/technologies/javase-jdk8-downloads.html

Step 1 – Configure password‑less SSH on qll251

[root@qll251 ~]# ssh-keygen    # press Enter at every prompt to accept the defaults
[root@qll251 ~]# ssh-copy-id root@192.168.1.251
[root@qll251 ~]# ssh-copy-id root@192.168.1.252
[root@qll251 ~]# ssh-copy-id root@192.168.1.253
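
Before moving on, it can help to confirm that key-based login works for every node. The following is a minimal sketch, assuming the root account and the three IP addresses from the Experiment Environment section; the function name check_passwordless_ssh is hypothetical. BatchMode=yes makes ssh fail instead of prompting, so a missing key shows up as FAIL rather than as a password prompt.

```shell
# Nodes from the Experiment Environment section.
NODES="192.168.1.251 192.168.1.252 192.168.1.253"

# Report, per node, whether root can log in without a password.
# BatchMode=yes disables password prompts, so a missing key is a failure.
check_passwordless_ssh() {
  failed=0
  for node in $NODES; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "root@${node}" true; then
      echo "OK   ${node}"
    else
      echo "FAIL ${node}"
      failed=1
    fi
  done
  return "$failed"
}
```

Run check_passwordless_ssh on qll251; a non-zero exit status means at least one node still asks for a password.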

Step 2 – Update /etc/hosts on all nodes

On qll251, add an entry for each of the three nodes to /etc/hosts, then copy the file to the other two nodes:

# scp /etc/hosts root@192.168.1.252:/etc
# scp /etc/hosts root@192.168.1.253:/etc

Note: Do not map the hostnames to 127.0.0.1, otherwise DataNodes cannot reach the NameNode.
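
The /etc/hosts entries for this cluster follow directly from the hostnames and IPs listed under Experiment Environment. A minimal sketch, assuming plain "IP hostname" lines with no aliases; the helper name add_cluster_hosts is hypothetical, and it skips any hostname that is already present so it can be re-run safely.

```shell
# Append the three cluster entries to a hosts file (default: /etc/hosts).
# Hostnames already present in the file are skipped, so re-running is harmless.
add_cluster_hosts() {
  hosts_file="${1:-/etc/hosts}"
  while read -r ip name; do
    if ! grep -qw "$name" "$hosts_file"; then
      echo "$ip $name" >> "$hosts_file"
    fi
  done <<'EOF'
192.168.1.251 qll251
192.168.1.252 qll252
192.168.1.253 qll253
EOF
}
```

Calling add_cluster_hosts with no argument edits /etc/hosts and therefore needs root; passing another path allows a dry run against a copy.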

Step 3 – Create a common Hadoop user

useradd -u 8000 hadoop
echo 123123 | passwd --stdin hadoop
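
Because the account needs the same UID on every node, its creation can be driven from qll251 over the root SSH trust set up in Step 1. This is a sketch under that assumption; create_hadoop_user is a hypothetical helper, and the id -u guard only avoids an error when the user already exists.

```shell
NODES="qll251 qll252 qll253"

# Create the hadoop user (UID 8000) on each node unless it already exists.
# Setting the password is left to the passwd command shown above.
create_hadoop_user() {
  for node in $NODES; do
    ssh "root@${node}" 'id -u hadoop >/dev/null 2>&1 || useradd -u 8000 hadoop'
  done
}
```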

Step 4 – Install JDK on all nodes

Upload jdk-8u241-linux-x64.tar.gz to /home on each node, then run the following from /home:

# tar -zxvf jdk-8u241-linux-x64.tar.gz -C /usr/local
# echo "export JAVA_HOME=/usr/local/jdk1.8.0_241" >> /etc/profile
# echo "export JAVA_BIN=/usr/local/jdk1.8.0_241/bin" >> /etc/profile
# echo "export PATH=\${JAVA_HOME}/bin:\$PATH" >> /etc/profile
# echo "export CLASSPATH=.:\${JAVA_HOME}/lib/dt.jar:\${JAVA_HOME}/lib/tools.jar" >> /etc/profile
# source /etc/profile   # then verify with: java -version
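
Appending to /etc/profile with echo works, but running the commands a second time duplicates the lines. As an alternative sketch (not the article's method), the same variables can live in a dedicated file under /etc/profile.d, which CentOS sources for login shells. The file name java.sh and the helper write_java_profile are assumptions.

```shell
# Write the JDK environment to a profile snippet (default: /etc/profile.d/java.sh).
# The quoted heredoc delimiter keeps ${JAVA_HOME} and $PATH literal in the file.
write_java_profile() {
  target="${1:-/etc/profile.d/java.sh}"
  cat > "$target" <<'EOF'
export JAVA_HOME=/usr/local/jdk1.8.0_241
export PATH=${JAVA_HOME}/bin:$PATH
export CLASSPATH=.:${JAVA_HOME}/lib/dt.jar:${JAVA_HOME}/lib/tools.jar
EOF
}
```

Since the file is overwritten rather than appended to, calling the function repeatedly leaves exactly one copy of each export.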

Step 5 – Deploy Hadoop on the master node

Extract Hadoop and create working directories:

# tar -zxf hadoop-2.9.2.tar.gz -C /home/hadoop/
# mkdir -p /home/hadoop/tmp /home/hadoop/dfs/{name,data}
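
Later steps call hdfs by name, which only works when Hadoop's bin directory is on the PATH. The article does not show that step; one way to do it is sketched below, assuming the hadoop user's ~/.bashrc is the right place. The HADOOP_HOME variable name and the helper add_hadoop_to_path are assumptions.

```shell
# Append HADOOP_HOME and PATH exports to a shell startup file
# (default: /home/hadoop/.bashrc), unless HADOOP_HOME is already set there.
add_hadoop_to_path() {
  rc_file="${1:-/home/hadoop/.bashrc}"
  if ! grep -q 'HADOOP_HOME=' "$rc_file" 2>/dev/null; then
    cat >> "$rc_file" <<'EOF'
export HADOOP_HOME=/home/hadoop/hadoop-2.9.2
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
EOF
  fi
}
```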

Step 6 – Configure Hadoop files (located in /home/hadoop/hadoop-2.9.2/etc/hadoop)

Seven configuration files must be edited. The following example properties go in core-site.xml:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://qll251:9000</value>
</property>

<property>
  <name>io.file.buffer.size</name>
  <value>131072</value>
</property>

<property>
  <name>hadoop.tmp.dir</name>
  <value>file:/home/hadoop/tmp</value>
  <description>Base for other temporary directories.</description>
</property>
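
In the actual file, these properties sit inside the single <configuration> root element. A sketch of the complete core-site.xml, written with a heredoc; write_core_site is a hypothetical helper and the default path assumes the installation directory from Step 5.

```shell
# Write core-site.xml into a Hadoop config directory.
# The values are the ones shown in the snippets above.
write_core_site() {
  conf_dir="${1:-/home/hadoop/hadoop-2.9.2/etc/hadoop}"
  cat > "${conf_dir}/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://qll251:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/tmp</value>
  </property>
</configuration>
EOF
}
```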

Key files to modify:

hadoop-env.sh – set JAVA_HOME

yarn-env.sh – set JAVA_HOME for YARN

slaves – list DataNode hostnames

core-site.xml – define fs.defaultFS and hadoop.tmp.dir

hdfs-site.xml – configure namenode and datanode directories, replication factor

mapred-site.xml – copy it from mapred-site.xml.template, then set mapreduce.framework.name to yarn and the job history addresses

yarn-site.xml – configure resource manager addresses and aux services
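
The list above says what each file controls but not the values. The sketch below fills them in for this three-node layout. The hostnames, the directories from Step 5, and port 19888 come from the article; everything else is an assumption: a replication factor of 2, the default job history RPC port 10020, and yarn.resourcemanager.hostname as a shortcut for the individual ResourceManager addresses. The helper name write_cluster_conf is hypothetical.

```shell
# Write slaves, hdfs-site.xml, mapred-site.xml and yarn-site.xml into a Hadoop
# config directory. Values flagged as assumed do not come from the article.
write_cluster_conf() {
  conf_dir="${1:-/home/hadoop/hadoop-2.9.2/etc/hadoop}"

  # Worker hostnames, one per line.
  printf '%s\n' qll252 qll253 > "${conf_dir}/slaves"

  # Directories from Step 5; replication factor 2 is assumed.
  cat > "${conf_dir}/hdfs-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
EOF

  # Job history RPC port 10020 is assumed (the Hadoop default).
  cat > "${conf_dir}/mapred-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>qll251:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>qll251:19888</value>
  </property>
</configuration>
EOF

  # Assumed shortcut: the ResourceManager addresses default to this host.
  cat > "${conf_dir}/yarn-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>qll251</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
}
```

hadoop-env.sh and yarn-env.sh are left out of the sketch; in both, JAVA_HOME would point at /usr/local/jdk1.8.0_241 from Step 4.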

Step 7 – Adjust ownership

# chown -R hadoop:hadoop /home/hadoop

Step 8 – Enable password‑less SSH for the hadoop user

# su - hadoop
$ ssh-keygen
$ ssh-copy-id hadoop@qll251
$ ssh-copy-id hadoop@qll252
$ ssh-copy-id hadoop@qll253

Step 9 – Copy Hadoop installation to DataNodes

# su - hadoop
$ scp -r /home/hadoop/hadoop-2.9.2/ hadoop@qll252:~/
$ scp -r /home/hadoop/hadoop-2.9.2/ hadoop@qll253:~/

Step 10 – Start Hadoop services on qll251

Format the NameNode (run once):

# hdfs namenode -format

Start HDFS and YARN:

# /home/hadoop/hadoop-2.9.2/sbin/start-dfs.sh
# /home/hadoop/hadoop-2.9.2/sbin/start-yarn.sh
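
Re-formatting an existing NameNode discards the filesystem metadata, which is why the format command is run only once. A guard along the following lines can enforce that. It is a sketch, assuming the name directory is /home/hadoop/dfs/name from Step 5 and that hdfs is on the PATH; format_namenode_once is a hypothetical helper. It relies on the current/VERSION file that a formatted name directory contains.

```shell
# Format the NameNode only if its metadata directory has never been formatted.
# A formatted name directory contains current/VERSION.
format_namenode_once() {
  name_dir="${1:-/home/hadoop/dfs/name}"
  if [ -f "${name_dir}/current/VERSION" ]; then
    echo "NameNode already formatted, skipping"
  else
    hdfs namenode -format
  fi
}
```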

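Alternatively, use the combined scripts, which Hadoop 2.x marks as deprecated in favor of the separate scripts above:

# /home/hadoop/hadoop-2.9.2/sbin/start-all.sh    # start HDFS and YARN together
# /home/hadoop/hadoop-2.9.2/sbin/stop-all.sh     # stop HDFS and YARN together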

Start the History Server:

# /home/hadoop/hadoop-2.9.2/sbin/mr-jobhistory-daemon.sh start historyserver

Step 11 – Verify the cluster

Check HDFS status with hdfs dfsadmin -report

Web UI for HDFS: http://192.168.1.251:50070

ResourceManager UI: http://192.168.1.251:8088

History Server UI: http://192.168.1.251:19888
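
The same checks can be scripted. The sketch below has two hypothetical helpers: missing_daemons compares jps output against the daemons expected on a node, and check_web_uis prints the HTTP status of the three web UIs listed above. It assumes curl is installed and that the SecondaryNameNode runs in its default location, on the node where start-dfs.sh was invoked.

```shell
# Print each expected daemon that is missing from `jps` output read on stdin.
# jps prints one "<pid> <ClassName>" line per running JVM.
missing_daemons() {
  jps_output="$(cat)"
  for daemon in "$@"; do
    if ! printf '%s\n' "$jps_output" | awk '{print $2}' | grep -qx "$daemon"; then
      echo "$daemon"
    fi
  done
}

# Print the HTTP status code returned by each web UI (000 means no connection).
check_web_uis() {
  for url in http://192.168.1.251:50070 http://192.168.1.251:8088 http://192.168.1.251:19888; do
    code="$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url")" || true
    echo "${code} ${url}"
  done
}
```

On qll251, run jps | missing_daemons NameNode SecondaryNameNode ResourceManager JobHistoryServer; on qll252 and qll253, run jps | missing_daemons DataNode NodeManager. Empty output means every expected daemon is running.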

Conclusion

The article demonstrates a complete end‑to‑end installation of Apache Hadoop 2.x, illustrating the architecture, module interactions, and essential configuration details; the goal is to learn Hadoop through hands‑on deployment rather than merely installing it.
