Step-by-Step Guide to Build a Hadoop 2.9.2 Cluster on CentOS 7.5

This tutorial walks you through setting up a three‑node Hadoop 2.9.2 cluster on CentOS 7.5, covering environment preparation, password‑less SSH, user creation, JDK installation, Hadoop extraction, configuration file edits, directory setup, ownership changes, service startup, and verification via web UIs.

Open Source Linux

Mar 12, 2020

Experiment Environment

Three machines are used:

qll251 – 192.168.1.251 – NameNode

qll252 – 192.168.1.252 – DataNode1

qll253 – 192.168.1.253 – DataNode2

Required packages:

hadoop-2.9.2.tar.gz

jdk-8u241-linux-x64.tar.gz

Download links:

Hadoop: https://hadoop.apache.org/releases.html

JDK: https://www.oracle.com/java/technologies/javase-jdk8-downloads.html

Step 1 – Configure password‑less SSH on qll251

[root@qll251 ~]# ssh-keygen    # press Enter for all prompts
[root@qll251 ~]# ssh-copy-id root@192.168.1.251
[root@qll251 ~]# ssh-copy-id root@192.168.1.252
[root@qll251 ~]# ssh-copy-id root@192.168.1.253
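
Before moving on, it can help to confirm that key-based login really works without a prompt. The following is a minimal sketch, not part of the original steps: check_passwordless and the SSH_BIN override are hypothetical names introduced here, and the IP addresses are the ones listed in the Experiment Environment section.

```shell
#!/bin/sh
# Sketch: confirm that key-based login works without a password prompt.
# SSH_BIN is a hypothetical override hook so the function can be dry-run.
SSH_BIN="${SSH_BIN:-ssh}"

# check_passwordless USER HOST...
# Prints one ok/FAIL line per host; returns 1 if any host failed.
check_passwordless() {
    user="$1"; shift
    failed=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of asking for a password.
        if "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=5 "$user@$host" true; then
            echo "ok   $user@$host"
        else
            echo "FAIL $user@$host"
            failed=1
        fi
    done
    return "$failed"
}

# Usage on qll251 (assumed addresses from the environment list):
#   check_passwordless root 192.168.1.251 192.168.1.252 192.168.1.253
```

The same function can be reused in Step 8 with the hadoop user and the hostnames instead of the IP addresses.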

Step 2 – Update /etc/hosts on all nodes

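On qll251, add one line per node to /etc/hosts, mapping each IP address from the Experiment Environment section to its hostname, then copy the file to the other two nodes:

# scp /etc/hosts root@192.168.1.252:/etc
# scp /etc/hosts root@192.168.1.253:/etc

Note: Do not map the cluster hostnames to 127.0.0.1. If qll251 resolves to the loopback address, the NameNode listens only there and the DataNodes cannot reach it.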
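
The hosts entries themselves follow directly from the machine list in the Experiment Environment section. A minimal sketch of adding them idempotently is shown here; add_cluster_hosts is a hypothetical helper name, not a system tool.

```shell
#!/bin/sh
# Sketch: append the cluster's name mappings to a hosts file, skipping any
# hostname that is already present, so the function is safe to re-run.
add_cluster_hosts() {
    hosts_file="$1"
    while read -r ip name; do
        # -w matches the hostname as a whole word only.
        if ! grep -qw "$name" "$hosts_file"; then
            printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
        fi
    done <<'EOF'
192.168.1.251 qll251
192.168.1.252 qll252
192.168.1.253 qll253
EOF
}

# Usage (as root on qll251, before copying the file to the other nodes):
#   add_cluster_hosts /etc/hosts
```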

Step 3 – Create a common Hadoop user

Run on every node as root, so that the hadoop user has the same UID everywhere:

# useradd -u 8000 hadoop
# echo 123123 | passwd --stdin hadoop

Step 4 – Install JDK on all nodes

Upload jdk‑8u241‑linux‑x64.tar.gz to /home, then:

# tar -zxvf jdk-8u241-linux-x64.tar.gz -C /usr/local
# echo "export JAVA_HOME=/usr/local/jdk1.8.0_241" >> /etc/profile
# echo "export JAVA_BIN=/usr/local/jdk1.8.0_241/bin" >> /etc/profile
# echo "export PATH=\${JAVA_HOME}/bin:\$PATH" >> /etc/profile
# echo "export CLASSPATH=.:\${JAVA_HOME}/lib/dt.jar:\${JAVA_HOME}/lib/tools.jar" >> /etc/profile
# source /etc/profile
# java -version    # should report version 1.8.0_241
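
Appending to /etc/profile with echo adds duplicate lines if the step is repeated. As an alternative sketch, the same variables can live in one dedicated file under /etc/profile.d, which CentOS 7 sources at login. write_java_profile is a hypothetical helper, and the file name java.sh is an assumption.

```shell
#!/bin/sh
# Sketch: write the JDK variables to a single file instead of appending
# to /etc/profile. Re-running simply overwrites the file.
write_java_profile() {
    target="$1"     # e.g. /etc/profile.d/java.sh (assumed file name)
    jdk_dir="$2"    # e.g. /usr/local/jdk1.8.0_241
    # The unquoted heredoc expands ${jdk_dir} now; \$ defers the other
    # variables until the file is sourced.
    cat > "$target" <<EOF
export JAVA_HOME=${jdk_dir}
export PATH=\${JAVA_HOME}/bin:\$PATH
export CLASSPATH=.:\${JAVA_HOME}/lib/dt.jar:\${JAVA_HOME}/lib/tools.jar
EOF
}

# Usage (as root on every node):
#   write_java_profile /etc/profile.d/java.sh /usr/local/jdk1.8.0_241
```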

Step 5 – Deploy Hadoop on the master node

Extract Hadoop and create working directories:

# tar -zxf hadoop-2.9.2.tar.gz -C /home/hadoop/
# mkdir -p /home/hadoop/tmp /home/hadoop/dfs/{name,data}

Step 6 – Configure Hadoop files (located in /home/hadoop/hadoop-2.9.2/etc/hadoop)

Seven configuration files must be edited. The snippets below are the core-site.xml properties; they belong inside that file's <configuration> element:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://qll251:9000</value>
</property>

<property>
  <name>io.file.buffer.size</name>
  <value>131072</value>
</property>

<property>
  <name>hadoop.tmp.dir</name>
  <value>file:/home/hadoop/tmp</value>
  <description>Base for other temporary directories.</description>
</property>
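
For reference, the three properties above can be assembled into a complete core-site.xml. This is a sketch that writes the file from the shell; write_core_site is a hypothetical helper, and the values are copied unchanged from the snippets.

```shell
#!/bin/sh
# Sketch: write a complete core-site.xml containing the three properties
# shown in the article, wrapped in the required <configuration> element.
write_core_site() {
    conf_dir="$1"
    cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://qll251:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/tmp</value>
    <description>Base for other temporary directories.</description>
  </property>
</configuration>
EOF
}

# Usage (on qll251):
#   write_core_site /home/hadoop/hadoop-2.9.2/etc/hadoop
```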

Key files to modify:

hadoop-env.sh – set JAVA_HOME

yarn-env.sh – set JAVA_HOME for YARN

slaves – list DataNode hostnames

core-site.xml – define fs.defaultFS and hadoop.tmp.dir

hdfs-site.xml – configure namenode and datanode directories, replication factor

mapred-site.xml – set mapreduce.framework.name to yarn and job history addresses

yarn-site.xml – configure resource manager addresses and aux services
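
The article does not show the contents of the remaining files, so the following is only a sketch under stated assumptions. The property names are standard Hadoop 2.x keys and the directories match those created in Step 5. A replication factor of 2 (one copy per DataNode), job history port 10020, and the use of yarn.resourcemanager.hostname in place of individual ResourceManager addresses are choices made here, not taken from the source. prop, site_file, and write_cluster_conf are hypothetical helpers.

```shell
#!/bin/sh
# prop NAME VALUE: print one <property> element.
prop() {
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# site_file PATH: wrap the property elements read from stdin in
# <configuration> and write the result to PATH.
site_file() {
    { echo '<?xml version="1.0" encoding="UTF-8"?>'
      echo '<configuration>'
      cat
      echo '</configuration>'; } > "$1"
}

# write_cluster_conf CONF_DIR JAVA_HOME
write_cluster_conf() {
    conf_dir="$1"
    java_home="$2"

    # hadoop-env.sh and yarn-env.sh: a line appended at the end overrides
    # earlier JAVA_HOME settings (note: appends again on every run).
    for f in hadoop-env.sh yarn-env.sh; do
        echo "export JAVA_HOME=$java_home" >> "$conf_dir/$f"
    done

    # slaves: one DataNode hostname per line.
    printf '%s\n' qll252 qll253 > "$conf_dir/slaves"

    { prop dfs.namenode.name.dir file:/home/hadoop/dfs/name
      prop dfs.datanode.data.dir file:/home/hadoop/dfs/data
      prop dfs.replication 2            # assumption: one replica per DataNode
    } | site_file "$conf_dir/hdfs-site.xml"

    # Hadoop 2.x ships only mapred-site.xml.template; this creates the file.
    { prop mapreduce.framework.name yarn
      prop mapreduce.jobhistory.address qll251:10020
      prop mapreduce.jobhistory.webapp.address qll251:19888
    } | site_file "$conf_dir/mapred-site.xml"

    { prop yarn.resourcemanager.hostname qll251
      prop yarn.nodemanager.aux-services mapreduce_shuffle
    } | site_file "$conf_dir/yarn-site.xml"
}

# Usage (on qll251):
#   write_cluster_conf /home/hadoop/hadoop-2.9.2/etc/hadoop /usr/local/jdk1.8.0_241
```

Setting yarn.resourcemanager.hostname lets the individual ResourceManager addresses fall back to their default ports on that host, which keeps the file short; explicit per-address properties work as well.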

Step 7 – Adjust ownership

# chown -R hadoop:hadoop /home/hadoop

Step 8 – Enable password‑less SSH for the hadoop user

# su - hadoop
$ ssh-keygen
$ ssh-copy-id hadoop@qll251
$ ssh-copy-id hadoop@qll252
$ ssh-copy-id hadoop@qll253

Step 9 – Copy Hadoop installation to DataNodes

# su - hadoop
$ scp -r /home/hadoop/hadoop-2.9.2/ hadoop@qll252:~/
$ scp -r /home/hadoop/hadoop-2.9.2/ hadoop@qll253:~/
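
With more than a couple of workers, the copy can be driven from the slaves file so the host list is kept in one place. This is a sketch; worker_hosts, copy_to_workers, and the DRY_RUN switch are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# worker_hosts FILE: print the hostnames in a slaves file, ignoring
# blank lines and # comments.
worker_hosts() {
    sed -e 's/#.*//' -e 's/[[:space:]]//g' -e '/^$/d' "$1"
}

# copy_to_workers SRC_DIR SLAVES_FILE: copy SRC_DIR into the hadoop user's
# home directory on every worker. With DRY_RUN=1, only print the commands.
copy_to_workers() {
    src="$1"
    for host in $(worker_hosts "$2"); do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "scp -r $src hadoop@$host:~/"
        else
            scp -r "$src" "hadoop@$host:~/"
        fi
    done
}

# Usage (as the hadoop user on qll251):
#   copy_to_workers /home/hadoop/hadoop-2.9.2 /home/hadoop/hadoop-2.9.2/etc/hadoop/slaves
```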

Step 10 – Start Hadoop services on qll251

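Run the remaining commands as the hadoop user. Format the NameNode once, before the first start; reformatting later assigns a new cluster ID, and DataNodes that still hold the old one will refuse to register:

$ /home/hadoop/hadoop-2.9.2/bin/hdfs namenode -format

Start HDFS and YARN:

$ /home/hadoop/hadoop-2.9.2/sbin/start-dfs.sh
$ /home/hadoop/hadoop-2.9.2/sbin/start-yarn.sh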

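Alternatively, start-all.sh starts HDFS and YARN in one step and stop-all.sh shuts them down. Hadoop 2.x marks both scripts as deprecated in favor of the separate dfs and yarn scripts:

$ /home/hadoop/hadoop-2.9.2/sbin/start-all.sh
$ /home/hadoop/hadoop-2.9.2/sbin/stop-all.sh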

Start the History Server:

$ /home/hadoop/hadoop-2.9.2/sbin/mr-jobhistory-daemon.sh start historyserver
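
Once everything is started, the jps tool from the JDK lists the running Java processes on a node. The sketch below compares that list with the daemons one would expect; missing_daemons is a hypothetical helper, and the expected daemon names assume the default layout, in which the SecondaryNameNode runs on the master.

```shell
#!/bin/sh
# missing_daemons NAME...: read `jps` output ("PID ClassName" per line) on
# stdin and print every expected daemon that is not running.
missing_daemons() {
    running=$(awk '{print $2}')
    for d in "$@"; do
        # -x matches the whole line, so NameNode != SecondaryNameNode.
        printf '%s\n' "$running" | grep -qx "$d" || echo "$d"
    done
}

# Usage on the master (qll251), as the hadoop user:
#   jps | missing_daemons NameNode SecondaryNameNode ResourceManager JobHistoryServer
# Usage on each worker (qll252, qll253):
#   jps | missing_daemons DataNode NodeManager
```

Empty output means every expected daemon was found; any name printed points at the log file to inspect under the Hadoop logs directory.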

Step 11 – Verify the cluster

Check HDFS status with hdfs dfsadmin -report

Web UI for HDFS: http://192.168.1.251:50070

ResourceManager UI: http://192.168.1.251:8088

History Server UI: http://192.168.1.251:19888
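
The report can also be checked from a script. This sketch extracts the live DataNode count, assuming the report contains a line of the form "Live datanodes (N):", as Hadoop 2.x prints it; live_datanodes is a hypothetical helper.

```shell
#!/bin/sh
# live_datanodes: read `hdfs dfsadmin -report` output on stdin and print
# the number from the "Live datanodes (N):" line. Prints nothing if the
# line is absent (assumed report format; verify against your own output).
live_datanodes() {
    sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p'
}

# Usage (as the hadoop user on qll251; expect 2 for this cluster):
#   /home/hadoop/hadoop-2.9.2/bin/hdfs dfsadmin -report | live_datanodes
```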

Conclusion

The article demonstrates a complete end‑to‑end installation of Apache Hadoop 2.x, illustrating the architecture, module interactions, and essential configuration details; the goal is to learn Hadoop through hands‑on deployment rather than merely installing it.
