Step-by-Step Guide to Build a Hadoop 2.9.2 Cluster on CentOS 7.5
This tutorial walks you through setting up a three‑node Hadoop 2.9.2 cluster on CentOS 7.5, covering environment preparation, password‑less SSH, user creation, JDK installation, Hadoop extraction, configuration file edits, directory setup, ownership changes, service startup, and verification via web UIs.
Experiment Environment
Three machines are used:
qll251 – 192.168.1.251 – NameNode
qll252 – 192.168.1.252 – DataNode1
qll253 – 192.168.1.253 – DataNode2
Required packages:
hadoop-2.9.2.tar.gz
jdk-8u241-linux-x64.tar.gz
Download links:
Hadoop: https://hadoop.apache.org/releases.html
JDK: https://www.oracle.com/java/technologies/javase-jdk8-downloads.html
Step 1 – Configure password‑less SSH on qll251
[root@qll251 ~]# ssh-keygen    # press Enter at every prompt
[root@qll251 ~]# ssh-copy-id root@192.168.1.251
[root@qll251 ~]# ssh-copy-id root@192.168.1.252
[root@qll251 ~]# ssh-copy-id root@192.168.1.253
Step 2 – Update /etc/hosts on all nodes
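On qll251, add one line per node to /etc/hosts mapping its IP address to its hostname (for example, 192.168.1.251 qll251), then copy the file to the other two nodes:
# scp /etc/hosts root@192.168.1.252:/etc
# scp /etc/hosts root@192.168.1.253:/etc
Note: Do not map the hostnames to 127.0.0.1; otherwise the NameNode listens only on the loopback address and the DataNodes cannot reach it.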
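The hosts entries can also be added with a short script instead of by hand. The sketch below assumes the three addresses from the environment table and appends each mapping only if that exact line is missing; `add_cluster_hosts` is a hypothetical helper name, not a CentOS or Hadoop tool.

```shell
#!/usr/bin/env bash
# Sketch: add the cluster's hostname mappings to a hosts file.
# Assumption: the IP/hostname pairs match the environment table;
# add_cluster_hosts is a hypothetical helper, not a CentOS or Hadoop tool.

add_cluster_hosts() {
  local hosts_file=${1:-/etc/hosts}
  local entry
  for entry in "192.168.1.251 qll251" \
               "192.168.1.252 qll252" \
               "192.168.1.253 qll253"; do
    # Append only if the exact line is not already there, so re-runs are safe.
    if ! grep -qsxF "$entry" "$hosts_file"; then
      printf '%s\n' "$entry" >> "$hosts_file"
    fi
  done
}

# Usage on qll251 as root, before copying the file with scp:
#   add_cluster_hosts /etc/hosts
```

Because it matches whole lines only, it will not notice a hostname that is already mapped to 127.0.0.1 or to a different address, so review the file before copying it.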
Step 3 – Create a common Hadoop user
Run on all three nodes:
# useradd -u 8000 hadoop
# echo 123123 | passwd --stdin hadoop
Step 4 – Install JDK on all nodes
Upload jdk-8u241-linux-x64.tar.gz to /home, then:
# tar -zxvf /home/jdk-8u241-linux-x64.tar.gz -C /usr/local
# echo "export JAVA_HOME=/usr/local/jdk1.8.0_241" >> /etc/profile
# echo "export JAVA_BIN=/usr/local/jdk1.8.0_241/bin" >> /etc/profile
# echo "export PATH=\${JAVA_HOME}/bin:\$PATH" >> /etc/profile
# echo "export CLASSPATH=.:\${JAVA_HOME}/lib/dt.jar:\${JAVA_HOME}/lib/tools.jar" >> /etc/profile
# source /etc/profile
# java -version    # should report version "1.8.0_241"
Step 5 – Deploy Hadoop on the master node
Extract Hadoop and create working directories:
# tar -zxf hadoop-2.9.2.tar.gz -C /home/hadoop/
# mkdir -p /home/hadoop/tmp /home/hadoop/dfs/{name,data}
Step 6 – Configure Hadoop files (located in /home/hadoop/hadoop-2.9.2/etc/hadoop)
Seven configuration files must be edited. For example, the following properties go inside the <configuration> element of core-site.xml:
<property>
<name>fs.defaultFS</name>
<value>hdfs://qll251:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/tmp</value>
<description>Base for other temporary directories.</description>
</property>
Key files to modify:
hadoop-env.sh – set JAVA_HOME
yarn-env.sh – set JAVA_HOME for YARN
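slaves – list the DataNode hostnames (qll252, qll253), one per line
core-site.xml – define fs.defaultFS and hadoop.tmp.dir
hdfs-site.xml – set the NameNode and DataNode storage directories and the replication factor
mapred-site.xml – set mapreduce.framework.name to yarn and configure the JobHistory Server addresses (Hadoop 2.9.2 ships only mapred-site.xml.template, so copy it to mapred-site.xml first)
yarn-site.xml – configure the ResourceManager addresses and the NodeManager auxiliary service (mapreduce_shuffle)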
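Apart from core-site.xml, the list above names the files without showing their contents. The following is a minimal sketch of the remaining settings, written as a shell function that generates the files. It assumes the hostnames and directories used in this guide, a replication factor of 2 (one copy per DataNode), the default Hadoop 2.9.2 ports (50090 for the SecondaryNameNode, 10020 and 19888 for the JobHistory Server), and stock env scripts that contain an `export JAVA_HOME=` line, possibly commented out. Setting only yarn.resourcemanager.hostname lets the individual ResourceManager addresses (ports 8030 to 8033 and 8088) fall back to their defaults on that host. `prop`, `write_site`, and `write_cluster_conf` are hypothetical names, and core-site.xml is left to the snippet shown earlier.

```shell
#!/usr/bin/env bash
# Sketch: write hdfs-site.xml, mapred-site.xml, yarn-site.xml and slaves,
# and pin JAVA_HOME in hadoop-env.sh / yarn-env.sh for the three-node cluster.
# Assumptions: hostnames and paths from this guide, replication factor 2,
# default Hadoop 2.9.2 ports. All helper names are hypothetical.

MASTER=qll251
JDK_HOME=/usr/local/jdk1.8.0_241

# Print one <property> element.
prop() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# write_site FILE NAME VALUE [NAME VALUE ...] overwrites FILE with a complete site file.
write_site() {
  local file=$1
  shift
  {
    printf '<?xml version="1.0"?>\n<configuration>\n'
    while [ "$#" -ge 2 ]; do
      prop "$1" "$2"
      shift 2
    done
    printf '</configuration>\n'
  } > "$file"
}

write_cluster_conf() {
  local dir=${1:-/home/hadoop/hadoop-2.9.2/etc/hadoop}
  local f

  write_site "$dir/hdfs-site.xml" \
    dfs.namenode.name.dir file:/home/hadoop/dfs/name \
    dfs.datanode.data.dir file:/home/hadoop/dfs/data \
    dfs.replication 2 \
    dfs.namenode.secondary.http-address "$MASTER:50090"

  # Hadoop 2.9.2 ships only mapred-site.xml.template; this creates the real file.
  write_site "$dir/mapred-site.xml" \
    mapreduce.framework.name yarn \
    mapreduce.jobhistory.address "$MASTER:10020" \
    mapreduce.jobhistory.webapp.address "$MASTER:19888"

  write_site "$dir/yarn-site.xml" \
    yarn.resourcemanager.hostname "$MASTER" \
    yarn.nodemanager.aux-services mapreduce_shuffle \
    yarn.nodemanager.aux-services.mapreduce_shuffle.class org.apache.hadoop.mapred.ShuffleHandler

  # One DataNode/NodeManager host per line.
  printf '%s\n' qll252 qll253 > "$dir/slaves"

  # Replace the (possibly commented-out) JAVA_HOME line in both env scripts
  # and warn if no such line existed to replace.
  for f in "$dir/hadoop-env.sh" "$dir/yarn-env.sh"; do
    sed -i "s|^#* *export JAVA_HOME=.*|export JAVA_HOME=$JDK_HOME|" "$f"
    grep -qx "export JAVA_HOME=$JDK_HOME" "$f" || echo "warning: JAVA_HOME not set in $f" >&2
  done
}

# Usage on qll251, before the chown in Step 7:
#   write_cluster_conf /home/hadoop/hadoop-2.9.2/etc/hadoop
```

write_site overwrites each XML file, which is harmless for the near-empty files Hadoop ships but discards any earlier manual edits to them.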
Step 7 – Adjust ownership
# chown -R hadoop:hadoop /home/hadoop
Step 8 – Enable password‑less SSH for the hadoop user
# su - hadoop
$ ssh-keygen
$ ssh-copy-id hadoop@qll251
$ ssh-copy-id hadoop@qll252
$ ssh-copy-id hadoop@qll253
Step 9 – Copy Hadoop installation to DataNodes
# su - hadoop
$ scp -r /home/hadoop/hadoop-2.9.2/ hadoop@qll252:~/
$ scp -r /home/hadoop/hadoop-2.9.2/ hadoop@qll253:~/
Step 10 – Start Hadoop services on qll251
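Format the NameNode (run only once, as the hadoop user):
$ /home/hadoop/hadoop-2.9.2/bin/hdfs namenode -format
Start HDFS and YARN:
$ /home/hadoop/hadoop-2.9.2/sbin/start-dfs.sh
$ /home/hadoop/hadoop-2.9.2/sbin/start-yarn.sh
Alternatively, start and stop everything with the combined scripts (deprecated in Hadoop 2.x but still shipped):
$ /home/hadoop/hadoop-2.9.2/sbin/start-all.sh
$ /home/hadoop/hadoop-2.9.2/sbin/stop-all.sh
Start the JobHistory Server (Hadoop 2.x uses mr-jobhistory-daemon.sh; the mapred --daemon syntax exists only in Hadoop 3):
$ /home/hadoop/hadoop-2.9.2/sbin/mr-jobhistory-daemon.sh start historyserver
Step 11 – Verify the cluster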
Check the HDFS status; both DataNodes should be reported as live:
$ /home/hadoop/hadoop-2.9.2/bin/hdfs dfsadmin -report
Web UI for HDFS: http://192.168.1.251:50070
ResourceManager UI: http://192.168.1.251:8088
History Server UI: http://192.168.1.251:19888
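These checks can also be scripted rather than read by eye. As a small sketch, one helper extracts the live DataNode count from the dfsadmin report, and another compares jps output against the daemons each node is expected to run (NameNode, SecondaryNameNode, ResourceManager, and JobHistoryServer on qll251; DataNode and NodeManager on qll252 and qll253). It assumes the Hadoop 2.x report contains a line of the form `Live datanodes (N):`; `live_datanodes` and `check_daemons` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: helpers for the Step 11 checks. Assumptions: the report text comes
# from "hdfs dfsadmin -report" (Hadoop 2.x prints "Live datanodes (N):"),
# the daemon list comes from "jps", and the helper names are hypothetical.

# Read "hdfs dfsadmin -report" output on stdin; print the live DataNode count.
live_datanodes() {
  sed -n 's/^Live datanodes (\([0-9]*\)):.*/\1/p'
}

# Read "jps" output ("<pid> <MainClass>" per line) on stdin and check that
# every daemon named in the arguments is running. Prints each missing daemon
# to stderr and returns 1 if any is absent.
check_daemons() {
  local running d missing=0
  running=$(awk '{print $2}')
  for d in "$@"; do
    if ! printf '%s\n' "$running" | grep -qx "$d"; then
      echo "missing: $d" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, as the hadoop user on qll251:
#   /home/hadoop/hadoop-2.9.2/bin/hdfs dfsadmin -report | live_datanodes   # expect 2
#   jps | check_daemons NameNode SecondaryNameNode ResourceManager JobHistoryServer
#   ssh qll252 /usr/local/jdk1.8.0_241/bin/jps | check_daemons DataNode NodeManager
```

The remote example calls jps by its full path because a non-interactive SSH session does not read /etc/profile, so the PATH entry added in Step 4 is not available there.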
Conclusion
This article walked through a complete end-to-end installation of an Apache Hadoop 2.9.2 cluster, from environment preparation to running HDFS and YARN daemons, and showed which configuration settings tie the daemons on each node together; the goal is to learn Hadoop through hands-on deployment rather than merely installing it.
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.
