
Step-by-Step Guide to Installing and Configuring Hadoop 2.9.2 Cluster on Three Nodes

This article provides a detailed, step-by-step tutorial for installing Hadoop 2.9.2, configuring environment variables, editing XML configuration files, formatting the NameNode, starting HDFS and YARN services, testing the cluster, and setting up the MapReduce history server on a three‑node Linux environment.


Prepare three Linux servers (bigdata11, bigdata12, bigdata13) with JDK 1.8 installed and the Hadoop 2.9.2 package available. In this layout, bigdata11 hosts the NameNode, all three servers run DataNodes, and bigdata13 hosts the Secondary NameNode.

Extract the Hadoop package to the desired directory:

tar -zxvf hadoop-2.9.2.tar.gz -C ../training/

Set environment variables by editing ~/.bash_profile and adding:

export HADOOP_HOME=/root/training/hadoop-2.9.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

Then run source ~/.bash_profile so the new variables take effect in the current session.

Navigate to hadoop-2.9.2/etc/hadoop and edit the following files:

hadoop-env.sh: set JAVA_HOME.
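
For example, assuming the JDK is unpacked under /root/training/jdk1.8.0_181 (a hypothetical path; substitute your actual JDK location), the line in hadoop-env.sh would be:

```shell
# Hypothetical JDK location -- point this at your actual JDK 1.8 install
export JAVA_HOME=/root/training/jdk1.8.0_181
```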

core-site.xml: add the default filesystem property:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://bigdata11:9000</value>
</property>

hdfs-site.xml: configure the temporary directory (the property name is hadoop.tmp.dir), the Secondary NameNode address, and the replication factor:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/root/training/hadoop-2.9.2/tmp</value>
</property>
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>bigdata13:50090</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>

slaves: list the hostnames or IPs of the DataNode servers.
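
Since all three hosts run DataNodes in this layout, the slaves file would contain one hostname per line:

```
bigdata11
bigdata12
bigdata13
```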

mapred-env.sh and yarn-env.sh: set JAVA_HOME.

mapred-site.xml (in 2.9.2 this file is created by copying mapred-site.xml.template): define the framework as YARN:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

yarn-site.xml: set the ResourceManager hostname and auxiliary services:

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>bigdata13</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

Ensure the Hadoop installation directory is owned by the root user and group:

chown -R root:root /path/to/hadoop-2.9.2

Copy the configured Hadoop installation to the other two nodes (bigdata12 and bigdata13), and repeat the environment-variable setup on each.
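
One way to do this is scp in a loop; a minimal sketch, assuming passwordless SSH as root from bigdata11 to the other hosts (push_hadoop is a hypothetical helper, not part of Hadoop):

```shell
# Hypothetical helper: copy a configured Hadoop tree to each listed host.
# Assumes passwordless SSH as root from the current machine.
push_hadoop() {
  local src="$1"; shift          # local Hadoop directory to copy
  local host
  for host in "$@"; do
    scp -qr "$src" "root@${host}:/root/training/"
  done
}

# Usage (run on bigdata11):
# push_hadoop /root/training/hadoop-2.9.2 bigdata12 bigdata13
```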

Format the NameNode on the primary server:

hdfs namenode -format

Start the daemons individually by running the following commands from the sbin directory:

Start HDFS on each node:

hadoop-daemon.sh start namenode    # on bigdata11
hadoop-daemon.sh start datanode    # on each node

Start YARN services:

yarn-daemon.sh start resourcemanager   # on bigdata13
yarn-daemon.sh start nodemanager       # on bigdata11 and bigdata12
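
As a sanity check, running jps on each host should list the daemons assigned to it; with the manual startup above, roughly:

```
bigdata11: NameNode, DataNode, NodeManager
bigdata12: DataNode, NodeManager
bigdata13: DataNode, ResourceManager
```

(The SecondaryNameNode additionally appears on bigdata13 once it is started, e.g. by start-dfs.sh.)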

Web UI addresses: HDFS – http://bigdata11:50070, YARN – http://bigdata13:8088.

For full cluster startup, run start-dfs.sh from the NameNode host (bigdata11) and start-yarn.sh from the ResourceManager host (bigdata13):

start-dfs.sh
start-yarn.sh

Test the HDFS functionality by creating a directory, uploading a file, and retrieving it:

hdfs dfs -mkdir -p /test/input
hdfs dfs -put /root/test.txt /test/input
hdfs dfs -get /test/input/test.txt /root/

Run a MapReduce word-count example (the examples jar ships under $HADOOP_HOME/share/hadoop/mapreduce; note that the job fails if the output directory already exists):

hdfs dfs -mkdir /wcinput
hdfs dfs -put /root/wc.txt /wcinput
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /wcinput /wcoutput
hdfs dfs -cat /wcoutput/part-r-00000

Configure the MapReduce History Server by adding to mapred-site.xml:

<property>
  <name>mapreduce.jobhistory.address</name>
  <value>bigdata11:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>bigdata11:19888</value>
</property>

Start the history server on the configured node:

mr-jobhistory-daemon.sh start historyserver

Enable log aggregation by adding to yarn-site.xml (restart the YARN daemons and the history server afterwards for the change to take effect):

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>

With these steps completed, the Hadoop 2.9.2 cluster is installed, configured, and ready for basic HDFS and MapReduce workloads.

Tags: Big Data, Linux, MapReduce, YARN, Hadoop, Cluster Setup
Written by

Practical DevOps Architecture

Hands‑on DevOps operations using Docker, K8s, Jenkins, and Ansible—empowering ops professionals to grow together through sharing, discussion, knowledge consolidation, and continuous improvement.
