
Step-by-Step Guide to Installing and Configuring Hadoop 2.9.2 Cluster on Three Nodes

This article provides a detailed, step-by-step tutorial for installing Hadoop 2.9.2, configuring environment variables, editing XML configuration files, formatting the NameNode, starting HDFS and YARN services, testing the cluster, and setting up the MapReduce history server on a three‑node Linux environment.


Prepare three Linux servers (bigdata11, bigdata12, bigdata13) with JDK 1.8 installed and the Hadoop 2.9.2 package available. In this layout, bigdata11 hosts the NameNode, all three servers run DataNodes, and bigdata13 hosts the Secondary NameNode.

Extract the Hadoop package to the desired directory:

tar -zxvf hadoop-2.9.2.tar.gz -C ../training/

Set environment variables by editing ~/.bash_profile and adding:

export HADOOP_HOME=/root/training/hadoop-2.9.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

Then run source ~/.bash_profile so the new variables take effect in the current session.

Navigate to hadoop-2.9.2/etc/hadoop and edit the following files:

hadoop-env.sh: set JAVA_HOME.
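
For example, assuming the JDK is unpacked under /root/training/jdk1.8.0_181 (a hypothetical path; substitute your actual JDK location), the line in hadoop-env.sh would be:

```shell
# Hypothetical JDK location -- point this at your actual JDK 1.8 install
export JAVA_HOME=/root/training/jdk1.8.0_181
```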

core-site.xml: add the default filesystem property:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://bigdata11:9000</value>
</property>

hdfs-site.xml: configure the temporary directory (the property name is hadoop.tmp.dir), the Secondary NameNode address, and the replication factor:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/root/training/hadoop-2.9.2/tmp</value>
</property>
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>bigdata13:50090</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>

slaves: list the hostnames or IPs of the DataNode servers.
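
Since all three hosts run DataNodes in this layout, the slaves file would contain one hostname per line:

```
bigdata11
bigdata12
bigdata13
```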

mapred-env.sh and yarn-env.sh: set JAVA_HOME.

mapred-site.xml (in 2.9.2 this file is created by copying mapred-site.xml.template): define the framework as YARN:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

yarn-site.xml: set the ResourceManager hostname and auxiliary services:

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>bigdata13</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

Ensure the Hadoop installation directory is owned by the root user and group:

chown -R root:root /path/to/hadoop-2.9.2

Copy the configured Hadoop installation to the other two nodes (bigdata12 and bigdata13), and repeat the environment-variable setup on each.
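
One way to do this is scp in a loop; a minimal sketch, assuming passwordless SSH as root from bigdata11 to the other hosts (push_hadoop is a hypothetical helper, not part of Hadoop):

```shell
# Hypothetical helper: copy a configured Hadoop tree to each listed host.
# Assumes passwordless SSH as root from the current machine.
push_hadoop() {
  local src="$1"; shift          # local Hadoop directory to copy
  local host
  for host in "$@"; do
    scp -qr "$src" "root@${host}:/root/training/"
  done
}

# Usage (run on bigdata11):
# push_hadoop /root/training/hadoop-2.9.2 bigdata12 bigdata13
```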

Format the NameNode on the primary server:

hdfs namenode -format

Start the daemons individually by running the following commands from the sbin directory:

Start HDFS on each node:

hadoop-daemon.sh start namenode    # on bigdata11
hadoop-daemon.sh start datanode    # on each node

Start YARN services:

yarn-daemon.sh start resourcemanager   # on bigdata13
yarn-daemon.sh start nodemanager       # on bigdata11 and bigdata12
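
As a sanity check, running jps on each host should list the daemons assigned to it; with the manual startup above, roughly:

```
bigdata11: NameNode, DataNode, NodeManager
bigdata12: DataNode, NodeManager
bigdata13: DataNode, ResourceManager
```

(The SecondaryNameNode additionally appears on bigdata13 once it is started, e.g. by start-dfs.sh.)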

Web UI addresses: HDFS – http://bigdata11:50070, YARN – http://bigdata13:8088.

For full cluster startup, run start-dfs.sh from the NameNode host (bigdata11) and start-yarn.sh from the ResourceManager host (bigdata13):

start-dfs.sh
start-yarn.sh

Test the HDFS functionality by creating a directory, uploading a file, and retrieving it:

hdfs dfs -mkdir -p /test/input
hdfs dfs -put /root/test.txt /test/input
hdfs dfs -get /test/input/test.txt /root/

Run a MapReduce word-count example (the examples jar ships under $HADOOP_HOME/share/hadoop/mapreduce; note that the job fails if the output directory already exists):

hdfs dfs -mkdir /wcinput
hdfs dfs -put /root/wc.txt /wcinput
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /wcinput /wcoutput
hdfs dfs -cat /wcoutput/part-r-00000

Configure the MapReduce History Server by adding to mapred-site.xml:

<property>
  <name>mapreduce.jobhistory.address</name>
  <value>bigdata11:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>bigdata11:19888</value>
</property>

Start the history server on the configured node:

mr-jobhistory-daemon.sh start historyserver

Enable log aggregation by adding to yarn-site.xml (restart the YARN daemons and the history server afterwards for the change to take effect):

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>

With these steps completed, the Hadoop 2.9.2 cluster is installed, configured, and ready for basic HDFS and MapReduce workloads.

Tags: Big Data, Linux, MapReduce, YARN, Hadoop, Cluster Setup
Written by

Practical DevOps Architecture

Hands‑on DevOps operations using Docker, K8s, Jenkins, and Ansible—empowering ops professionals to grow together through sharing, discussion, knowledge consolidation, and continuous improvement.
