
Step-by-Step Guide to Building a Hadoop Cluster on CentOS 6.5

This article provides a comprehensive, hands‑on tutorial for setting up a Hadoop 2.6.4 cluster on a CentOS 6.5 development server, covering SSH password‑less login, user/group creation, DNS configuration, JDK installation, environment variables, Hadoop installation, HDFS and YARN configuration, and troubleshooting native library warnings.

MaGe Linux Operations

SSH Password‑less Configuration

Generate an SSH key pair on the local machine if one does not already exist, then copy the public key to the server with ssh-copy-id user@server, substituting the real login user and server address. After you enter the password once, subsequent ssh and scp commands to that server run without a password prompt.
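The key setup described above can be sketched as a small shell function. The address user@server.example and the key path are placeholders (assumptions, not values from the original article); ssh-keygen, ssh-copy-id, and ssh are the standard OpenSSH tools.

```shell
# Placeholders (assumptions): replace with the real login user and server.
REMOTE="${REMOTE:-user@server.example}"
KEY="${KEY:-$HOME/.ssh/id_rsa}"

setup_passwordless_ssh() {
    # Create a key pair only if none exists yet (empty passphrase).
    [ -f "$KEY" ] || ssh-keygen -t rsa -b 4096 -N "" -f "$KEY"
    # Install the public key on the server; this asks for the password once.
    ssh-copy-id -i "$KEY.pub" "$REMOTE"
    # BatchMode forbids password prompts, so success proves key login works.
    ssh -o BatchMode=yes "$REMOTE" true && echo "password-less login OK"
}
```

Running setup_passwordless_ssh on the local machine performs the three steps in order; nothing happens until the function is called.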

New User and Group Creation

Create a dedicated group and user for Hadoop administration:

groupadd dps-hadoop
useradd -d /home/dps-hadoop -g dps-hadoop dps-hadoop

Grant the user sudo rights by adding the line dps-hadoop ALL=(ALL) ALL to /etc/sudoers. Edit the file with visudo so that syntax errors are caught before the file is saved.
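As a rough sketch, the user creation and sudo grant can be wrapped in functions that are safe to re-run as root. Writing a drop-in file under /etc/sudoers.d instead of editing /etc/sudoers directly is a substitution made here, and it assumes the main sudoers file contains the usual #includedir /etc/sudoers.d line; the function names are hypothetical.

```shell
HADOOP_USER="dps-hadoop"

create_hadoop_user() {
    # Create the group and user only if they are missing, so re-runs are safe.
    getent group "$HADOOP_USER" >/dev/null || groupadd "$HADOOP_USER"
    id "$HADOOP_USER" >/dev/null 2>&1 ||
        useradd -d "/home/$HADOOP_USER" -g "$HADOOP_USER" "$HADOOP_USER"
}

grant_sudo() {
    # $1 = sudoers drop-in file (assumed default: /etc/sudoers.d/<user>)
    dropin="${1:-/etc/sudoers.d/$HADOOP_USER}"
    printf '%s ALL=(ALL) ALL\n' "$HADOOP_USER" > "$dropin"
    chmod 0440 "$dropin"
    # visudo -c -f checks the file's syntax without opening an editor.
    visudo -c -f "$dropin"
}
```

Call create_hadoop_user and then grant_sudo as root; if visudo reports an error, remove the drop-in file before logging out.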

Local DNS Configuration

For a temporary change, edit /etc/resolv.conf directly; the network scripts may overwrite this file when the interface restarts. For a permanent DNS server, edit /etc/sysconfig/network-scripts/ifcfg-eth0 (replace eth0 with the actual interface name) and add an entry such as DNS1=172.20.2.24.
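The permanent setting can be applied with a small helper like the one below, which either replaces an existing DNS1 line or appends a new one. This is only a sketch: the function name set_dns1 is hypothetical, and sed -i assumes GNU sed, which CentOS ships.

```shell
# Set or replace the DNS1 entry in an ifcfg file.
# $1 = config file, $2 = DNS server address
set_dns1() {
    if grep -q '^DNS1=' "$1"; then
        # An entry exists: rewrite it in place.
        sed -i "s/^DNS1=.*/DNS1=$2/" "$1"
    else
        # No entry yet: append one.
        printf 'DNS1=%s\n' "$2" >> "$1"
    fi
}
```

For example, as root: set_dns1 /etc/sysconfig/network-scripts/ifcfg-eth0 172.20.2.24, followed by service network restart to apply the change.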

JDK Installation

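Obtain the Oracle JDK RPM (e.g., jdk-8u77-linux-x64.rpm), either by downloading it on the server with wget or by copying it over with scp. Rename the file if it was saved under a different name, then install it with rpm -i jdk-8u77-linux-x64.rpm.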

Configure JAVA_HOME

Edit the Hadoop user’s ~/.bashrc and add the line export JAVA_HOME="/usr/java/jdk1.8.0_77". Then reload the file with source ~/.bashrc, or start a new session, for the change to take effect.
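To avoid duplicate entries when the setup is repeated, the edit can be made idempotent. The helper below is a minimal sketch; add_export is a hypothetical name, and the optional third argument exists only so the function can be tried on a scratch file first.

```shell
# Append 'export NAME="value"' to an rc file unless NAME is already exported there.
# $1 = variable name, $2 = value, $3 = rc file (defaults to ~/.bashrc)
add_export() {
    rc="${3:-$HOME/.bashrc}"
    touch "$rc"
    grep -q "^export $1=" "$rc" || printf 'export %s="%s"\n' "$1" "$2" >> "$rc"
}
```

Usage: add_export JAVA_HOME /usr/java/jdk1.8.0_77, then source ~/.bashrc and run "$JAVA_HOME/bin/java" -version to confirm that the path is valid.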

Install Hadoop 2.6.4

Download the tarball:

wget http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.6.4/hadoop-2.6.4.tar.gz

Extract it to the home directory and set HADOOP_HOME in ~/.bashrc:

export HADOOP_HOME="/home/dps-hadoop/hadoop-2.6.4"

Repeat these steps on all slave nodes.
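The extraction and the repetition on the slaves might be scripted as follows. The host names slave1 and slave2 are hypothetical, and the sketch assumes password-less SSH to each slave under the same user name.

```shell
# Unpack the Hadoop tarball into a target directory (default: the home directory).
# $1 = tarball path, $2 = target directory
unpack_hadoop() {
    tar -xzf "$1" -C "${2:-$HOME}"
}

# Hypothetical slave host names; substitute the real ones.
SLAVES="${SLAVES:-slave1 slave2}"

# Copy the tarball to each slave's home directory and unpack it there.
# $1 = tarball path
push_to_slaves() {
    for host in $SLAVES; do
        scp "$1" "$host:" && ssh "$host" "tar -xzf $(basename "$1")"
    done
}
```

For example: unpack_hadoop hadoop-2.6.4.tar.gz on the master, then push_to_slaves hadoop-2.6.4.tar.gz. The HADOOP_HOME entry in ~/.bashrc still has to be added on each slave.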

HDFS Configuration

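Start with core-site.xml on the NameNode, which sets Hadoop's temporary directory and the default file system URI (fs.default.name is the deprecated name of fs.defaultFS; Hadoop 2.x still accepts it):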

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/dps-hadoop/tmpdata</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54000/</value>
  </property>
</configuration>

In hdfs-site.xml, set the NameNode metadata directory and the replication factor:

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/dps-hadoop/namedata</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

Configure core-site.xml and hdfs-site.xml on each DataNode in the same way, pointing the storage property (dfs.datanode.data.dir) at a local directory on that node.
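The original does not show the DataNode file, so the following is only a sketch of what it could look like, written out by a shell function. The function name is hypothetical; dfs.datanode.data.dir is the standard Hadoop 2.x property for block storage, and the replication value mirrors the NameNode setting above.

```shell
# Write a minimal hdfs-site.xml for a DataNode.
# $1 = Hadoop config directory, $2 = local block storage directory
write_datanode_hdfs_site() {
    mkdir -p "$2"
    cat > "$1/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>$2</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
EOF
}
```

A call such as write_datanode_hdfs_site "$HADOOP_HOME/etc/hadoop" /home/dps-hadoop/datadata would be run on each DataNode; the directory name datadata is an assumption chosen to match the naming of the other directories.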

Start HDFS

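On the master, change to the Hadoop installation directory, format the NameNode (only once, because formatting discards existing HDFS metadata), and start the HDFS daemons: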

bin/hdfs namenode -format
sbin/start-dfs.sh

Access the HDFS Web UI at http://172.20.2.14:50070/ to verify the cluster status.
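The same check can be made from the command line. The sketch below assumes that hdfs dfsadmin -report prints a line of the form "Live datanodes (N):"; both function names are hypothetical.

```shell
# Extract the live DataNode count from `hdfs dfsadmin -report` output on stdin.
# Assumes the report contains a line of the form "Live datanodes (N):".
live_datanodes() {
    sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p'
}

check_hdfs() {
    # jps lists running JVMs; a NameNode process should appear on the master.
    jps
    count=$("$HADOOP_HOME/bin/hdfs" dfsadmin -report | live_datanodes)
    echo "live DataNodes: ${count:-unknown}"
}
```

Run check_hdfs on the master after start-dfs.sh finishes; a count lower than the number of slaves suggests that a DataNode failed to start or cannot reach the NameNode.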

Native Library Warning and Troubleshooting

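If Hadoop warns that it cannot load the native library (the log line reads "Unable to load native-hadoop library for your platform... using builtin-java classes where applicable"), check three things: that java.library.path points to the directory containing the native library, that the library matches the system architecture (32-bit versus 64-bit), and that the system glibc is at least the GLIBC version the library was built against. To see why loading fails, raise the log level to DEBUG in log4j.properties.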
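These checks could be gathered into one diagnostic function, sketched below. The library path lib/native/libhadoop.so.1.0.0 is an assumption about the tarball layout, the function names are hypothetical, and the version comparison relies on GNU sort -V.

```shell
# True when version $1 is at least version $2 (uses GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

diagnose_native() {
    # Assumed library location inside the Hadoop installation.
    lib="$HADOOP_HOME/lib/native/libhadoop.so.1.0.0"
    # Report which native components Hadoop can load.
    "$HADOOP_HOME/bin/hadoop" checknative -a
    # The library's architecture should match the machine's.
    file "$lib"
    uname -m
    # Highest GLIBC symbol version the library requires vs. the installed glibc.
    need=$(objdump -T "$lib" | grep -o 'GLIBC_[0-9.]*' | sort -uV | tail -n 1)
    have=$(getconf GNU_LIBC_VERSION | awk '{print $2}')
    version_ge "$have" "${need#GLIBC_}" ||
        echo "glibc $have is older than required ${need#GLIBC_}"
}
```

For a one-off debug run without editing log4j.properties, setting HADOOP_ROOT_LOGGER=DEBUG,console in front of a single hadoop command should print the loader's diagnostics to the console.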

YARN Configuration

Set the ResourceManager hostname in yarn-site.xml:

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
</configuration>

Start YARN with sbin/start-yarn.sh. The MapReduce JobHistoryServer is a separate daemon that can be started or stopped independently as needed.
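A minimal sketch of the start and stop sequence, using the scripts shipped in the Hadoop 2.x sbin directory, follows. The two function names are hypothetical, and HADOOP_HOME is assumed to be set as described earlier.

```shell
start_yarn_stack() {
    # Starts the ResourceManager here and NodeManagers on the hosts
    # listed in etc/hadoop/slaves.
    "$HADOOP_HOME/sbin/start-yarn.sh"
    # The JobHistoryServer is managed separately from start-yarn.sh.
    "$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh" start historyserver
}

stop_yarn_stack() {
    # Stop in reverse order of startup.
    "$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh" stop historyserver
    "$HADOOP_HOME/sbin/stop-yarn.sh"
}
```

Run start_yarn_stack on the master once HDFS is up; jps should then list a ResourceManager and a JobHistoryServer process there.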

Cluster Management Web UI

Default ports for monitoring:

HDFS: http://master:50070/
ResourceManager: http://master:8088/
JobHistory: http://master:19888/
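A quick reachability check of the three interfaces can be scripted with curl. This is a sketch with hypothetical function names; it prints the HTTP status code of each UI and makes no claim about which code a given daemon returns.

```shell
# Print the HTTP status code returned by a web UI (000 if unreachable).
# $1 = host, $2 = port
ui_status() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 5 "http://$1:$2/"
}

# Check the three default monitoring ports on a host (default: master).
check_uis() {
    for entry in HDFS:50070 ResourceManager:8088 JobHistory:19888; do
        echo "${entry%%:*} -> $(ui_status "${1:-master}" "${entry##*:}")"
    done
}
```

Calling check_uis master prints one line per service; any 2xx or 3xx code indicates that the daemon is listening, while 000 means the connection failed.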