Guide to Setting Up Hadoop High Availability (HA) Cluster with HDFS and YARN
This article provides a step‑by‑step tutorial on configuring Hadoop high availability, covering HDFS HA architecture, Quorum Journal Manager synchronization, NameNode failover, YARN HA, required pre‑conditions, cluster planning, configuration files, service startup, and verification procedures.
Big data remains a hot field, and this guide walks through building a highly available Hadoop cluster using HDFS and YARN. The source material originates from the GitHub repository BigData-Notes by author heibaiying.
1. High Availability Overview
Hadoop HA consists of HDFS HA and YARN HA. HDFS HA is more complex because the NameNode must guarantee data consistency. The architecture includes Active and Standby NameNodes, a ZKFailoverController, a Zookeeper ensemble, shared storage (QJM or NFS), and DataNode reporting.
1.1 Overall HA Architecture
The HDFS HA architecture is illustrated below (image omitted). Key components:
Active NameNode and Standby NameNode – only the Active NameNode serves client read/write requests; the Standby keeps its state synchronized so it can take over immediately on failure.
ZKFailoverController – runs as a separate process, monitors NameNode health via HAServiceProtocol RPC, and triggers failover using Zookeeper.
Zookeeper ensemble – provides leader election for failover.
Shared storage system – stores the EditLog and metadata; both NameNodes synchronize through it.
DataNode – reports block locations to both NameNodes.
1.2 Data Synchronization with QJM
Hadoop can use Quorum Journal Manager (QJM) or NFS as shared storage. With QJM, the Active NameNode writes the EditLog to a JournalNode cluster, and the Standby NameNode periodically syncs it. A write succeeds once a majority of JournalNodes acknowledge it, so at least three are required; in general, a cluster of 2N+1 JournalNodes tolerates N failures, which is why an odd number is deployed.
1.3 NameNode Failover Process
The failover workflow includes:
HealthMonitor initializes and periodically calls HAServiceProtocol to check NameNode health.
On health change, HealthMonitor notifies ZKFailoverController.
ZKFailoverController triggers ActiveStandbyElector for automatic election.
ActiveStandbyElector interacts with Zookeeper to elect the new active.
After election, ActiveStandbyElector informs ZKFailoverController of the new role.
ZKFailoverController calls HAServiceProtocol to switch the NameNode to Active or Standby.
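Beyond the automatic election described above, the transition can be inspected or driven manually with the `hdfs haadmin` tool. The NameNode IDs `nn1`/`nn2` below are assumptions and must match the IDs configured under `dfs.ha.namenodes.<nameservice>` in hdfs-site.xml:

```
# Query which NameNode is currently active (nn1/nn2 are example IDs)
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# With automatic failover disabled, a manual switch can be triggered:
hdfs haadmin -failover nn1 nn2
```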
1.4 YARN High Availability
YARN ResourceManager HA follows the same pattern but is simpler: the ResourceManager has far less state to keep consistent than the NameNode, and it persists what it needs (application state) directly in Zookeeper, so no separate shared-storage system such as QJM is required.
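Analogously to HDFS, the ResourceManager role can be checked with `yarn rmadmin`; the IDs `rm1`/`rm2` are assumptions that must match `yarn.resourcemanager.ha.rm-ids`:

```
# Query the state of each ResourceManager (rm1/rm2 are example IDs)
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
```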
2. Cluster Planning
To meet HA requirements, the cluster needs at least two NameNodes (active/standby), two ResourceManagers, and three JournalNodes. The plan uses three physical hosts (hadoop001, hadoop002, hadoop003).
3. Prerequisites
All servers must have JDK installed.
Zookeeper ensemble must be set up.
SSH password‑less login must be configured between all servers.
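Password-less SSH between the hosts can be set up roughly as follows; the host names are those from the cluster plan, and the default key paths are assumed:

```
# Generate a key pair on each host (press Enter through the prompts)
ssh-keygen -t rsa

# Copy the public key to every host in the cluster, including itself
ssh-copy-id hadoop001
ssh-copy-id hadoop002
ssh-copy-id hadoop003
```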
4. Cluster Configuration
4.1 Download and Extract
Download a CDH version of Hadoop, e.g., from http://archive.cloudera.com/cdh5/cdh/5/ and extract:
# tar -zvxf hadoop-2.6.0-cdh5.15.2.tar.gz

4.2 Set Environment Variables
Edit /etc/profile and add:
export HADOOP_HOME=/usr/app/hadoop-2.6.0-cdh5.15.2
export PATH=${HADOOP_HOME}/bin:$PATH

Apply the changes with source /etc/profile.
4.3 Modify Configuration Files
Update the following files under ${HADOOP_HOME}/etc/hadoop:
hadoop-env.sh – set export JAVA_HOME=/usr/java/jdk1.8.0_201/
core-site.xml – define fs.defaultFS and hadoop.tmp.dir.
hdfs-site.xml – configure replication, NameNode directories, the HA nameservice, the shared edits directory (QJM), fencing methods, and automatic failover.
yarn-site.xml – enable auxiliary services and log aggregation, enable RM HA, and set the cluster ID, RM IDs, hostnames, web UI addresses, Zookeeper address, and recovery store class.
mapred-site.xml – set mapreduce.framework.name=yarn.
slaves – list the hostnames/IPs of all DataNode/NodeManager hosts, one per line.
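A minimal sketch of the HA-related properties follows. The nameservice ID `mycluster`, NameNode IDs `nn1`/`nn2`, RM IDs `rm1`/`rm2`, and all host names and paths are illustrative and must match your own cluster plan:

```xml
<!-- core-site.xml: point clients at the HA nameservice rather than a single host -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>

<!-- hdfs-site.xml: nameservice, NameNode IDs, QJM shared edits, fencing, auto failover -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://hadoop001:8485;hadoop002:8485;hadoop003:8485/mycluster</value>
</property>
<property>
  <!-- Clients use this class to locate the currently active NameNode -->
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>

<!-- yarn-site.xml: ResourceManager HA backed by Zookeeper -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>my-yarn-cluster</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
</property>
<property>
  <!-- Persist RM application state in Zookeeper for recovery after failover -->
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
```

Each NameNode and ResourceManager also needs its per-instance RPC and web addresses (e.g., `dfs.namenode.rpc-address.mycluster.nn1`); consult the Apache HA documentation for the full property list.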
4.4 Distribute Packages
Copy the Hadoop installation directory to the other two servers:
# scp -r /usr/app/hadoop-2.6.0-cdh5.15.2/ hadoop002:/usr/app/
# scp -r /usr/app/hadoop-2.6.0-cdh5.15.2/ hadoop003:/usr/app/

5. Start the Cluster
5.1 Start Zookeeper
On each of the three hosts:

zkServer.sh start

5.2 Start JournalNode

On each of the three hosts:

hadoop-daemon.sh start journalnode

5.3 Initialize NameNode

On the first NameNode host:

hdfs namenode -format

Copy the formatted metadata directory to the standby NameNode:

scp -r /home/hadoop/namenode/data hadoop002:/home/hadoop/namenode/

5.4 Initialize HA State in Zookeeper

hdfs zkfc -formatZK

5.5 Start HDFS

start-dfs.sh

5.6 Start YARN

start-yarn.sh

If the standby ResourceManager is not started automatically, launch it manually on its host:

yarn-daemon.sh start resourcemanager

6. Verify the Cluster
6.1 Check Processes
Run jps on each host; you should see NameNode, DataNode, JournalNode, ZKFailoverController, ResourceManager, NodeManager, etc.
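A quick way to check all three hosts at once, assuming the password-less SSH setup from the prerequisites:

```
# List the Java processes running on every host in the cluster
for host in hadoop001 hadoop002 hadoop003; do
  echo "=== $host ==="
  ssh "$host" jps
done
```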
6.2 Check Web UI
The HDFS web UI is served on port 50070 of each NameNode host, and the YARN web UI on port 8088 of each ResourceManager host. The UI of the active NameNode and ResourceManager reports the instance as active, while the standby instances are shown as standby.
7. Restarting the Cluster
After the initial setup, subsequent restarts are simpler. Ensure Zookeeper is running, then start HDFS with start-dfs.sh , start YARN with start-yarn.sh , and manually start any missing ResourceManager as needed.
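Summarized as a restart checklist (Zookeeper first, then HDFS, then YARN):

```
# 1. On each Zookeeper host
zkServer.sh start

# 2. On the primary NameNode host; this also brings up DataNodes,
#    JournalNodes, and the ZKFailoverControllers
start-dfs.sh

# 3. On a ResourceManager host; start the standby RM manually if needed
start-yarn.sh
yarn-daemon.sh start resourcemanager
```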