How We Upgraded a 100‑Node Hadoop Cluster with Ansible and Ambari

This article walks through the modernization of a large-scale Hadoop deployment: identifying legacy pain points, evaluating three migration strategies, selecting an in-place upgrade to Ambari-managed HDP, and automating the workflow with Ansible to minimize downtime and operational risk.

dbaplus Community
Background and Pain Points

The company originally built its Hadoop platform on the Apache community distribution, and the cluster grew to more than a hundred nodes. Configuration changes and node additions or removals were done by hand and monitoring was fragmented, which drove up operational overhead and left component configurations inconsistent. The cluster also lacked HA, ran a mix of operating systems, and relied on /etc/hosts rather than DNS for name resolution.

Problem Analysis

Key issues included incomplete HA for HDFS, YARN, and HiveServer2; NameNode and JournalNode deployed together on the same machines; unclear process-to-machine mapping; no rack awareness; unmanaged configuration files; and cumbersome restart procedures that required multiple manual SSH hops.

Solution Exploration

Three migration paths were considered:

Semi-automated management with Ansible only – more efficient, but still error-prone.

Transition to Cloudera Manager – rejected because it would have required downgrading Hadoop from the existing 2.7.2 release.

Transition to Ambari‑managed HDP – chosen because HDP 2.6.5 closely matched the existing stack and allowed an in‑place upgrade.

Chosen Upgrade Strategy

The team opted for an in-place upgrade of the Apache cluster to Ambari-managed HDP, focusing first on HDFS migration, then sequentially upgrading Zookeeper, YARN, and HBase. Of the two upgrade models evaluated (in-place and data-copy), the in-place approach was selected for its lower risk and faster feedback.

Technical Upgrade Steps

1. Zookeeper Upgrade

Stop Apache ZK and install HDP ZK with a new data directory.

Copy existing ZK data to the HDP directory and synchronize configuration files.

Start HDP ZK and decommission Apache ZK.

Validate the upgrade and ensure rollback capability.

Figure: Zookeeper upgrade steps
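
The copy-and-verify step can be sketched in Python as follows. This is a minimal illustration, not the team's actual tooling: the function name copy_zk_data and both directory arguments are hypothetical, and it assumes Zookeeper is already stopped so the files are not changing underneath the copy. The old directory is left untouched, which keeps a rollback path open.

```python
import filecmp
import shutil
from pathlib import Path
from typing import List


def copy_zk_data(old_data_dir: str, new_data_dir: str) -> List[str]:
    """Copy a stopped Zookeeper's data directory and return the relative paths copied.

    Both paths are hypothetical; the source directory is never modified.
    """
    src, dst = Path(old_data_dir), Path(new_data_dir)
    # dirs_exist_ok lets the copy land in a directory the new install already created.
    shutil.copytree(src, dst, dirs_exist_ok=True)

    copied = []
    for src_file in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = src_file.relative_to(src)
        # shallow=False compares file contents, not just size and modification time.
        if not filecmp.cmp(src_file, dst / rel, shallow=False):
            raise RuntimeError(f"copy mismatch: {rel}")
        copied.append(rel.as_posix())
    return copied
```

A Zookeeper data directory normally holds the myid file and a version-2 subdirectory with snapshots, so the returned list gives a quick way to confirm that both made it across before the new service is started.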

2. HDFS Upgrade

Record the current state of the three Ubuntu master nodes.

Stop Apache HDFS and back up metadata.

Prepare new CentOS machines, switch hostnames/IPs, and install HDP HDFS.

Copy Apache HDFS NameNode and JournalNode metadata to the HDP directories.

Synchronize configurations and start HDP JournalNode and NameNode.

Install and start DataNode on the new machines.

Validate with upload/download and checksum tests.

Figure: HDFS upgrade overview
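
The upload/download checksum validation could look roughly like the sketch below. The function names and all paths are hypothetical, and the article does not say which commands or hash the team used. The sketch assumes the hdfs client is on PATH and uses SHA-256. The run parameter exists so the sequence can be exercised without a cluster.

```python
import hashlib
import subprocess
from typing import List


def sha256_of(path: str) -> str:
    """Hash a local file in 1 MiB chunks so large test files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def roundtrip_commands(local_src: str, hdfs_path: str, local_dst: str) -> List[List[str]]:
    """Upload, download, then clean up the probe file."""
    return [
        ["hdfs", "dfs", "-put", "-f", local_src, hdfs_path],
        # -get refuses to overwrite, so local_dst must not exist beforehand.
        ["hdfs", "dfs", "-get", hdfs_path, local_dst],
        ["hdfs", "dfs", "-rm", "-skipTrash", hdfs_path],
    ]


def roundtrip_ok(local_src: str, hdfs_path: str, local_dst: str, run=subprocess.run) -> bool:
    """Return True when the file read back from HDFS matches what was written."""
    for argv in roundtrip_commands(local_src, hdfs_path, local_dst):
        run(argv, check=True)  # raises CalledProcessError on a non-zero exit
    return sha256_of(local_src) == sha256_of(local_dst)
```

Running this against a few file sizes (a small file and one that spans several blocks) gives a quick signal that NameNode metadata and DataNode blocks survived the migration.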

3. YARN Upgrade

Stop Apache YARN and install HDP YARN.

Synchronize configurations and start HDP YARN, then decommission Apache YARN.

Enable YARN HA.

Validate with simple MapReduce, Hive, Tez, and Spark jobs.

Resolve client compatibility issues by adjusting mapreduce.framework.name and yarn.application.classpath parameters and exporting $YARN_HOME appropriately.

Figure: YARN upgrade steps
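
Adjusting client-side parameters such as mapreduce.framework.name and yarn.application.classpath amounts to editing property entries in the Hadoop *-site.xml files. The sketch below shows one way to do that with Python's standard library. The helper name set_property is hypothetical, and the values the team actually set are not given in the article.

```python
import xml.etree.ElementTree as ET


def set_property(site_xml: str, name: str, value: str) -> str:
    """Set one <property> in a Hadoop *-site.xml document, adding it if absent.

    Note: ElementTree drops XML comments and the XML declaration on output.
    """
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No existing entry matched, so append a new property.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

For example, set_property(mapred_site, "mapreduce.framework.name", "yarn") selects YARN as the MapReduce runtime. The right yarn.application.classpath depends on where the HDP packages are installed, so that value has to come from the target environment.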

4. HBase Upgrade

Stop Apache HBase and install HDP HBase.

Synchronize configurations and start HDP HBase, then decommission Apache HBase.

Validate with basic get/put operations.

Minimize WAL generation during shutdown by disabling tables, flushing memstores, and using parallel disable operations.

Figure: HBase upgrade steps
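
The flush-and-disable step before shutdown can be parallelized along the lines of the sketch below. It is an illustration under assumptions: the function names are hypothetical, run_script stands for whatever mechanism feeds commands to the HBase shell (for example piping the text into hbase shell), and flushing before disabling is a choice made here rather than an order confirmed by the article.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def quiesce_script(table: str) -> str:
    """HBase shell commands that flush a table's memstores, then disable it."""
    return f"flush '{table}'\ndisable '{table}'\n"


def quiesce_tables(
    tables: List[str],
    run_script: Callable[[str], bool],
    workers: int = 4,
) -> Dict[str, bool]:
    """Run the flush/disable script for each table on a small thread pool.

    run_script is a hypothetical callable that executes shell text and
    returns True on success.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with tables.
        results = pool.map(lambda t: run_script(quiesce_script(t)), tables)
        return dict(zip(tables, results))
```

Any table that maps to False in the result is a candidate for a retry before the region servers are stopped, since unflushed data would otherwise have to be replayed from the WAL on startup.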

Lessons Learned

Coordinate a dedicated operations war‑room and align all data‑pipeline owners on a low‑traffic maintenance window.

Maintain a detailed checklist reviewed by multiple team members to avoid on‑the‑fly errors.

Compress downtime by front‑loading preparatory work (new machines, OS unification, Ambari installation) and parallelizing script execution.

Run weekly full-process rehearsals, including forced rollbacks, during the month before the upgrade.

Automation with Ansible

All repetitive tasks (stopping services, updating /etc/hosts, configuring .bashrc, refreshing Ambari agents, and running tests) were codified into Ansible playbooks managed through Ansible AWX. A sample directory layout is shown below.

ambari_install/
├── group_vars
├── inventory
│   ├── ambari_agent
│   ├── ambari_bashrc
│   ├── ambari_hosts
│   ├── ambari_ln
│   └── ambari_slave
├── README.md
├── roles
│   ├── ambari_agent
│   │   └── tasks
│   │       └── main.yml
│   ├── ambari_bashrc
│   │   ├── files
│   │   │   └── bashrc_default
│   │   └── tasks
│   │       └── main.yml
│   ├── ambari_hosts
│   │   ├── files
│   │   │   ├── hosts_default
│   │   │   └── hosts_old
│   │   └── tasks
│   │       └── main.yml
│   └── ambari_ln
│       └── tasks
│           └── main.yml
└── roles.yml

These playbooks were integrated into Ansible‑AWX, enabling one‑click execution of the entire upgrade workflow.
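
Outside AWX, the same roles could be driven from a small wrapper that builds one ansible-playbook invocation per role. The sketch below is hypothetical: it assumes each role in roles.yml is tagged with its own name and has a matching inventory file, and the role order is a guess. None of this is stated in the article.

```python
from typing import List

# Role names come from the directory layout above; the order is an assumption.
ROLES = ["ambari_hosts", "ambari_bashrc", "ambari_ln", "ambari_agent"]


def playbook_command(role: str, base: str = "ambari_install", dry_run: bool = False) -> List[str]:
    """Build the ansible-playbook argv for one role.

    Assumes roles.yml tags every role with the role's own name.
    """
    argv = [
        "ansible-playbook",
        "-i", f"{base}/inventory/{role}",
        f"{base}/roles.yml",
        "--tags", role,
    ]
    if dry_run:
        # --check asks Ansible to report changes without applying them.
        argv.append("--check")
    return argv


if __name__ == "__main__":
    for role in ROLES:
        print(" ".join(playbook_command(role, dry_run=True)))
```

Printing the dry-run commands first fits the rehearsal practice described earlier: the sequence can be reviewed against the checklist before anything touches the cluster.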