How We Upgraded a 100‑Node Hadoop Cluster with Ansible and Ambari
This article details the step‑by‑step process of modernizing a large‑scale Hadoop deployment—identifying legacy pain points, evaluating three migration strategies, selecting an in‑place upgrade using Ambari‑managed HDP, and automating the entire workflow with Ansible to minimize downtime and operational risk.
Background and Pain Points
The company initially built its Hadoop platform on the Apache community distribution, and the cluster grew to more than a hundred nodes. Configuration changes and node additions or removals were done by hand, and monitoring was fragmented, which drove up operational overhead. Component configurations were inconsistent, HA was lacking, the OS environment was mixed, and host resolution relied on /etc/hosts instead of DNS.
Problem Analysis
Key issues included incomplete HA for HDFS, YARN, and HiveServer2; mixed deployment of NameNode and JournalNode; unclear process‑to‑machine mapping; missing rack‑level awareness; unmanaged configuration files; and cumbersome restart procedures that required multiple manual SSH hops.
Solution Exploration
Three migration paths were considered:
Half‑automated management with Ansible only – improved efficiency but still error‑prone.
Transition to Cloudera Manager – rejected because the required Hadoop version downgrade conflicted with the existing 2.7.2 release.
Transition to Ambari‑managed HDP – chosen because HDP 2.6.5 closely matched the existing stack and allowed an in‑place upgrade.
Chosen Upgrade Strategy
The team opted for an in‑place upgrade of the Apache cluster to Ambari‑managed HDP, focusing first on HDFS migration, then sequentially upgrading Zookeeper, YARN, and HBase. Two upgrade models were evaluated (in‑place vs. data‑copy), and the in‑place approach was selected for lower risk and faster feedback.
Technical Upgrade Steps
1. Zookeeper Upgrade
Stop Apache ZK and install HDP ZK with a new data directory.
Copy existing ZK data to the HDP directory and synchronize configuration files.
Start HDP ZK and decommission Apache ZK.
Validate the upgrade and ensure rollback capability.
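The validation step for ZooKeeper can be partly automated with a liveness probe. The following is a minimal sketch, not the team's actual script: it sends ZooKeeper's `ruok` four-letter command over a plain TCP socket and expects `imok` back. The function names and the default port 2181 are assumptions made for illustration. On ZooKeeper 3.5 and later the command must also be allowed through `4lw.commands.whitelist`.

```python
import socket


def zk_four_letter(host, port, command="ruok", timeout=3.0):
    """Send a ZooKeeper four-letter command and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


def zk_is_healthy(host, port=2181, timeout=3.0):
    """Return True if the server answers 'imok'; False on any socket error."""
    try:
        return zk_four_letter(host, port, "ruok", timeout).strip() == "imok"
    except OSError:
        return False
```

Running such a probe against every quorum member before decommissioning the old service gives a quick go/no-go signal, although it does not prove that the copied data is complete.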
2. HDFS Upgrade
Record the current state of the three Ubuntu master nodes.
Stop Apache HDFS and back up metadata.
Prepare new CentOS machines, switch hostnames/IPs, and install HDP HDFS.
Copy Apache HDFS NameNode and JournalNode metadata to the HDP directories.
Synchronize configurations and start HDP JournalNode and NameNode.
Install and start DataNode on the new machines.
Validate with upload/download and checksum tests.
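The upload/download and checksum test in the last step could be scripted roughly as follows. This is a sketch under assumptions: the article does not show the team's test, the function names are hypothetical, and SHA-256 is chosen here only as an example digest. The `hdfs dfs -put` and `hdfs dfs -get` commands are the standard HDFS shell commands.

```python
import hashlib
import subprocess
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a local file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hdfs_put(local, remote):
    # -f overwrites an existing remote file so the test can be re-run.
    subprocess.run(["hdfs", "dfs", "-put", "-f", str(local), remote], check=True)


def hdfs_get(remote, local):
    subprocess.run(["hdfs", "dfs", "-get", remote, str(local)], check=True)


def round_trip_ok(local_file, remote_path, put=hdfs_put, get=hdfs_get):
    """Upload a file, download it again, and compare the two checksums."""
    with tempfile.TemporaryDirectory() as tmp:
        fetched = Path(tmp) / "fetched.bin"
        put(local_file, remote_path)
        get(remote_path, fetched)
        return sha256_of(local_file) == sha256_of(fetched)
```

The `put` and `get` parameters are injectable so the comparison logic can be exercised during rehearsals without a live cluster.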
3. YARN Upgrade
Stop Apache YARN and install HDP YARN.
Synchronize configurations and start HDP YARN, then decommission Apache YARN.
Enable YARN HA.
Validate with simple MapReduce, Hive, Tez, and Spark jobs.
Resolve client compatibility issues by adjusting mapreduce.framework.name and yarn.application.classpath parameters and exporting $YARN_HOME appropriately.
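The client-side parameter adjustment can be illustrated with a small helper that sets properties in a Hadoop `*-site.xml` file. This is only a sketch: the article does not state the values the team used, so `yarn` for `mapreduce.framework.name` is an assumption (the usual setting for running MapReduce on YARN) and the classpath value is a placeholder. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def set_property(root, name, value):
    """Set or add one <property> in a Hadoop <configuration> element."""
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            return
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value


def patch_site_xml(xml_text, overrides):
    """Return the XML text with every key in `overrides` applied."""
    root = ET.fromstring(xml_text)
    for name, value in overrides.items():
        set_property(root, name, value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    original = "<configuration></configuration>"
    print(patch_site_xml(original, {
        "mapreduce.framework.name": "yarn",             # assumed value
        "yarn.application.classpath": "/path/to/libs/*"  # placeholder only
    }))
```

In a stock Hadoop layout `mapreduce.framework.name` lives in mapred-site.xml and `yarn.application.classpath` in yarn-site.xml, so the two overrides would normally be applied to separate files.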
4. HBase Upgrade
Stop Apache HBase and install HDP HBase.
Synchronize configurations and start HDP HBase, then decommission Apache HBase.
Validate with basic get/put operations.
Minimize WAL generation during shutdown by disabling tables, flushing memstores, and using parallel disable operations.
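The flush-then-disable preparation can be run for many tables at once. The sketch below is an assumption about how that might look, not the team's script: it pipes `flush` and `disable` commands into `hbase shell -n` (non-interactive mode) from a thread pool. The function names and the worker count are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def shell_script_for(table):
    """HBase shell commands that flush a table's memstores, then disable it."""
    return f"flush '{table}'\ndisable '{table}'\n"


def run_hbase_shell(script):
    """Feed a script to the HBase shell and return its exit code."""
    result = subprocess.run(
        ["hbase", "shell", "-n"],
        input=script, text=True, capture_output=True,
    )
    return result.returncode


def quiesce_tables(tables, run=run_hbase_shell, max_workers=8):
    """Flush and disable tables in parallel; map each table to its exit code."""
    tables = list(tables)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        codes = list(pool.map(lambda t: run(shell_script_for(t)), tables))
    return dict(zip(tables, codes))
```

Any table whose exit code is non-zero should be checked by hand before HBase is stopped. The shell also offers `disable_all` with a regular expression, which may be simpler when table names share a prefix.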
Lessons Learned
Coordinate a dedicated operations war‑room and align all data‑pipeline owners on a low‑traffic maintenance window.
Maintain a detailed checklist reviewed by multiple team members to avoid on‑the‑fly errors.
Compress downtime by front‑loading preparatory work (new machines, OS unification, Ambari installation) and parallelizing script execution.
Conduct weekly full‑process rehearsals a month before the upgrade, including forced rollbacks.
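A rehearsal with a forced rollback can be driven by a small stage runner. The following is a hypothetical sketch, not something described in the article: each stage bundles apply, validate, and rollback callables, and `force_rollback_at` names the stage at which a rehearsal deliberately backs out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Stage:
    name: str
    apply: Callable[[], None]
    validate: Callable[[], bool]
    rollback: Callable[[], None]


def run_stages(stages: List[Stage], force_rollback_at: Optional[str] = None) -> List[str]:
    """Apply stages in order; on failure or forced rollback, undo in reverse."""
    log: List[str] = []
    done: List[Stage] = []
    for stage in stages:
        stage.apply()
        done.append(stage)
        log.append(f"applied {stage.name}")
        if stage.name == force_rollback_at or not stage.validate():
            # Undo the failing stage first, then every earlier stage.
            for prev in reversed(done):
                prev.rollback()
                log.append(f"rolled back {prev.name}")
            return log
        log.append(f"validated {stage.name}")
    return log
```

Returning the event log makes it easy to compare one rehearsal with the next and to confirm that the rollback path was actually exercised.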
Automation with Ansible
All repetitive tasks (stopping services, updating /etc/hosts, configuring bashrc, refreshing agents, and running tests) were codified into Ansible playbooks managed via Ansible‑AWX. A sample directory layout is shown below.
ambari_install/
├── group_vars
├── inventory
│   ├── ambari_agent
│   ├── ambari_bashrc
│   ├── ambari_hosts
│   ├── ambari_ln
│   └── ambari_slave
├── README.md
├── roles
│   ├── ambari_agent
│   │   └── tasks
│   │       └── main.yml
│   ├── ambari_bashrc
│   │   ├── files
│   │   │   └── bashrc_default
│   │   └── tasks
│   │       └── main.yml
│   ├── ambari_hosts
│   │   ├── files
│   │   │   ├── hosts_default
│   │   │   └── hosts_old
│   │   └── tasks
│   │       └── main.yml
│   └── ambari_ln
│       └── tasks
│           └── main.yml
└── roles.yml

These playbooks were integrated into Ansible‑AWX, enabling one‑click execution of the entire upgrade workflow.
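One of the codified tasks, updating /etc/hosts when hostnames move to new machines, comes down to a simple text transformation. The sketch below shows that transformation in isolation and is an illustration only: the article does not show the contents of the ambari_hosts role, and `remap_hosts` and its arguments are hypothetical names.

```python
def remap_hosts(hosts_text, ip_map):
    """Rewrite the IP column of hosts-file lines whose IP appears in ip_map.

    Comment lines, blank lines, and unmapped entries are kept unchanged.
    """
    out = []
    for line in hosts_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            out.append(line)
            continue
        fields = stripped.split()
        new_ip = ip_map.get(fields[0])
        if new_ip is not None:
            # Keep every hostname and alias, swap only the address.
            out.append(" ".join([new_ip] + fields[1:]))
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    old = "# masters\n10.0.0.1 nn1\n10.0.0.2 nn2\n"
    print(remap_hosts(old, {"10.0.0.1": "10.0.1.1"}), end="")
```

Keeping both the old and the new hosts file under version control, as the hosts_old and hosts_default files in the layout suggest, makes the reverse mapping available for a rollback.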
dbaplus Community
