Big Data 18 min read

How Pinterest Scaled a Hadoop Upgrade Across 17k Nodes

Pinterest’s Monarch batch‑processing platform, built on over 17 k YARN nodes in AWS, was upgraded from Hadoop 2.7.1 to 2.10.0 using a phased, cluster‑by‑cluster strategy that balanced minimal downtime, extensive validation, and custom patches to handle compatibility and dependency issues.

Past Memory Big Data

Aug 15, 2022

How Pinterest Scaled a Hadoop Upgrade Across 17k Nodes

Monarch is Pinterest’s batch‑processing platform consisting of more than 30 Hadoop YARN clusters and over 17 k EC2 nodes. In early 2021 the platform still ran Hadoop 2.7.1, and the growing number of internal patches made a version upgrade necessary. Pinterest chose Hadoop 2.10.0, the latest Hadoop 2 release at the time.

Challenges

Since the platform’s inception in 2016, workloads have grown and evolved, requiring hundreds of internal changes. Most critical batch jobs run on Monarch, so the upgrade had to avoid cluster downtime or SLA impact.

Upgrade Strategy

The upgrade was split into two phases. Phase 1 upgraded the platform itself from Hadoop 2.7 to Hadoop 2.10 while allowing user jobs to continue using Hadoop 2.7 dependencies. Phase 2 upgraded user‑defined applications to Hadoop 2.10. Both phases required a step‑by‑step rollout because of the platform’s size.

Upgrade each Monarch cluster one by one.

Batch‑upgrade user applications to bind to Hadoop 2.10 instead of 2.7.

Because there was no flexible build pipeline to produce two versions of a job with separate Hadoop dependencies, Pinterest had to run extensive pre‑upgrade validation to ensure that the majority of Hadoop 2.7‑built applications would still run on the 2.10 clusters.

Hadoop 2.10 Release Preparation

Pinterest’s Hadoop 2.7 branch contained many internal patches that needed to be ported to Apache Hadoop 2.10. Significant changes between the two versions made this non‑trivial. Examples of patches that were back‑ported include:

DirectOutputFileCommitter to write task output directly to S3, avoiding copy overhead.

Endpoints for ApplicationMaster and HistoryServer to expose per‑task counters.

Range‑lookup for container logs to fetch partial logs.

Node‑Id partitioning for log aggregation to avoid S3 request‑rate limits.

Fail‑over handling for new NameNode IP addresses.

Preempting‑reducers disablement when mapper‑to‑total‑mapper ratio exceeds a threshold.

Disk‑usage monitoring thread in the ApplicationMaster to terminate jobs that exceed limits.

Cluster Upgrade Methods Explored

Method 1: Cross‑Cluster Routing (CCR)

CCR, described in the “Efficient Resource Management” paper, creates a new Hadoop 2.10 cluster and gradually routes workloads from the old 2.7 cluster. If problems arise, workloads can be routed back, fixed, and re‑routed. Evaluation on small production and dev clusters showed two drawbacks: the cost of building a parallel cluster for each migration (expensive for clusters with thousands of nodes) and the time‑consuming bulk workload migration.

Method 2: Rolling Upgrade

Rolling upgrade of Worker nodes was considered, but it risked affecting all workloads on the cluster and made rollback expensive.

Method 3: In‑place Upgrade

Inspired by instance‑type canary upgrades, this approach inserts new instance‑type canary hosts into an auto‑scaling group, evaluates them against the base group, expands the canary group, and then shrinks the base group. The steps are:

Add Hadoop 2.10 Worker nodes (HDFS DataNode and YARN NodeManager) to the Hadoop 2.7 cluster.

Identify and fix any issues that arise.

Gradually increase the number of 2.10 workers while decreasing 2.7 workers until all workers are 2.10.

Upgrade all master services (NameNode, JournalNodes, ResourceManager, HistoryServer, etc.) in a similar replacement fashion.

Extensive testing on production Monarch clusters showed a seamless upgrade experience with only minor issues.

Final Upgrade Plan

Because most jobs were built with Hadoop 2.7 jars placed in the distributed cache, they could inadvertently load 2.7 jars on 2.10 nodes, causing dependency conflicts. After evaluating the three methods, Pinterest selected Method 3 (in‑place upgrade) for its speed and ability to resolve most dependency problems quickly. When a job could not be fixed promptly, CCR was used to route it back to a Hadoop 2.7 cluster for later remediation.

Issues Encountered and Fixes

Incompatible Behavior

NodeManager restart killed containers because the new default yarn.nodemanager.recovery.supervised was FALSE. Setting it to TRUE prevented container cleanup during NM restarts.

ApplicationPriority field missing in protobuf responses caused jobs to hang; the field is ignored when absent.

HADOOP‑13680: fs.s3a.readahead.range accepted “32M” format from Hadoop 2.8 onward, which Hadoop 2.7 could not parse. A compatibility fix was added to Hadoop 2.7.

Extra spaces introduced in io.serialization values in Hadoop 2.10 caused ClassNotFound errors; spaces were stripped.

Dependency Issues

Hadoop 2.7 jars in the distributed cache caused runtime conflicts; a fix prevented these jars from being added.

Incompatible woodstox-core-5.0.3.jar (Hadoop 2.10) versus wstx-asl-3.2.7.jar in user apps; the 5.0.3 jar was shaded.

Internal library S3DoubleWrite no longer needed and was removed.

Some Hadoop 2.7 libraries were bundled in user Bazel jars, leading to runtime conflicts; the solution was to decouple user jars from Hadoop jars.

Upgrading User Programs to Hadoop 2.10

To upgrade user applications, Pinterest ensured that Hadoop 2.7 jars were not shipped with user jars, allowing the cluster‑provided Hadoop jars to be used. The build environment was switched to Hadoop 2.10.

Decoupling User Applications from Hadoop Jars

Most data pipelines use Bazel‑built fat jars that contain Hadoop 2.7 client libraries. Because the classloader prefers classes from the fat jar, jobs running on Hadoop 2.10 clusters still used Hadoop 2.7 classes. The solution was to stop embedding Hadoop jars in the fat jar and rely on the Hadoop jars installed on the cluster.

Updating Bazel Targets

After decoupling, Hadoop Bazel targets were upgraded from 2.7 to 2.10. New dependency conflicts were identified through build tests and resolved by upgrading the conflicting libraries to compatible versions.

Conclusion

Upgrading more than 17 k nodes from one Hadoop version to another without causing major application breakage was a significant challenge. Pinterest achieved the upgrade with high quality, reasonable speed, and cost‑effectiveness, and the lessons shared are intended to benefit the broader community.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Big Data cluster upgrade YARN Hadoop AWS EC2 Hadoop 2.10 Monarch

Written by

Past Memory Big Data

A popular big-data architecture channel with over 100,000 developers. Publishes articles on Spark, Hadoop, Flink, Kafka and more. Visit the Past Memory Big Data blog at https://www.iteblog.com. Search "Past Memory" on Google or Baidu.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.

Challenges

Upgrade Strategy

Hadoop 2.10 Release Preparation

Cluster Upgrade Methods Explored

Method 1: Cross‑Cluster Routing (CCR)

Method 2: Rolling Upgrade

Method 3: In‑place Upgrade

Final Upgrade Plan

Issues Encountered and Fixes

Incompatible Behavior

Dependency Issues

Other Problems

Upgrading User Programs to Hadoop 2.10

Decoupling User Applications from Hadoop Jars

Updating Bazel Targets

Conclusion

Past Memory Big Data

How this landed with the community

Was this worth your time?

0 Comments

Hadoop 2.10 Release Preparation

Method 1: Cross‑Cluster Routing (CCR)

Method 2: Rolling Upgrade

Method 3: In‑place Upgrade

Upgrading User Programs to Hadoop 2.10