How Hulu Upgraded Hadoop 2.6 to 3.0: Lessons in Compatibility and Migration

This article details Hulu's upgrade from Hadoop 2.6 to 3.0 (CDH 5.7.3 to CDH 6.3.3). It covers the major features Hadoop 3 has gained up to version 3.3.2, the original cluster architecture, the upgrade plan, compatibility challenges across HDFS, YARN, Hive, Spark and Flink, and the testing and rollout strategies used for the migration.

Hulu Beijing

Jul 7, 2022

Background

Hadoop 3 was released five years ago and has since evolved to version 3.3.2, introducing features such as mature HDFS erasure coding, simplified HDFS RBF client configuration, multi‑standby NameNodes, Docker support in YARN, dynamic resource‑allocation APIs, and improved federation.

Original Cluster Architecture

Before the upgrade, Hulu's Hadoop cluster ran on CDH5.7.3 with Hadoop 2.6.0, comprising thousands of servers, hundreds of petabytes of data, and core services HDFS, YARN, and Hive. Access was mediated by the Firework client, which encapsulated open‑source tools and provided dynamic configuration and version updates.

Upgrade Scope and Timeline

The upgrade covered most components—Cloudera, HDFS, YARN, Hive, HBase, Zookeeper, Sentry—moving from CDH5.7.3 to CDH6.3.3 (Hadoop 3.0.0). Testing began in Q2 2021 and production rollout occurred in July after four months of validation.

Compatibility Considerations

Four compatibility dimensions were examined:

Client‑service interface compatibility

Inter‑service component compatibility

Component‑storage state compatibility

User‑interface syntax and semantics compatibility

Key issues discovered included:

HDFS Block Access Token schema change (HDFS‑6708) requiring a patch (HDFS‑15191) on Hadoop 2.6.

Datanode directory hash restructuring (HDFS‑8791) necessitating pre‑upgrade block relocation.

Changes to HDFS chmod sticky‑bit handling (HDFS‑10689) and heap‑size environment variables.

YARN token identifier serialization shift to Protocol Buffers (YARN‑668) with a backward‑compatible patch (YARN‑8310) that still required a cache for original byte arrays.

Hive 2.1 metadata schema changes and numerous SQL syntax deprecations, prompting temporary keyword reverts.
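
To make the Datanode directory restructuring (HDFS-8791) listed above more concrete, here is a minimal Java sketch of how a block ID maps to a two-level subdirectory before and after the change. It assumes that the old layout masks each level with 0xFF (256 x 256 directories) and the new one with 0x1F (32 x 32). The class and method names are hypothetical, and the masks and the "subdir" prefix should be checked against DatanodeUtil.idToBlockDir in the Hadoop versions actually involved.

```java
/** Illustrative only: not the Hadoop implementation, just the assumed bit arithmetic. */
final class DatanodeLayout {
    private DatanodeLayout() {}

    /** Assumed pre-HDFS-8791 layout: 8 bits per level, 256 x 256 directories. */
    static String legacyDir(long blockId) {
        int d1 = (int) ((blockId >> 16) & 0xFF);
        int d2 = (int) ((blockId >> 8) & 0xFF);
        return "subdir" + d1 + "/subdir" + d2;
    }

    /** Assumed post-HDFS-8791 layout: 5 bits per level, 32 x 32 directories. */
    static String compactDir(long blockId) {
        int d1 = (int) ((blockId >> 16) & 0x1F);
        int d2 = (int) ((blockId >> 8) & 0x1F);
        return "subdir" + d1 + "/subdir" + d2;
    }

    /** True when the replica's directory differs between the two layouts. */
    static boolean mustMove(long blockId) {
        return !legacyDir(blockId).equals(compactDir(blockId));
    }
}
```

Under these assumptions, block ID 0x12345678 maps to subdir52/subdir86 in the old layout and subdir20/subdir22 in the new one, so that replica has to move. A block whose two index bytes are both below 32 stays where it is.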

Impact on Spark and Flink

Most production Spark jobs used Hive 1.x and Hadoop 2.x. Upgrading to Hadoop 3.x and Hive 2.1 introduced library conflicts; patches HIVE‑15016, HIVE‑16081, and HIVE‑16131 were applied to Hive 1.x, and JvmPauseMonitor interface changes were fixed. Flink worked with Hive 2.x and Hadoop 3.x without issues.

Classloader and SPI Challenges

During the upgrade, the Spark and Flink classloader hierarchies were analyzed. By default, Spark calls

URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())

which resolves URL schemes through the FileSystem providers registered via the Service Provider Interface (SPI). Because Hadoop 2.6 and 3.0 provided different HttpFileSystem implementations, the merged SPI configuration caused HTTP URLs to be handled by the Hadoop 3 provider, breaking HTTP accesses that had nothing to do with HDFS.

To mitigate this, a forward‑compatible Spark/Flink runtime containing full Hadoop 2 dependencies was deployed, ensuring that user jobs could run on Hadoop 3 clusters while preserving interface compatibility.
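
The HTTP routing problem described in this section can be illustrated with a small JDK-only sketch. It is not Hadoop's FsUrlStreamHandlerFactory. SchemeRoutingFactory is a hypothetical stand-in that claims a fixed set of schemes, roughly the way a factory backed by SPI-registered FileSystem providers would claim every scheme it finds a provider for. Returning null tells the JDK to fall back to its built-in handler for that scheme.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a factory driven by registered file-system providers. */
final class SchemeRoutingFactory implements URLStreamHandlerFactory {
    private final Set<String> claimedSchemes;

    SchemeRoutingFactory(String... claimedSchemes) {
        this.claimedSchemes = new HashSet<>(Arrays.asList(claimedSchemes));
    }

    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
        if (!claimedSchemes.contains(protocol)) {
            // Not claimed: the JDK uses its own handler (e.g. plain HTTP).
            return null;
        }
        // Claimed: every URL with this scheme is now routed here instead.
        return new URLStreamHandler() {
            @Override
            protected URLConnection openConnection(URL url) throws IOException {
                throw new IOException("routed to file-system provider: " + url);
            }
        };
    }
}
```

With only hdfs claimed, createURLStreamHandler("http") returns null and the JDK's built-in HTTP handler is used. Once http appears among the claimed schemes, the same call returns the custom handler, which is the kind of behavior change that a merged provider list can introduce. A factory like this would be installed once per JVM with URL.setURLStreamHandlerFactory(...).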

Upgrade Procedure

The upgrade proceeded in three phases:

Upgrade Cloudera, Sentry, Zookeeper (no downtime).

Stop Hive services, upgrade the Hive metadata, then restart Hive; stop and rebuild YARN in the same phase.

Perform a rolling upgrade of HDFS (JournalNode → NameNode → Datanode), taking roughly two hours per namespace and three weeks for full Datanode rollout.
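
The role ordering in the last phase, and the fact that the Datanode rollout was spread over several weeks, suggest a staged plan. The following Java sketch shows one way such a plan could be expressed. The class, the enum, and the batch size are all hypothetical and are not taken from Hulu's tooling.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for staging a rolling upgrade; not Hulu's actual tooling. */
final class RollingUpgradePlan {
    private RollingUpgradePlan() {}

    /** Roles in the order the article lists them for the HDFS phase. */
    enum Role { JOURNAL_NODE, NAME_NODE, DATA_NODE }

    /** Splits hosts into consecutive batches of at most batchSize hosts each. */
    static List<List<String>> batches(List<String> hosts, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < hosts.size(); i += batchSize) {
            int end = Math.min(i + batchSize, hosts.size());
            // Copy the view so each batch is independent of the input list.
            result.add(new ArrayList<>(hosts.subList(i, end)));
        }
        return result;
    }
}
```

Iterating Role.values() gives the JournalNode, NameNode, Datanode order, and batches(...) can then be applied to the Datanode host list so that only a small group is restarted at a time.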

Outcome and Future Work

The migration was largely successful, providing deeper insight into the big‑data stack. Remaining gaps to the latest Hadoop 3.3 include performance tuning, container‑based isolation, and further dependency management. Future directions involve tighter cloud integration and continued Spark version upgrades.
