
Practical Upgrade Experience of Hadoop 3.2.1 in 58.com Data Platform: HDFS, YARN, and MR3

This article details the end-to-end upgrade of a 5000-node Hadoop 2.6.0 cluster to Hadoop 3.2.1 at 58.com, covering the HDFS migration, RBF and EC adoption, YARN federation and rolling upgrades, the MR3 integration, extensive compatibility testing, and operational lessons learned for large-scale big-data platforms.

58 Tech

Upgrade Background – Hadoop 3.x introduced many new features, especially erasure coding (EC) in HDFS and federation in YARN, which can significantly reduce storage costs and improve resource utilization. 58.com’s data platform, the core of its offline data storage and computation, needed to migrate from Hadoop 2.6.0 to 3.2.1.

HDFS Upgrade – The team performed a direct upgrade of the entire offline cluster (over 5000 DataNodes) from 2.6.0‑cdh5.4.4 to 3.2.1 without an intermediate test cluster. Five key compatibility areas were addressed: NN metadata compatibility (rolling back HDFS‑14831), DN layout version changes (256×256 → 32×32 directories), client‑side compatibility, backend service interface compatibility, and simplifying the rolling‑upgrade workflow.
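The DataNode layout change is easy to quantify. As a rough sketch (the numbers follow from the directory scheme itself, not from the article), shrinking the two-level subdirectory tree from 256×256 to 32×32 cuts per-volume directory fan-out by a factor of 64:

```python
# Compare DataNode block-directory fan-out before and after the Hadoop 3
# layout change: a 256x256 vs a 32x32 two-level subdir tree per volume.

def leaf_dirs(level1: int, level2: int) -> int:
    """Total leaf directories in a two-level subdirXX/subdirYY tree."""
    return level1 * level2

old = leaf_dirs(256, 256)  # Hadoop 2.x layout: 65,536 leaf dirs per volume
new = leaf_dirs(32, 32)    # Hadoop 3.x layout: 1,024 leaf dirs per volume
print(old, new, old // new)
```

Fewer directories means cheaper volume scans on DataNode startup; since the layout is rewritten when each DataNode first starts on the new version, this is one reason the DataNode phase dominates the upgrade timeline.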

Code Integration – Custom HDFS features (security, NN performance, DN I/O optimizations) developed on 2.6.0 were merged into the 3.2.1 codebase.

Testing – Comprehensive tests were executed, including client‑interface compatibility, rolling‑upgrade scenarios, and NN performance benchmarks, confirming that the upgraded NN met production latency requirements.

Deployment – The upgrade was split into two phases: (1) NameNode/JournalNode/ZKFC upgrade (≈1 week) and (2) DataNode upgrade (≈1 month for 5000+ nodes). The streamlined process required minimal operator intervention.
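The two phases map onto the standard HDFS rolling-upgrade procedure; a condensed command outline (hosts elided, to be adapted per cluster, and not a substitute for the team's streamlined tooling) looks like:

```shell
# Phase 1: NameNode/JournalNode/ZKFC
hdfs dfsadmin -rollingUpgrade prepare   # create a rollback fsimage first
hdfs dfsadmin -rollingUpgrade query     # repeat until "Proceed with rolling upgrade"
# upgrade the software on the standby NN, then restart it with:
hdfs --daemon start namenode -rollingUpgrade started
# fail over, then repeat for the other NameNode

# Phase 2: DataNodes, upgraded in batches across the 5000+ nodes
hdfs dfsadmin -shutdownDatanode <DN_HOST:IPC_PORT> upgrade
hdfs --daemon start datanode

# Only after validation -- finalizing forfeits the ability to roll back:
hdfs dfsadmin -rollingUpgrade finalize
```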

RBF Migration – ViewFS routing was replaced by Router‑Based Federation (RBF) to achieve centralized mount‑table management, real‑time updates, multi‑cluster mounts, and global quota support. Custom classes ViewFsRedirectDistributedFileSystem.java and ViewFsRedirectHdfs.java were introduced to make the switch transparent to users.
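On the configuration side, a Router is pointed at the subclusters it fronts. A minimal hdfs-site.xml sketch (the nameservice IDs ns1/ns2 are illustrative, not from the article):

```xml
<!-- Illustrative RBF settings; ns1/ns2 are hypothetical nameservice IDs -->
<property>
  <name>dfs.federation.router.default.nameserviceId</name>
  <value>ns1</value>
</property>
<property>
  <name>dfs.federation.router.monitor.namenode</name>
  <value>ns1.nn1,ns1.nn2,ns2.nn1,ns2.nn2</value>
</property>
```

Mount entries then live in the shared State Store and can be updated centrally at runtime, e.g. `hdfs dfsrouteradmin -add /warehouse ns2 /warehouse`, instead of shipping viewfs mount tables to every client.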

EC Implementation – The team adopted the striped-layout EC scheme (phase 1) for cold data, performing offline conversion via a modified distcp and validating integrity with the COMPOSITE_CRC algorithm (HDFS-13056). An automatic conversion platform was built to identify and convert cold data, achieving a 50% storage reduction (50 PB → 25 PB).
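The economics behind the conversion are straightforward. This sketch assumes the common RS-6-3 striped policy (the article does not name the exact policy used) and reproduces the reported 50% saving:

```python
# Physical bytes stored per logical byte: 3x replication vs RS(6,3) EC.

def replication_overhead(replicas: int = 3) -> float:
    """Each logical byte is stored `replicas` times."""
    return float(replicas)

def ec_overhead(data_units: int = 6, parity_units: int = 3) -> float:
    """Striped EC stores (data + parity) units per `data_units` logical units."""
    return (data_units + parity_units) / data_units

rep = replication_overhead()  # 3.0x physical-to-logical
ec = ec_overhead()            # 1.5x for RS-6-3
saving = 1 - ec / rep         # 0.5, i.e. the 50% reduction (50 PB -> 25 PB)
print(rep, ec, saving)
```

Integrity after conversion can be cross-checked with `hadoop fs -checksum` under `dfs.checksum.combine.mode=COMPOSITE_CRC` (HDFS-13056), whose checksums are comparable between replicated and EC copies of the same data.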

YARN Upgrade – A rolling upgrade from 2.6.0 to 3.2.1 was planned with three principles: user transparency, compatibility with all compute frameworks, and support for rollback. Compatibility work covered RM state data (ZK parameters), NM LevelDB state, client-side APIs, backend service interfaces, and MR task communication (TaskUmbilicalProtocol). Configuration tweaks (e.g., yarn.nodemanager.recovery.enabled=true) mitigated known issues.
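NodeManager restart-recovery, referenced above, also needs a persistent state directory and a pinned NM port to survive a rolling restart. A yarn-site.xml sketch (the path and port below are illustrative):

```xml
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- illustrative path; must survive NM restarts -->
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/hadoop/yarn-nm-recovery</value>
</property>
<property>
  <!-- recovery requires a fixed port, not an ephemeral one -->
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:45454</value>
</property>
```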

MR3 Upgrade – To leverage MR3's NativeTask and FileOutputCommitter improvements, the team added a custom HivePlatform class for Hive serialization support and used the distributed cache to deliver native libraries to NM nodes. Performance tests showed a >15% speedup for Hive SQL workloads.
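Both improvements are switched on through job configuration; a sketch of the relevant mapred-site.xml settings (the custom HivePlatform wiring and the native-library distribution are site-specific and omitted here):

```xml
<property>
  <!-- MAPREDUCE-2841: native (C++) map-output collector -->
  <name>mapreduce.job.map.output.collector.class</name>
  <value>org.apache.hadoop.mapred.nativetask.NativeMapOutputCollectorDelegator</value>
</property>
<property>
  <!-- v2 commit algorithm: task outputs move directly to the job output dir -->
  <name>mapreduce.fileoutputcommitter.algorithm.version</name>
  <value>2</value>
</property>
```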

Future Outlook – After successfully deploying HDFS RBF and EC, the next steps include completing YARN federation, exploring cross‑region deployments, and activating additional Hadoop 3.x features to further improve efficiency and reduce costs for the company’s data‑intensive services.

Big Data · Cluster Upgrade · YARN · Erasure Coding · HDFS · Hadoop · Router-Based Federation
Written by

58 Tech

Official tech channel of 58, a platform for tech innovation, sharing, and communication.
