
Practical Upgrade Experience of Hadoop 3.2.1 in 58.com Data Platform: HDFS, YARN, and MR3

This article details the end-to-end upgrade of a 5000-node Hadoop 2.6.0 cluster to Hadoop 3.2.1 at 58.com, covering the HDFS migration, RBF and EC adoption, YARN federation and rolling upgrades, the MR3 integration, extensive compatibility testing, and operational lessons learned for large-scale big-data platforms.

58 Tech

Upgrade Background – Hadoop 3.x introduced many new features, especially erasure coding (EC) in HDFS and federation in YARN, which can significantly reduce storage costs and improve resource utilization. 58.com’s data platform, the core of its offline data storage and computation, needed to migrate from Hadoop 2.6.0 to 3.2.1.

HDFS Upgrade – The team performed a direct upgrade of the entire offline cluster (over 5000 DataNodes) from 2.6.0‑cdh5.4.4 to 3.2.1 without an intermediate test cluster. Five key compatibility areas were addressed: NN metadata compatibility (rolling back HDFS‑14831), DN layout version changes (256×256 → 32×32 directories), client‑side compatibility, backend service interface compatibility, and simplifying the rolling‑upgrade workflow.
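The DataNode layout change is easy to quantify. As a rough sketch (the numbers follow from the directory scheme itself, not from the article), shrinking the two-level subdirectory tree from 256×256 to 32×32 cuts per-volume directory fan-out by a factor of 64:

```python
# Compare DataNode block-directory fan-out before and after the Hadoop 3
# layout change: a 256x256 vs a 32x32 two-level subdir tree per volume.

def leaf_dirs(level1: int, level2: int) -> int:
    """Total leaf directories in a two-level subdirXX/subdirYY tree."""
    return level1 * level2

old = leaf_dirs(256, 256)  # Hadoop 2.x layout: 65,536 leaf dirs per volume
new = leaf_dirs(32, 32)    # Hadoop 3.x layout: 1,024 leaf dirs per volume
print(old, new, old // new)
```

Fewer directories means cheaper volume scans on DataNode startup; since the layout is rewritten when each DataNode first starts on the new version, this is one reason the DataNode phase dominates the upgrade timeline.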

Code Integration – Custom HDFS features (security, NN performance, DN I/O optimizations) developed on 2.6.0 were merged into the 3.2.1 codebase.

Testing – Comprehensive tests were executed, including client‑interface compatibility, rolling‑upgrade scenarios, and NN performance benchmarks, confirming that the upgraded NN met production latency requirements.

Deployment – The upgrade was split into two phases: (1) NameNode/JournalNode/ZKFC upgrade (≈1 week) and (2) DataNode upgrade (≈1 month for 5000+ nodes). The streamlined process required minimal operator intervention.
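The two phases map onto the standard HDFS rolling-upgrade procedure; a condensed command outline (hosts elided, to be adapted per cluster, and not a substitute for the team's streamlined tooling) looks like:

```shell
# Phase 1: NameNode/JournalNode/ZKFC
hdfs dfsadmin -rollingUpgrade prepare   # create a rollback fsimage first
hdfs dfsadmin -rollingUpgrade query     # repeat until "Proceed with rolling upgrade"
# upgrade the software on the standby NN, then restart it with:
hdfs --daemon start namenode -rollingUpgrade started
# fail over, then repeat for the other NameNode

# Phase 2: DataNodes, upgraded in batches across the 5000+ nodes
hdfs dfsadmin -shutdownDatanode <DN_HOST:IPC_PORT> upgrade
hdfs --daemon start datanode

# Only after validation -- finalizing forfeits the ability to roll back:
hdfs dfsadmin -rollingUpgrade finalize
```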

RBF Migration – ViewFS routing was replaced by Router‑Based Federation (RBF) to achieve centralized mount‑table management, real‑time updates, multi‑cluster mounts, and global quota support. Custom classes ViewFsRedirectDistributedFileSystem.java and ViewFsRedirectHdfs.java were introduced to make the switch transparent to users.
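On the configuration side, a Router is pointed at the subclusters it fronts. A minimal hdfs-site.xml sketch (the nameservice IDs ns1/ns2 are illustrative, not from the article):

```xml
<!-- Illustrative RBF settings; ns1/ns2 are hypothetical nameservice IDs -->
<property>
  <name>dfs.federation.router.default.nameserviceId</name>
  <value>ns1</value>
</property>
<property>
  <name>dfs.federation.router.monitor.namenode</name>
  <value>ns1.nn1,ns1.nn2,ns2.nn1,ns2.nn2</value>
</property>
```

Mount entries then live in the shared State Store and can be updated centrally at runtime, e.g. `hdfs dfsrouteradmin -add /warehouse ns2 /warehouse`, instead of shipping viewfs mount tables to every client.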

EC Implementation – The team adopted the striped-layout EC scheme (phase 1) for cold data, performing offline conversion via a modified distcp and validating integrity with the COMPOSITE_CRC algorithm (HDFS-13056). An automatic conversion platform was built to identify and convert cold data, achieving a 50% storage reduction (50 PB → 25 PB).
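The economics behind the conversion are straightforward. This sketch assumes the common RS-6-3 striped policy (the article does not name the exact policy used) and reproduces the reported 50% saving:

```python
# Physical bytes stored per logical byte: 3x replication vs RS(6,3) EC.

def replication_overhead(replicas: int = 3) -> float:
    """Each logical byte is stored `replicas` times."""
    return float(replicas)

def ec_overhead(data_units: int = 6, parity_units: int = 3) -> float:
    """Striped EC stores (data + parity) units per `data_units` logical units."""
    return (data_units + parity_units) / data_units

rep = replication_overhead()  # 3.0x physical-to-logical
ec = ec_overhead()            # 1.5x for RS-6-3
saving = 1 - ec / rep         # 0.5, i.e. the 50% reduction (50 PB -> 25 PB)
print(rep, ec, saving)
```

Integrity after conversion can be cross-checked with `hadoop fs -checksum` under `dfs.checksum.combine.mode=COMPOSITE_CRC` (HDFS-13056), whose checksums are comparable between replicated and EC copies of the same data.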

YARN Upgrade – A rolling upgrade from 2.6.0 to 3.2.1 was planned with three principles: user transparency, compatibility with all compute frameworks, and support for rollback. Compatibility work covered RM state data (ZK parameters), NM LevelDB state, client-side APIs, backend service interfaces, and MR task communication (TaskUmbilicalProtocol). Configuration tweaks (e.g., yarn.nodemanager.recovery.enabled=true) mitigated known issues.
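NodeManager restart-recovery, referenced above, also needs a persistent state directory and a pinned NM port to survive a rolling restart. A yarn-site.xml sketch (the path and port below are illustrative):

```xml
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- illustrative path; must survive NM restarts -->
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/hadoop/yarn-nm-recovery</value>
</property>
<property>
  <!-- recovery requires a fixed port, not an ephemeral one -->
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:45454</value>
</property>
```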

MR3 Upgrade – To leverage MR3's NativeTask and FileOutputCommitter improvements, the team added a custom HivePlatform class for Hive serialization support and used the distributed cache to deliver native libraries to NM nodes. Performance tests showed a >15% speedup for Hive SQL workloads.
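Both improvements are switched on through job configuration; a sketch of the relevant mapred-site.xml settings (the custom HivePlatform wiring and the native-library distribution are site-specific and omitted here):

```xml
<property>
  <!-- MAPREDUCE-2841: native (C++) map-output collector -->
  <name>mapreduce.job.map.output.collector.class</name>
  <value>org.apache.hadoop.mapred.nativetask.NativeMapOutputCollectorDelegator</value>
</property>
<property>
  <!-- v2 commit algorithm: task outputs move directly to the job output dir -->
  <name>mapreduce.fileoutputcommitter.algorithm.version</name>
  <value>2</value>
</property>
```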

Future Outlook – After successfully deploying HDFS RBF and EC, the next steps include completing YARN federation, exploring cross‑region deployments, and activating additional Hadoop 3.x features to further improve efficiency and reduce costs for the company’s data‑intensive services.

Big Data · Cluster Upgrade · YARN · Erasure Coding · HDFS · Hadoop · Router-Based Federation
Written by

58 Tech

Official tech channel of 58, a platform for tech innovation, sharing, and communication.
