Big Data 22 min read

Practical Experience of HDFS Federation at Meituan: Challenges, Improvements, and Automation

Meituan‑Dianping migrated its 2,000‑node HDFS cluster to Federation by fixing ViewFs compatibility, simplifying mount points, leveraging FastCopy for massive data moves, improving token handling, and automating split‑workflow steps, thereby overcoming single‑NameNode bottlenecks and providing a practical blueprint for large‑scale Hadoop deployments.

Meituan Technology Team

Apr 14, 2017

Practical Experience of HDFS Federation at Meituan: Challenges, Improvements, and Automation

In October 2015, after a series of optimizations, Meituan‑Dianping's HDFS cluster achieved significant stability and performance gains, but the single‑NameNode architecture began to show new bottlenecks: limited scalability, decreasing availability due to frequent CMS GC, growing RPC latency as the cluster approached 2,000 nodes, and lack of isolation when a single application overloaded the NameNode.

HDFS Federation, introduced in Hadoop‑0.23.0 as a horizontal scaling solution, creates multiple namespaces to improve scalability and isolation. Based on this background, Meituan launched an HDFS Federation migration project in October 2015.

The Federation approach is client‑centric and imposes many constraints on Hadoop clients and upstream applications. This article shares the concrete improvements made to the Federation design and the systematic migration process, aiming to provide a reference for other teams deploying Federation on large, production‑grade clusters.

Platform stack : Hadoop 2.4.1 with Kerberos authentication; Spark 1.1‑1.5 (Zeppelin as notebook); Hive 0.13 & 1.2, heavily using Presto and Kylin; DMLC support. The data‑platform team also built internal tools for warehouse management, raw‑data ingestion, non‑structured data development, ETL, scheduling, and unified query (Hive/Presto).

Limitations of vanilla Federation :

Requires the ViewFs scheme; all existing HDFS paths must be rewritten to use viewfs://, which is incompatible with many existing tools.

Changing fs.defaultFS breaks thousands of scripts because paths without an authority lose their nameservice.

Mount‑point semantics differ from Linux, and a single request cannot span multiple mount points, causing rename failures.

NameNode metadata and block files are not shared; splitting data requires full DistCp copies, incurring high storage cost.

The solution is heavily client‑dependent; any client that does not understand the new scheme will fail.

Key improvements :

Added compatibility for both ViewFs and HDFS schemes by modifying org.apache.hadoop.fs.FileSystem.fixRelativePart(Path) and org.apache.hadoop.fs.viewfs.ViewFileSystem.getUriPath(Path) to resolve ViewFs paths to real HDFS locations.

Introduced a new configuration fs.defaultNS so that paths like hdfs:///user can be routed to the correct namespace even when fs.defaultFS points to a ViewFs scheme.

Mount configuration refinements :

Reduced mount‑point complexity by isolating migration paths under dedicated directories (e.g., /user/hivedata/xx.db → /ns2/hivedata/xx.db).

Limited initial changes to Hive library paths to avoid recompiling MR/Spark jobs.

Analyzed NameNode audit logs to identify rename‑free Hive libraries and migrated only those.

To relax the restriction that renames cannot cross mount points, the following code was enabled:

// Note we compare the URIs. the URIs include the link targets.
// hence we allow renames across mount links as long as the mount links
// point to the same target.
if (!resSrc.targetFileSystem.getUri().equals(
          resDst.targetFileSystem.getUri())) {
  throw new IOException("Renames across Mount points not supported");
}
*/

//
// Alternate 3 : renames ONLY within the the same mount links.
//
if (resSrc.targetFileSystem !=resDst.targetFileSystem) {
  throw new IOException("Renames across Mount points not supported");
}

Storage cost and copy efficiency : With a 2,000‑node cluster and ~600 million metadata entries, a full copy would be prohibitive. Meituan evaluated Facebook’s FastCopy (an HDFS‑aware copy that leverages block locality and hard‑link creation). After fixing community issues (HDFS‑2139, DFS‑Used accounting) and adding a standalone FastCopy MR job, the team achieved a 14 + PB copy in 15 hours.

Security and token handling : ViewFs originally fetched tokens from every namespace serially, causing job submission failures when a NameNode was overloaded. Improvements include:

Container logs are written by NodeManager directly, avoiding the need for a token from the log namespace.

Added a parameter to ViewFs to skip token acquisition for unnecessary namespaces.

Made token acquisition failures non‑fatal; jobs only fail when actually accessing the unavailable namespace.

Automation of the split workflow :

Detect NameNode bottlenecks (metadata size, RPC QPS) and analyze user/group and Hive directories.

Identify rename relationships to avoid splitting inter‑dependent data.

Provision a new namespace if needed.

Disable the balancer for the source namespace.

Generate a balanced path list from the fsimage.

Perform an initial FastCopy round with idempotent checks.

Run incremental copy rounds (delete‑update) to sync changes.

Final cut‑over: lock source permissions, run a last copy, distribute new mount configs to clients and NodeManagers, update Hive meta, reopen permissions.

Monitor for a week, then delete the source namespace and restart the balancer.

Steps 1, 2, 5‑7 and parts of step 8 can be automated, which is a focus for future work.

Conclusion : HDFS Federation is a powerful but client‑centric solution for scaling NameNode services. By addressing scheme compatibility, mount‑point restrictions, storage‑copy efficiency, and token handling, Meituan‑Dianping successfully migrated a production‑grade cluster with minimal disruption, providing valuable lessons for other large‑scale Hadoop deployments.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Big Data HDFS Hadoop Meituan FastCopy Federation

Written by

Meituan Technology Team

Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.