Data Platform Integration and Multi‑Data‑Center Architecture at Meituan‑Dianping

After Meituan merged with Dianping, engineers unified two massive Hadoop ecosystems in Beijing and Shanghai. They split the project into four phases (unify, copy, switch, fuse), standardized Hadoop versions, and implemented zone‑aware transfers, cross‑realm Kerberos, and federated metadata to reach a single, reliable multi‑data‑center platform.

Meituan Technology Team

Aug 25, 2017

Background – In October 2015 Meituan merged with Dianping, creating the world’s largest lifestyle service platform. Two independent teams in Beijing and Shanghai operated separate technology stacks, data platforms, and clusters. The article describes the challenges and solutions for integrating these large‑scale data platforms.

Integration Goals – The aim was to achieve a single cluster, a unified data‑platform toolset, and a common development standard, while keeping the migration controllable.

Establishing Goals

The team set an overall objective of one cluster, one platform, and one set of standards. Because this goal was too large to tackle at once, it was broken down into manageable steps, starting with a unified client view.

Key Difficulties

Three factors made the project difficult: a complex architecture, infrastructure limits (notably 10 Gbps of inter‑data‑center bandwidth), and high reliability requirements for daily data production and reporting.
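To see why a 10 Gbps inter‑data‑center link is a hard constraint, it helps to turn it into a transfer‑time estimate. The sketch below is back‑of‑the‑envelope arithmetic only: the data volumes and the 50% usable‑bandwidth figure are assumptions chosen for illustration, not numbers from the article.

```python
def transfer_hours(data_tb: float, link_gbps: float = 10.0,
                   utilization: float = 0.5) -> float:
    """Hours needed to move `data_tb` decimal terabytes over one link.

    `utilization` is the assumed share of the link that the copy may use;
    the rest is left for production traffic.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits = data_tb * 1e12 * 8                   # terabytes -> bits
    usable_bps = link_gbps * 1e9 * utilization  # usable bits per second
    return bits / usable_bps / 3600


if __name__ == "__main__":
    # Hypothetical data volumes, for illustration only.
    for size_tb in (100, 1000):
        print(f"{size_tb} TB -> {transfer_hours(size_tb):.1f} h")
```

Even at full line rate, one petabyte needs roughly 222 hours on such a link, which is why cross‑zone traffic has to be planned rather than left to default replication and scheduling behavior.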

Architecture Complexity & Reliability

Both sides ran Hadoop clusters with multi‑data‑center deployments. The original clusters used separate zones, and the merged platform needed to limit cross‑zone traffic. To do this, a zone concept was added to the NameNode and to the YARN Fair Scheduler, and a custom ZoneTransfer tool was built to control block‑level transfers between zones.
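The idea behind zone‑aware placement can be sketched in a few lines. This is a toy model, not the team's NameNode patch: real HDFS placement lives in Java block placement policies, and the `DataNode` class, `choose_targets` function, and host names below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNode:
    host: str
    zone: str      # data center / zone label (hypothetical)
    free_gb: int


def choose_targets(nodes, path_zone, replicas=3):
    """Pick replica targets only from nodes in `path_zone`.

    Nodes with the most free space are preferred. Refusing to fall back
    to another zone keeps replication traffic off the inter-zone link.
    """
    candidates = sorted(
        (n for n in nodes if n.zone == path_zone),
        key=lambda n: n.free_gb,
        reverse=True,
    )
    if len(candidates) < replicas:
        raise RuntimeError(
            f"zone {path_zone!r} has only {len(candidates)} usable nodes"
        )
    return candidates[:replicas]


if __name__ == "__main__":
    cluster = [
        DataNode("bj-1", "beijing", 500),
        DataNode("sh-1", "shanghai", 300),
        DataNode("sh-2", "shanghai", 800),
        DataNode("sh-3", "shanghai", 100),
    ]
    print([n.host for n in choose_targets(cluster, "shanghai", replicas=2)])
```

In this model, moving data between zones never happens as a side effect of a write; it has to be an explicit, schedulable operation, which is the role the article assigns to the ZoneTransfer tool.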

Project Decomposition

The integration was split into four phases: Unify, Copy, Switch, and Fuse. Each phase is illustrated with diagrams in the original article.

Cross‑Cluster Data Access

Early work focused on enabling analysts to access data from both clusters. Three main tasks were performed:

Collecting raw layer data from the original Meituan side.

Cluster‑to‑cluster data copy using Hadoop DistCp, with coordinated scheduling between the two clusters.

Implementing cross‑realm Kerberos authentication to allow seamless service access across the two security domains.
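The DistCp copy in the second task can be sketched as a small command builder. The flags used here (`-update`, `-m`, `-bandwidth`) are standard DistCp options, but the NameNode URIs, map count, and per‑map bandwidth are placeholder assumptions, not the team's actual settings.

```python
import shlex


def distcp_command(src, dst, max_maps=20, bandwidth_mb=10):
    """Build a `hadoop distcp` argument list for an incremental copy.

    -update     copy only files that are missing or differ at the target
    -m          upper bound on the number of simultaneous map tasks
    -bandwidth  per-map bandwidth limit in MB/s
    """
    return [
        "hadoop", "distcp",
        "-update",
        "-m", str(max_maps),
        "-bandwidth", str(bandwidth_mb),
        src, dst,
    ]


def peak_gbps(max_maps, bandwidth_mb):
    """Approximate worst-case aggregate throughput of one job, in Gbps."""
    return max_maps * bandwidth_mb * 8 / 1000


if __name__ == "__main__":
    # Hypothetical source and destination paths.
    cmd = distcp_command(
        "hdfs://nn-shanghai:8020/warehouse/orders",
        "hdfs://nn-beijing:8020/warehouse/orders",
    )
    print(" ".join(shlex.quote(part) for part in cmd))
    print(f"peak ~{peak_gbps(20, 10):.1f} Gbps")
```

Capping both the map count and the per‑map bandwidth bounds how much of the shared link a single copy job can consume, which matters when several jobs are scheduled across the two clusters at once.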

Kerberos cross‑realm required matching krbtgt principals, synchronized passwords, and careful configuration of krb5.conf on both client and server sides.
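The "matching krbtgt principals" requirement follows a fixed naming pattern in Kerberos: for clients in realm A to reach services in realm B, a principal named `krbtgt/B@A` must exist in both KDCs with the same key. The sketch below only generates those names; the realm names are hypothetical, and creating the principals (with identical passwords, key version numbers, and encryption types on both sides) is left to the KDC administrator.

```python
def cross_realm_principals(realm_a, realm_b, two_way=True):
    """Return the krbtgt principals needed for cross-realm trust.

    Each returned principal must be created in BOTH KDCs with the same
    password (and matching kvno and encryption types), otherwise
    cross-realm ticket requests fail.
    """
    # Lets clients in realm_a obtain tickets for services in realm_b.
    principals = [f"krbtgt/{realm_b}@{realm_a}"]
    if two_way:
        # Lets clients in realm_b obtain tickets for services in realm_a.
        principals.append(f"krbtgt/{realm_a}@{realm_b}")
    return principals


if __name__ == "__main__":
    # Hypothetical realm names, for illustration only.
    for name in cross_realm_principals("MEITUAN.EXAMPLE", "DIANPING.EXAMPLE"):
        print(name)
```

On top of these principals, `krb5.conf` on clients and servers needs entries for both realms so that each side can locate the other realm's KDC and map hosts to the correct realm.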

Cluster Fusion

After data copy, the Dianping cluster was merged into the Meituan data center. The process involved:

Standardizing Hadoop versions (upgrading the Shanghai side to 2.7.1 with multi‑zone support).

Copying HDFS blocks across data centers while respecting zone constraints.

Switching NameNode services and restarting YARN.

Introducing a federation layer so the former Dianping NameNode appears as a federated namespace.
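The federation step in the list above can be pictured as a client‑side mount table that routes each path to the namespace that owns it, in the style of Hadoop's ViewFs. The article summary does not say which mechanism was used, so the sketch below is an assumption, and the path prefixes and namespace names are hypothetical.

```python
# Hypothetical mount table: path prefix -> owning namespace.
MOUNT_TABLE = {
    "/user": "hdfs://ns-main",
    "/warehouse": "hdfs://ns-main",
    "/dp": "hdfs://ns-dianping",   # former Dianping NameNode
}


def resolve(path, mount_table=MOUNT_TABLE):
    """Map a logical path to a namespace URI by longest-prefix match."""
    best = None
    for prefix in mount_table:
        # Match whole path components only, so "/dpx" does not match "/dp".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no mount point covers {path!r}")
    return mount_table[best] + path


if __name__ == "__main__":
    print(resolve("/dp/warehouse/shop/part-00000"))
    print(resolve("/user/alice/report.csv"))
```

With routing of this kind, clients keep a single view of the file system while the former Dianping namespace continues to be served by its own NameNode.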

Hive metadata was also merged by exporting MySQL‑backed metastore tables and establishing a continuous sync pipeline.

Development Tool Fusion

Internal toolchains were gradually unified, though the article does not detail specific tools.

Original Dianping Database Split & Solution

Approximately 7,000 to 8,000 Hive tasks would have needed their table references renamed, because the two sides used different database naming conventions. The solution was to add alias support to the Hive metastore, so that both the old and the new table name resolve to the same physical table. This avoided a massive batch rewrite of existing tasks.
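The alias idea can be illustrated with a small lookup layer. This is a minimal sketch of the concept, not the team's metastore change: the `TableAliases` class and the table names are hypothetical, and a real implementation would sit inside the metastore's table lookup path.

```python
class TableAliases:
    """Resolve old and new table names to one canonical table."""

    def __init__(self):
        self._alias_to_canonical = {}

    def add_alias(self, alias, canonical):
        # Hive identifiers are case-insensitive, so normalize to lowercase.
        alias, canonical = alias.lower(), canonical.lower()
        if alias == canonical:
            raise ValueError("alias and canonical name are identical")
        if canonical in self._alias_to_canonical:
            raise ValueError("canonical name is itself an alias")
        self._alias_to_canonical[alias] = canonical

    def resolve(self, name):
        """Return the canonical name; unknown names resolve to themselves."""
        name = name.lower()
        return self._alias_to_canonical.get(name, name)


if __name__ == "__main__":
    aliases = TableAliases()
    # Hypothetical old (Dianping-style) and new (unified) names.
    aliases.add_alias("bi.dim_shop", "dp_bi.dim_shop")
    print(aliases.resolve("BI.dim_shop"))     # old name
    print(aliases.resolve("dp_bi.dim_shop"))  # new name
```

Because existing queries keep working under the old name, tasks can be migrated to the new naming convention gradually instead of in one risky batch.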

Summary & Outlook

The team plans a “normal‑state multi‑data‑center” solution that introduces a Zone Server to manage block‑level caching and placement. The approach aims to make cross‑data‑center operations routine and low‑cost.

Reflection – Technology to Operations

Key operational concepts introduced include gray‑scale changes, “closing the door” to prevent new non‑standard tasks, visibility of migration progress, clear ownership, and compatible fallback strategies.

Experience – Complex System Refactoring & Fusion

Effective project management required clear goal decomposition, dependency mapping, staged roll‑outs (often over weekends), monitoring, and rapid rollback capabilities.

Overall, the article provides a comprehensive case study of large‑scale data platform integration, multi‑data‑center Hadoop architecture, and the operational practices needed to sustain such a merger.
