Origin Data Governance Platform: Architecture, Modules, and Implementation at Meituan
The article describes Meituan's Origin Data Governance Platform, detailing its background, challenges, architectural redesign, core modules such as data storage, metadata, business, security, and application management, as well as its internal workflow, achievements, and future roadmap for unified, secure, and high‑performance data services.
Background
Meituan, a highly digitalized and technology‑driven company, places great importance on extracting value from data. Over the years, its hotel‑travel division built a comprehensive solution consisting of a data warehouse and various data platforms (self‑service reporting, professional analytics, CRM, performance assessment, etc.) to eliminate data silos and support diverse analytical needs.
While the early architecture (Figure 1) efficiently met business demands, long‑term use revealed inconsistencies in metric definitions, calculation logic, and data sources, leading to low trust in indicator data and hampering decision‑making.
Challenges
The data‑governance project faced four main challenges: (1) determining where the governance platform should be inserted in the architecture with minimal intrusion, (2) designing a concise, efficient management process for unified metric and dimension information, (3) integrating various storage engines to provide a high‑concurrency, high‑availability data outlet, and (4) ensuring data security across business lines.
Solution Approach
The platform was positioned between the data‑warehouse (or data‑mart) layer and the data‑application layer, acting as a bridge that enforces rules, makes interactions queryable and monitorable, and transforms chaotic exchanges into orderly processes (Figure 2).
Platform Architecture
The platform consists of several functional modules—data storage, query, cache, metadata management, business management, security management, application management, and external APIs—organized to reduce development difficulty and improve maintainability (Figure 3).
Data Storage
The platform manages data in the Topic layer of the warehouse and the application layer, supporting Hive, MySQL, Kylin, Palo, Elasticsearch, and Druid (Figure 4). Storage decisions are made by data engineers based on space, query performance, and model organization, while the platform oversees metadata, monitoring, and alerts.
Metadata Management
Metadata is split into business metadata (metric and dimension definitions) and data metadata (table, model, and field bindings). Four sub‑modules—table management, model management, metric management, and dimension management—handle creation, maintenance, and governance of these assets.
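The split between business metadata and data metadata can be pictured with a small sketch. The class names, fields, and the models_serving helper below are illustrative assumptions, not the platform's actual schema; the point is only that a metric's definition lives apart from the model bindings that say how to compute it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Metric:
    """Business metadata: what the metric means (hypothetical fields)."""
    code: str
    name: str
    definition: str


@dataclass
class Model:
    """Data metadata: where and how the metric is computed (hypothetical fields)."""
    name: str
    tables: List[str]
    # metric code -> SQL expression that computes it in this model
    metric_bindings: Dict[str, str] = field(default_factory=dict)
    # dimension code -> physical column in this model
    dimension_bindings: Dict[str, str] = field(default_factory=dict)


def models_serving(metric: Metric, models: List[Model]) -> List[str]:
    """Return the names of all models that have a binding for the metric."""
    return [m.name for m in models if metric.code in m.metric_bindings]
```

Keeping the binding on the model side means one business definition can be served by several physical models without being duplicated.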
Table Management
Manages database connections, table schemas, types (fact or dimension), usage, ETL links, owners, recommendation scores, monitoring configurations, and sample data.
Model Management
Captures table relationships (join types), ER diagrams, field‑to‑dimension bindings, and metric‑to‑model bindings, supporting star/snowflake schemas and OLAP (MOLAP/ROLAP) models (Figures 5‑6).
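As a rough illustration of how stored table relationships could be turned into a join clause, the sketch below renders a fact table plus a list of joins. The Join structure and the from_clause function are assumptions made for this example; the article does not describe the platform's internal representation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Join:
    """One edge of the ER diagram (hypothetical structure)."""
    join_type: str   # e.g. "INNER" or "LEFT"
    table: str       # table being joined in
    left_key: str    # fully qualified key on the already-joined side
    right_key: str   # fully qualified key on the new table


def from_clause(fact_table: str, joins: List[Join]) -> str:
    """Render the FROM/JOIN part of a query for a star or snowflake model."""
    parts = [f"FROM {fact_table}"]
    for j in joins:
        parts.append(
            f"{j.join_type} JOIN {j.table} ON {j.left_key} = {j.right_key}"
        )
    return "\n".join(parts)
```

Because the keys are fully qualified, a join may hang off the fact table (star schema) or off another dimension table (snowflake schema) without any change to the function.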
Dimension Management
Separates business information (name, definition, classification) from technical details (whether a dimension table exists, date dimension flag, code/name mappings). Supports both enumerated dimensions and dimension tables.
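The two kinds of dimension might look like this in code. Both classes and all of their fields are hypothetical: an enumerated dimension carries its code/name mapping inline, while a table-backed dimension only records where the mapping is stored.

```python
from typing import Dict, Optional


class EnumDimension:
    """Dimension whose values are a small fixed set kept in metadata."""

    def __init__(self, code: str, name: str, values: Dict[str, str]):
        self.code = code
        self.name = name
        self.values = values  # value code -> display name

    def label(self, value_code: str) -> Optional[str]:
        """Return the display name for a value code, or None if unknown."""
        return self.values.get(value_code)


class TableDimension:
    """Dimension whose values live in a dimension table."""

    def __init__(self, code: str, name: str, table: str,
                 code_column: str, name_column: str):
        self.code = code
        self.name = name
        self.table = table
        self.code_column = code_column
        self.name_column = name_column

    def lookup_sql(self) -> str:
        """Build the query that loads the code/name mapping."""
        return f"SELECT {self.code_column}, {self.name_column} FROM {self.table}"
```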
Metric Management
Collects business attributes (name, classification, frequency, precision, unit, definition, calculation logic, analysis method) and technical attributes (data type, code, model bindings, virtual model creation, monitoring thresholds). Also tracks related metrics and applications for impact analysis (Figure 7).
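Impact analysis over related metrics and applications amounts to a graph traversal. The following is a minimal sketch, assuming the relationships are available as a plain mapping from each asset to the assets that depend on it; the "metric:" and "app:" key prefixes are a naming convention invented for the example.

```python
from collections import deque
from typing import Dict, List, Set


def impacted(start: str, dependents: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first search for everything downstream of `start`.

    `dependents` maps an asset to the assets that directly use it.
    The start node itself is not included in the result.
    """
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in dependents.get(node, []):
            if nxt != start and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

Before changing a metric's calculation logic, an engineer could run such a traversal to list every derived metric and application that would see different numbers.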
Business Management
Divided into business‑line management, theme management, and ticket (work‑order) management, ensuring proper permissions, resource isolation, and traceability of data‑processing requests.
Business‑Line & Theme Management
Controls visibility of metrics, dimensions, tables, and models per business line, with role‑based access (regular user vs. admin) and multi‑level review for new metrics.
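One way to picture per-business-line visibility is a filter over assets keyed by the user's memberships. The role names "user" and "admin" follow the text, but the data shapes and the rule that only admins may approve new metrics are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Asset:
    """A metric, dimension, table, or model (hypothetical structure)."""
    name: str
    business_line: str


def visible_assets(memberships: Dict[str, str], assets: List[Asset]) -> List[str]:
    """Return names of assets in business lines the user belongs to.

    `memberships` maps business line -> role ("user" or "admin").
    """
    return [a.name for a in assets if a.business_line in memberships]


def can_approve_metric(memberships: Dict[str, str], business_line: str) -> bool:
    """Assumed rule: only an admin of the business line may approve."""
    return memberships.get(business_line) == "admin"
```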
Ticket Management
Standard workflow for requesting, reviewing, developing, and approving metric‑dimension and model changes, with automatic logging for auditability (Figure 8).
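The ticket workflow can be sketched as a small state machine that records every transition. The state names mirror the four steps named above; the "rejected" state, the transition table, and the log format are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Allowed transitions; "rejected" is an assumed extra outcome of review.
TRANSITIONS: Dict[str, Set[str]] = {
    "requested": {"reviewing"},
    "reviewing": {"developing", "rejected"},
    "developing": {"approved"},
    "approved": set(),
    "rejected": set(),
}


@dataclass
class Ticket:
    title: str
    state: str = "requested"
    # Audit trail of (actor, from_state, to_state)
    log: List[Tuple[str, str, str]] = field(default_factory=list)

    def advance(self, new_state: str, actor: str) -> None:
        """Move to `new_state` if allowed, appending an audit entry."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot move from {self.state} to {new_state}")
        self.log.append((actor, self.state, new_state))
        self.state = new_state
```

Writing the log entry inside advance, rather than leaving it to callers, is what makes the audit trail automatic: no transition can happen without leaving a record.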
Security Management
Covers two areas: permission control for platform operations, integrated with the corporate “General Order” system, and permission management for API calls. Together they govern page access, business-line and data-line user rights, and application-level rights, backed by approval and audit modules.
Application Management
Consists of data applications, external applications, and a data map, recording relationships among metrics, dimensions, models, tables, and external services, and enabling query services, ETL production, and API exposure (Figure 9).
External APIs
Expose metadata (metric, dimension, table, model information), data (query services with aggregation, comparative analysis, cross‑engine support), and monitoring/statistics to downstream systems, ensuring consistency and reliability.
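The article does not say which comparisons the query service supports. Assuming that comparative analysis includes period-over-period change (for example, this week against last week), the calculation reduces to a function like the one below; the name and the None-on-zero convention are choices made for this sketch.

```python
from typing import Optional


def period_over_period(current: float, previous: float) -> Optional[float]:
    """Relative change from `previous` to `current`.

    Returns None when the previous value is zero, since the ratio is
    undefined; callers can render that case as "n/a".
    """
    if previous == 0:
        return None
    return (current - previous) / previous
```

Computing this once inside the query service, rather than in each downstream application, keeps edge cases such as a zero baseline handled the same way everywhere.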
Internal Working Principle
The platform maps business metric/dimension information to data‑model calculations, dynamically generates optimal SQL or query statements, and executes them via a distributed query engine built on Akka Cluster, Redis‑backed task queues, load‑balanced workers, and automatic degradation/monitoring (Figure 11).
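The mapping from a business request to a query can be sketched in two steps: pick a model that covers the requested metrics and dimensions, then render SQL from its bindings. Everything here is an assumption for illustration, including the QueryModel structure and the use of the smallest row count as the notion of "optimal"; the platform's real selection logic and its Akka- and Redis-based execution layer are not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class QueryModel:
    """A queryable model with its bindings (hypothetical structure)."""
    name: str
    table: str
    row_count: int
    metrics: Dict[str, str]      # metric code -> aggregate expression
    dimensions: Dict[str, str]   # dimension code -> column


def pick_model(models: List[QueryModel], metrics: List[str],
               dimensions: List[str]) -> Optional[QueryModel]:
    """Choose the smallest model that covers every requested field."""
    candidates = [
        m for m in models
        if all(x in m.metrics for x in metrics)
        and all(d in m.dimensions for d in dimensions)
    ]
    return min(candidates, key=lambda m: m.row_count) if candidates else None


def build_sql(model: QueryModel, metrics: List[str],
              dimensions: List[str]) -> str:
    """Render a grouped aggregate query from the model's bindings."""
    dim_cols = [model.dimensions[d] for d in dimensions]
    select = dim_cols + [f"{model.metrics[m]} AS {m}" for m in metrics]
    sql = f"SELECT {', '.join(select)} FROM {model.table}"
    if dim_cols:
        sql += f" GROUP BY {', '.join(dim_cols)}"
    return sql
```

Under this heuristic, a request for GMV by city would be routed to a pre-aggregated city-level table when one exists, and would fall back to the detail table only when a finer dimension is requested.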
Management Process
There are two roles: business owners and data engineers (RD). Business owners maintain the business information for metrics. Engineers create tables and models, bind metrics to those models, and then build data applications for end users (Figure 12).
Results
The platform has been deployed to support more than ten data platforms within the hotel-travel division. It delivers unified metric and dimension definitions, a single data outlet, unified monitoring and alerting, flexible query capabilities, data-lineage visualization, and provenance analysis.
Future Outlook
As part of the broader “Tian‑Gong” ecosystem (including a universal reporting system and a data‑query system), the platform aims to provide plug‑and‑play standards for metadata, query, and visualization components, enabling modular expansion and faster service development (Figure 13).
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
