DataMan: A Data Quality Governance Platform for Meituan's Big Data Ecosystem
Meituan’s DataMan platform provides a unified, closed‑loop data‑quality governance solution. It collects quality requirements, refines them into rules, runs monitoring across offline and real‑time jobs, tracks issues, and builds a knowledge base. The stated aims are better completeness, accuracy, consistency, and timeliness, along with optimized storage, shorter fault resolution time, and support for data‑driven decisions.
Background: Data has become a critical asset for internet companies; data quality directly impacts business decisions. Meituan's DataMan platform was built to manage data quality across its large‑scale big data environment.
Challenges: Monitoring of offline and real‑time jobs was scattered across tools, there were no unified quality metrics, fault handling did not form a closed loop, data models had no quality monitoring, storage consumption was growing rapidly, and resource‑level monitoring was insufficient.
Solution Overview: DataMan implements a closed‑loop PDCA (Plan‑Do‑Check‑Act) workflow: demand discovery, rule extraction, rule‑engine configuration, execution, issue detection, analysis, remediation, and knowledge‑base construction.
Key Processes: (1) Quality demand collection, (2) Rule refinement, (3) Rule repository building, (4) Execution, (5) Issue detection, (6) Reporting, (7) Remediation, (8) Knowledge‑base formation.
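The eight processes above can be sketched as a simple cycle. This is a minimal illustration, not DataMan's implementation: the `Stage` enum and `next_stage` helper are hypothetical names, and the wrap-around from knowledge‑base formation back to demand collection is an assumption about how the loop closes.

```python
from enum import Enum


class Stage(Enum):
    """The eight key processes, in the order the summary lists them."""
    DEMAND_COLLECTION = 1
    RULE_REFINEMENT = 2
    RULE_REPOSITORY = 3
    EXECUTION = 4
    ISSUE_DETECTION = 5
    REPORTING = 6
    REMEDIATION = 7
    KNOWLEDGE_BASE = 8


def next_stage(stage: Stage) -> Stage:
    """Return the stage that follows `stage`.

    Assumption: the last stage feeds back into the first, which is one
    way to read "closed loop".
    """
    members = list(Stage)
    return members[(members.index(stage) + 1) % len(members)]


if __name__ == "__main__":
    # Walk one full pass through the loop, starting from demand collection.
    stage = Stage.DEMAND_COLLECTION
    for _ in range(len(Stage)):
        print(stage.name)
        stage = next_stage(stage)
```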
Quality Metrics: Completeness, Accuracy, Reasonableness, Consistency, Timeliness, with specific standards tailored for big‑data characteristics.
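To make some of these dimensions concrete, here is a small sketch of how four of them might be computed over in-memory rows. The function names, the definitions chosen for each metric, and the sample fields are all illustrative assumptions; the summary does not describe DataMan's actual standards or thresholds.

```python
from datetime import datetime


def completeness(rows, required_fields):
    """Share of rows in which every required field is present and non-null.

    Assumption: an empty table counts as failing (0.0).
    """
    if not rows:
        return 0.0
    ok = sum(1 for r in rows if all(r.get(f) is not None for f in required_fields))
    return ok / len(rows)


def reasonableness(values, low, high):
    """Share of non-null values that fall inside the closed range [low, high]."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    return sum(1 for v in present if low <= v <= high) / len(present)


def consistency(source_count, target_count):
    """True when upstream and downstream row counts match exactly."""
    return source_count == target_count


def timeliness(finished_at, deadline):
    """True when the job finished at or before its deadline."""
    return finished_at <= deadline


if __name__ == "__main__":
    # Hypothetical sample rows; the second row is missing its amount.
    rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": None}]
    print(completeness(rows, ["id", "amount"]))
    print(reasonableness([r["amount"] for r in rows], 0, 100))
    print(consistency(source_count=2, target_count=2))
    print(timeliness(datetime(2024, 1, 1, 5, 30), datetime(2024, 1, 1, 6, 0)))
```

Accuracy is left out of the sketch because it usually needs a trusted reference dataset to compare against, which the summary does not describe.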
Technical Architecture: Four‑layer design – Data source & data mart layer, Storage model layer, System function layer, Presentation layer. The platform integrates Hive, Spark, Storm, Kafka, MySQL, etc., and uses Spring Boot, Hibernate, Zebra middleware, and a Bootstrap‑based front‑end.
Key Modules: Monitoring object management, metric management, process monitoring (offline & real‑time), issue tracking, recommendation engine, knowledge‑base, and system administration.
Workflow Management: A structured process runs from issue reporting through analysis and fault classification (S1‑S4) to knowledge‑base consolidation, supported by role‑based permissions.
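The issue workflow can be modelled as a small state machine with a permission check on each transition. The sketch below is hypothetical throughout: the status names, the role names (`reporter`, `analyst`, `admin`), and the mapping of roles to transitions are assumptions, and the summary does not say what each of the S1 to S4 levels means, so they are treated here as opaque labels.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Severity(Enum):
    """Fault classes named in the summary; their meaning is not specified."""
    S1 = "S1"
    S2 = "S2"
    S3 = "S3"
    S4 = "S4"


class Status(Enum):
    REPORTED = "reported"
    ANALYZED = "analyzed"
    CLASSIFIED = "classified"
    CONSOLIDATED = "consolidated"  # written up in the knowledge base


# The workflow is linear: each status may only advance to the next one.
ORDER = [Status.REPORTED, Status.ANALYZED, Status.CLASSIFIED, Status.CONSOLIDATED]

# Hypothetical role model: which target statuses each role may move an issue into.
PERMISSIONS = {
    "reporter": set(),
    "analyst": {Status.ANALYZED, Status.CLASSIFIED},
    "admin": {Status.CONSOLIDATED},
}


@dataclass
class Issue:
    title: str
    status: Status = Status.REPORTED
    severity: Optional[Severity] = None


def advance(issue: Issue, role: str, severity: Optional[Severity] = None) -> Issue:
    """Move `issue` to the next status if `role` is allowed to do so."""
    idx = ORDER.index(issue.status)
    if idx == len(ORDER) - 1:
        raise ValueError("issue is already consolidated")
    target = ORDER[idx + 1]
    if target not in PERMISSIONS.get(role, set()):
        raise PermissionError(f"role {role!r} may not move an issue to {target.value}")
    if target is Status.CLASSIFIED:
        # Classification is the step at which a fault class must be assigned.
        if severity is None:
            raise ValueError("a severity (S1-S4) is required to classify an issue")
        issue.severity = severity
    issue.status = target
    return issue


if __name__ == "__main__":
    issue = Issue("Daily order table arrived late")
    advance(issue, "analyst")
    advance(issue, "analyst", severity=Severity.S2)
    advance(issue, "admin")
    print(issue)
```

Keeping the permission table separate from the transition logic means roles can be changed without touching `advance`.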
Results: Improved data asset quality, optimized storage and job performance, reduced fault resolution time, and established a scalable knowledge repository.
Conclusion: A comprehensive data‑quality platform enhances governance, supports data‑driven decision making, and strengthens competitive advantage.
Meituan Technology Team
Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.
