Metadata Management and Governance Practices at Wing Payment: Architecture, Techniques, and Future Outlook
This article explains how metadata serves as the foundation of enterprise data governance, outlines common data governance challenges, describes Wing Payment's metadata governance framework and platform architecture, and presents future directions such as multi‑source management, cross‑cluster disaster recovery, and intelligent recommendation.
Metadata management is the cornerstone of enterprise data governance, enabling better data asset management, clarifying relationships, and supporting precise analysis and decision‑making.
The article identifies five key data governance problems: poor data quality and timeliness, difficulty identifying core data, data falling back into disorder after a governance effort, security risks, and duplicated data development work.
Wing Payment's metadata governance system addresses these issues through four pillars: (1) Core data protection: identifying critical tasks, prioritizing them, and allocating resources to them; (2) Master data governance: defining master data, improving its quality, integrating it, and making it available for consumption; (3) Data standards: establishing data quality, security, and development norms, backed by audit mechanisms; (4) Product architecture: a data consumption layer, a governance layer, and a platform layer that hosts the metadata platform.
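The first pillar, prioritizing critical tasks, can be illustrated with a small scheduling sketch. The job names and priority numbers below are hypothetical and are not taken from Wing Payment's system; the sketch only assumes that each offline job carries a manually assigned priority and that jobs with the same priority run in submission order.

```python
import heapq
from itertools import count


def schedule(jobs):
    """Order jobs by priority (lower number runs first), FIFO among ties.

    `jobs` is a list of (name, priority) pairs. Both the names and the
    priority scale are illustrative assumptions.
    """
    tie_breaker = count()  # preserves submission order for equal priorities
    heap = [(priority, next(tie_breaker), name) for name, priority in jobs]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


# Hypothetical offline jobs; priority 1 marks manually identified core tasks.
jobs = [
    ("ad_hoc_report", 3),
    ("settlement_daily", 1),
    ("user_profile", 2),
    ("risk_features", 1),
]
print(schedule(jobs))
```

A real scheduler would also reserve cluster resources for the core jobs; this sketch covers only the ordering step.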
The metadata platform draws inspiration from Google’s Goods system and has a three‑layer architecture. The storage layer uses HBase for basic metadata, Elasticsearch for search indexes, and a graph database for lineage. The service layer provides metadata queries, lineage visualization, and integration services for ETL, BI, and AI platforms. The ingestion layer supplies adapters for the various data sources. The metadata model covers basic, asset, security, and lineage metadata, plus additional derived attributes. For collection, plugins push metadata to message queues, and downstream services consume the messages, write them to HBase, and update the Elasticsearch indexes. Full‑link, field‑level lineage is captured through a combination of batch jobs and real‑time hooks.
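The collection flow described above (plugin, message queue, consumer service, storage, index, lineage) can be sketched in a few lines. This is a minimal in-memory illustration, not the platform's implementation: a `deque` stands in for the message queue, plain dictionaries stand in for HBase, Elasticsearch, and the graph database, and every class, field, and table name is a hypothetical chosen for the example.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class MetadataEvent:
    """Hypothetical message a collection plugin pushes to the queue."""
    table: str                                            # qualified table name
    basic: dict                                           # basic metadata
    upstream_fields: dict = field(default_factory=dict)   # column -> source fields


class MetadataPipeline:
    def __init__(self):
        self.queue = deque()              # stand-in for the message queue
        self.store = {}                   # stand-in for HBase: row key -> metadata
        self.index = defaultdict(set)     # stand-in for Elasticsearch: token -> tables
        self.lineage = defaultdict(set)   # stand-in for the graph DB: field -> sources

    def push(self, event):
        """Called by a collection plugin."""
        self.queue.append(event)

    def process_all(self):
        """Consumer service: persist metadata, refresh the index, record lineage."""
        while self.queue:
            event = self.queue.popleft()
            self.store[event.table] = event.basic
            for token in self._tokens(event):
                self.index[token].add(event.table)
            for column, sources in event.upstream_fields.items():
                self.lineage[f"{event.table}.{column}"].update(sources)

    @staticmethod
    def _tokens(event):
        # Naive tokenizer: split the table name on "." and "_", add column names.
        tokens = set(event.table.lower().replace(".", "_").split("_"))
        tokens.update(c.lower() for c in event.basic.get("columns", []))
        tokens.discard("")
        return tokens

    def search(self, keyword):
        return sorted(self.index.get(keyword.lower(), set()))

    def upstream(self, field_name):
        """Return every transitive upstream field of `field_name`."""
        seen, stack = set(), [field_name]
        while stack:
            for source in self.lineage.get(stack.pop(), ()):
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
        return sorted(seen)


pipeline = MetadataPipeline()
pipeline.push(MetadataEvent("ods.pay_order", {"columns": ["order_id", "amount"]}))
pipeline.push(MetadataEvent(
    "dwd.pay_order_detail", {"columns": ["order_id", "amount_yuan"]},
    {"amount_yuan": ["ods.pay_order.amount"]}))
pipeline.push(MetadataEvent(
    "ads.daily_revenue", {"columns": ["revenue"]},
    {"revenue": ["dwd.pay_order_detail.amount_yuan"]}))
pipeline.process_all()

print(pipeline.search("order"))
print(pipeline.upstream("ads.daily_revenue.revenue"))
```

Decoupling the plugins from the stores through a queue means a slow index update does not block collection, which is one plausible reason for the design the article describes.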
Future plans focus on managing heterogeneous multi‑source data (including unstructured assets), implementing multi‑cluster disaster recovery, and adding intelligent recommendation capabilities to improve metadata discoverability.
The Q&A section covers several points. It clarifies the distinction between core data and master data, explains that task priority mainly applies to offline jobs, and notes that core tasks are identified manually. It also outlines the components of data security governance, describes how metadata is collected with batch processes plus Logstash for real‑time updates, and discusses why HBase’s schema‑less storage model suits metadata storage.
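The schema‑less point can be made concrete with a toy model. In HBase, column families are declared when a table is created, while column qualifiers inside a family can differ from row to row without any schema change. The class below imitates that behavior with dictionaries; it is an illustration only, not the HBase client API, and the family and qualifier names are hypothetical.

```python
from collections import defaultdict


class SparseTable:
    """Toy in-memory model of a wide-column table (not the HBase client API)."""

    def __init__(self, families):
        self.families = set(families)   # fixed up front, like column families
        self.rows = defaultdict(dict)   # row key -> {"family:qualifier": value}

    def put(self, row_key, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers are free-form: a new attribute needs no schema migration.
        self.rows[row_key][f"{family}:{qualifier}"] = value

    def get(self, row_key):
        return dict(self.rows.get(row_key, {}))


table = SparseTable(families=["basic", "asset"])
table.put("ods.pay_order", "basic", "owner", "etl_team")
table.put("ods.pay_order", "asset", "size_bytes", 1024)
# A different row stores a different attribute set in the same table.
table.put("kafka.pay_events", "basic", "owner", "stream_team")
table.put("kafka.pay_events", "basic", "partitions", 12)

print(table.get("ods.pay_order"))
print(table.get("kafka.pay_events"))
```

Because rows only store the cells they actually have, metadata from sources with very different attribute sets can share one table.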
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
