Tencent Data Governance Practices and the WeData Platform
This article outlines Tencent's data governance challenges, internal practices across three maturity stages, and introduces the WeData platform that provides comprehensive capabilities for data assetization, cost control, quality assurance, security, and metadata management to support large‑scale big‑data operations.
Introduction – The talk presents the current stage of Tencent's data governance, practical experience, and the WeData data‑governance platform.
Data Governance Challenges
Management challenges: data scattered across many business units and reporting locations, making unified management difficult.
Technical challenges: ensuring data quality after collection.
Business challenges: missing metadata prevents unified auditing and measurement.
Data Governance "Maslow" Hierarchy – Issues are classified by timeliness, quality, usability, security, and cost.
Internal Practices – Business Landscape – Tencent operates several business groups (BGs) with thousands of product lines, EB‑scale data storage, and thousands of analysts.
Three Stages of Data Governance
Stage 1: Data assetization – turning data into valuable assets.
Stage 2: Cost reduction and efficiency – lowering resource consumption while maintaining asset value.
Stage 3: Platformization – abstracting common practices into a data‑governance platform.
Practice 1 – News Data Assetization – Faced with inconsistent data standards and low data reuse, Tencent News unified data models, upgraded warehouses, and built 250 models, 52 dimension tables, and 270 application tables, achieving >95% completeness, 73% reuse, and <5% cross‑layer references.
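The reuse and cross-layer figures can be made concrete over a lineage graph. The sketch below is a minimal illustration under stated assumptions, not Tencent's definition: it assumes a layered warehouse with hypothetical layers `ods` → `dwd` → `dws` → `ads`, table names prefixed by their layer, "reuse rate" meaning the share of shared-layer models read by at least two downstream tables, and a "cross-layer reference" being an edge that skips an intermediate layer.

```python
from collections import defaultdict

# Hypothetical layer order for a layered warehouse; lower index = closer to raw data.
LAYER_ORDER = {"ods": 0, "dwd": 1, "dws": 2, "ads": 3}
COMMON_LAYERS = ("dwd", "dws")  # assumed "shared model" layers


def layer_of(table: str) -> str:
    # Assumes every table name starts with its layer prefix, e.g. "dwd_click".
    return table.split("_", 1)[0]


def reuse_rate(edges: list[tuple[str, str]]) -> float:
    """Share of shared-layer models read by at least two downstream tables."""
    models = {t for edge in edges for t in edge if layer_of(t) in COMMON_LAYERS}
    consumers: dict[str, set[str]] = defaultdict(set)
    for src, dst in edges:
        consumers[src].add(dst)
    if not models:
        return 0.0
    return sum(1 for m in models if len(consumers[m]) >= 2) / len(models)


def cross_layer_ratio(edges: list[tuple[str, str]]) -> float:
    """Share of lineage edges that skip at least one layer (e.g. ads reading ods)."""
    if not edges:
        return 0.0
    skips = sum(
        1 for src, dst in edges
        if LAYER_ORDER[layer_of(dst)] - LAYER_ORDER[layer_of(src)] > 1
    )
    return skips / len(edges)


# Toy lineage graph: (upstream, downstream) pairs.
EDGES = [
    ("ods_click", "dwd_click"),
    ("dwd_click", "dws_user_daily"),
    ("dwd_click", "dws_item_daily"),
    ("dws_user_daily", "ads_dashboard"),
    ("dws_item_daily", "ads_dashboard"),
    ("ods_click", "ads_adhoc_report"),  # skips dwd and dws
]

print(f"reuse rate: {reuse_rate(EDGES):.2f}")               # 1 of 3 shared models reused
print(f"cross-layer refs: {cross_layer_ratio(EDGES):.2f}")  # 1 of 6 edges skips layers
```

Tracking both numbers on the same graph shows why they move together: when downstream tables read shared `dws` models instead of raw `ods` tables, reuse rises and cross-layer references fall.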
Practice 2 – PCG Data Cost Governance – The Platform and Content Group (PCG) defined the cost scope (collection, generation, analysis, and application platforms) and reduced both resource usage and unit cost, cutting absolute big‑data cost by at least 10% even though costs had been growing by more than 30% month over month.
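Because cost is resource usage multiplied by unit cost, cutting both factors compounds. The sketch below illustrates that arithmetic across the four cost scopes; all usage figures, prices, and cut percentages are made up for illustration, since the talk does not give PCG's actual breakdown.

```python
from dataclasses import dataclass


@dataclass
class CostItem:
    platform: str     # cost scope, e.g. "collection" or "analysis"
    usage: float      # resource units consumed (hypothetical unit, e.g. core-hours)
    unit_cost: float  # price per resource unit (hypothetical)

    def total(self) -> float:
        return self.usage * self.unit_cost


def total_cost(items: list[CostItem]) -> float:
    return sum(item.total() for item in items)


def apply_cuts(items: list[CostItem], usage_cut: float, unit_cut: float) -> list[CostItem]:
    """Reduce usage and unit cost by the given fractions (0.05 means 5%)."""
    return [
        CostItem(i.platform, i.usage * (1 - usage_cut), i.unit_cost * (1 - unit_cut))
        for i in items
    ]


# Illustrative baseline across the four cost scopes; numbers are made up.
BASELINE = [
    CostItem("collection", 100, 1.0),
    CostItem("generation", 400, 1.0),
    CostItem("analysis", 300, 1.0),
    CostItem("application", 200, 1.0),
]
AFTER = apply_cuts(BASELINE, usage_cut=0.05, unit_cut=0.06)

saving = 1 - total_cost(AFTER) / total_cost(BASELINE)
print(f"total saving: {saving:.1%}")  # 1 - 0.95 * 0.94 = 10.7%
```

Two modest levers (5% less usage, 6% cheaper units) together yield a saving larger than either alone, which is why the practice attacks both.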
Practice 3 – Platformization of Governance – Four phases: overview, asset details, governance solutions, and execution; a scoring system evaluates conformity, security, quality, cost, and usage.
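A scoring system over the five named dimensions can be sketched as a weighted average plus grade bands. The weights, the 0-100 scale, and the letter bands below are hypothetical assumptions; the talk names the dimensions but not how Tencent combines them.

```python
# Hypothetical weights; the talk names the five dimensions but not how they are weighted.
WEIGHTS = {"conformity": 0.20, "security": 0.25, "quality": 0.25, "cost": 0.15, "usage": 0.15}


def asset_score(scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of per-dimension scores, each on a 0-100 scale."""
    missing = sorted(set(weights) - set(scores))
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    total_weight = sum(weights.values())
    return sum(scores[d] * w for d, w in weights.items()) / total_weight


def grade(score: float) -> str:
    # Hypothetical grade bands used to rank assets for governance work.
    for threshold, letter in ((85, "A"), (70, "B"), (50, "C")):
        if score >= threshold:
            return letter
    return "D"


example = {"conformity": 90, "security": 80, "quality": 70, "cost": 60, "usage": 100}
s = asset_score(example)
print(f"{s:.1f} -> {grade(s)}")  # 79.5 -> B
```

Raising an error on a missing dimension, rather than defaulting it to zero, keeps an incomplete assessment from silently dragging an asset's grade down.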
WeData Platform Architecture – The platform consists of two parts: agile data production (modeling, integration, development, services) and efficient data governance (asset, quality, security, metadata).
WeData Services – Data flow/storage layer and processing layer (aggregation, development, operations, API services).
Governance Tools
Standardization tools: metric, dimension, and metadata management; warehouse planning and physical materialization.
Quality tools: rule definition, real‑time ETL monitoring, offline checks, alerting, and periodic quality reports.
Security tools: sensitive data identification, privacy protection (static/dynamic masking, encryption, watermarking), and audit of data access/export.
Metadata asset tools: discovery, technical and business metadata linkage, data catalog, lineage, change tracking, and data temperature.
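The quality and security tools above can be sketched together: declarative rules evaluated over a batch (an offline check whose failures would raise alerts), plus a static masking function for a sensitive field. Rule names, the `warn`/`block` severity levels, and the 11-digit phone format are hypothetical assumptions, not WeData's actual rule model.

```python
import re
from dataclasses import dataclass
from typing import Callable

Row = dict  # one record: column name -> value


@dataclass
class Rule:
    name: str
    check: Callable[[list[Row]], bool]  # returns True when the batch passes
    severity: str = "warn"              # hypothetical levels: "warn" or "block"


def null_rate_at_most(column: str, max_rate: float) -> Callable[[list[Row]], bool]:
    def check(rows: list[Row]) -> bool:
        if not rows:
            return False  # treat an empty batch as a failure
        nulls = sum(1 for r in rows if r.get(column) is None)
        return nulls / len(rows) <= max_rate
    return check


def row_count_at_least(n: int) -> Callable[[list[Row]], bool]:
    return lambda rows: len(rows) >= n


def failed_rules(rows: list[Row], rules: list[Rule]) -> list[Rule]:
    """Offline check: return every rule the batch violates (candidates for an alert)."""
    return [rule for rule in rules if not rule.check(rows)]


# Static masking for an assumed 11-digit phone format: keep first 3 and last 4 digits.
PHONE = re.compile(r"^(\d{3})\d{4}(\d{4})$")


def mask_phone(value: str) -> str:
    return PHONE.sub(r"\1****\2", value)  # non-matching values pass through unchanged


ROWS = [
    {"user_id": 1, "phone": "13812345678"},
    {"user_id": None, "phone": "13900001111"},
    {"user_id": 3, "phone": "13700002222"},
]
RULES = [
    Rule("user_id_null_rate", null_rate_at_most("user_id", 0.1), severity="block"),
    Rule("min_rows", row_count_at_least(2)),
]

for rule in failed_rules(ROWS, RULES):
    print(f"[{rule.severity}] {rule.name} failed")
print([mask_phone(r["phone"]) for r in ROWS])
```

Keeping rules as data rather than hard-coded checks is what lets a platform schedule them, report on them periodically, and block downstream jobs only for `block`-level failures.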
Metadata Management – Improves search, understanding, and application of data by providing global discovery, usage metrics, and clear business context.
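A catalog that links business metadata, usage metrics, and lineage supports both discovery and impact analysis. The sketch below assumes a hypothetical in-memory catalog: search matches names or descriptions and ranks by 30-day query count, and `impact` walks lineage edges breadth-first to list every downstream table. Table names and owners are illustrative.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TableMeta:
    name: str
    description: str                 # business metadata
    owner: str
    queries_30d: int = 0             # usage metric
    downstream: list[str] = field(default_factory=list)  # lineage edges


class Catalog:
    def __init__(self) -> None:
        self.tables: dict[str, TableMeta] = {}

    def register(self, meta: TableMeta) -> None:
        self.tables[meta.name] = meta

    def search(self, keyword: str) -> list[str]:
        """Match names or descriptions; most-queried tables first."""
        kw = keyword.lower()
        hits = [
            t for t in self.tables.values()
            if kw in t.name.lower() or kw in t.description.lower()
        ]
        hits.sort(key=lambda t: t.queries_30d, reverse=True)
        return [t.name for t in hits]

    def impact(self, name: str) -> list[str]:
        """Every table transitively downstream of `name` (BFS over lineage)."""
        seen: set[str] = set()
        queue = deque([name])
        while queue:
            meta = self.tables.get(queue.popleft())
            for child in (meta.downstream if meta else []):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return sorted(seen)


catalog = Catalog()
catalog.register(TableMeta("dwd_news_click", "news article click events", "alice", 120, ["dws_news_user_daily"]))
catalog.register(TableMeta("dws_news_user_daily", "daily news clicks per user", "bob", 300, ["ads_news_dashboard"]))
catalog.register(TableMeta("ads_news_dashboard", "news KPI dashboard", "carol", 50))

print(catalog.search("click"))           # ['dws_news_user_daily', 'dwd_news_click']
print(catalog.impact("dwd_news_click"))  # ['ads_news_dashboard', 'dws_news_user_daily']
```

The `seen` set makes the traversal safe even if lineage contains a cycle.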
Governance Outcomes – Achieved cost reduction, increased data reuse, and established an enterprise‑wide rating system to prune cold or duplicate data.
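Pruning cold or duplicate data needs two signals: how recently a table was used (its temperature) and whether another table already holds the same data. The sketch below uses hypothetical heuristics: temperature from days since last access, and duplicates detected by hashing the sorted column list plus row count. A production system would compare actual content rather than schema and size alone.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class TableStats:
    name: str
    columns: tuple[str, ...]
    row_count: int
    last_access: date


def temperature(t: TableStats, today: date, hot_days: int = 7, cold_days: int = 90) -> str:
    # Hypothetical thresholds for hot/warm/cold based on idle days.
    idle = (today - t.last_access).days
    if idle <= hot_days:
        return "hot"
    if idle >= cold_days:
        return "cold"
    return "warm"


def fingerprint(t: TableStats) -> str:
    # Cheap duplicate signal: same columns (any order) and same row count.
    key = ",".join(sorted(t.columns)) + f"|{t.row_count}"
    return hashlib.sha256(key.encode()).hexdigest()


def pruning_candidates(tables: list[TableStats], today: date) -> list[str]:
    """Cold tables plus all but the most recently used member of each duplicate group."""
    candidates = {t.name for t in tables if temperature(t, today) == "cold"}
    groups: dict[str, list[TableStats]] = {}
    for t in tables:
        groups.setdefault(fingerprint(t), []).append(t)
    for group in groups.values():
        if len(group) > 1:
            group.sort(key=lambda t: t.last_access, reverse=True)
            candidates.update(t.name for t in group[1:])
    return sorted(candidates)


TODAY = date(2024, 6, 30)
TABLES = [
    TableStats("dws_user_daily", ("uid", "dt", "clicks"), 1000, date(2024, 6, 29)),
    TableStats("tmp_user_daily_copy", ("dt", "uid", "clicks"), 1000, date(2024, 6, 1)),
    TableStats("ods_legacy_log", ("raw",), 5, date(2024, 1, 1)),
]
print(pruning_candidates(TABLES, TODAY))  # ['ods_legacy_log', 'tmp_user_daily_copy']
```

Keeping the most recently accessed copy of each duplicate group protects the table that consumers actually depend on.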
Q&A
Metadata security tags are independent and not propagated downstream.
Business metadata includes normative, quality, security, cost, and usage dimensions.
TBDS is the underlying data‑processing engine; WeData is the governance layer built on top of it.
Thank you for attending; follow DataFunTalk for more content.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
