
Data Model Governance Practices at Taobao (Tao Tian Group)

This article gives an overview of data model governance at Taobao. It covers the background challenges, a four‑pillar solution framework, and specific practices: invalid table decommissioning, source‑table consolidation, data handover, public‑layer operations, incremental control, and productization. It closes with future plans for improving the efficiency, cost, and quality of large‑scale data models.

DataFunSummit

Background and Issues

Taobao (now Tao Tian Group) faced rapid data growth, rising costs, stability problems, and declining data consumption efficiency, which prompted the need for model governance. Key challenges included an exploding data scale, many invalid tables, a high ratio of unassigned tables, and non‑standard practices across roles.

Four "1" Pillars

The governance approach rests on four capabilities, one of each kind: a unified specification system, an evaluation system, incremental control over newly created models, and governance of the existing stock of models.

Solution Overview

The solution addresses root causes through better standards, evaluation metrics, automated controls, and product tools, aiming to improve data supply, consumption efficiency, and overall model quality.

Key Practices

Invalid table decommissioning and ODS source‑import control with high‑accuracy identification and automated safe removal.

Data handover automation that triggers evaluation and governance steps during role changes.

Public‑layer specialized operations to increase reuse rates and coverage, supported by data albums and targeted governance.

Incremental control by distributing governance rules across pipelines via the Data Governance Center.

Productization: intelligent modeling for design and code generation, data maps for discovery, and a governance center for automated actions.
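
The first practice above depends on identifying invalid tables accurately before anything is removed. The source does not state the identification rules, so the following is only a minimal sketch under assumed criteria: a table becomes a decommissioning candidate when nothing downstream consumes it and it has not been read within a configurable window. The `TableMeta` fields, the 90‑day default, and the function name are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class TableMeta:
    """Hypothetical metadata record; field names are illustrative only."""
    name: str
    last_read: Optional[date]  # None means no read has been recorded
    downstream_count: int      # number of jobs or tables that consume it


def find_decommission_candidates(
    tables: list[TableMeta],
    today: date,
    idle_days: int = 90,  # assumed threshold, not taken from the source
) -> list[str]:
    """Return names of tables with no consumers and no recent reads."""
    cutoff = today - timedelta(days=idle_days)
    candidates = []
    for t in tables:
        unread = t.last_read is None or t.last_read < cutoff
        if t.downstream_count == 0 and unread:
            candidates.append(t.name)
    return candidates
```

In a sketch like this, the function only proposes candidates. A safe removal flow would likely add a confirmation or grace period before any table is actually dropped, but the source gives no detail on that step.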

Capability Consolidation

The four "1" capabilities have been codified into a specification system, an enhanced evaluation system (including metrics, diagnostics, and governance outcomes), and supporting tools.
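
An evaluation system of this kind needs concrete metrics, such as the public‑layer reuse rate and coverage named under Key Practices. The source does not define either metric, so the definitions below are assumptions made for illustration: reuse rate is taken as the share of public‑layer tables read by at least two distinct jobs, and coverage as the share of downstream jobs that read at least one public‑layer table. All function and parameter names are hypothetical.

```python
def reuse_rate(
    consumers_by_table: dict[str, set[str]],
    min_consumers: int = 2,  # assumed cutoff for counting a table as "reused"
) -> float:
    """Share of public-layer tables read by at least `min_consumers` distinct jobs."""
    if not consumers_by_table:
        return 0.0
    reused = sum(
        1 for jobs in consumers_by_table.values() if len(jobs) >= min_consumers
    )
    return reused / len(consumers_by_table)


def coverage(job_inputs: dict[str, set[str]], public_tables: set[str]) -> float:
    """Share of downstream jobs that read at least one public-layer table."""
    if not job_inputs:
        return 0.0
    covered = sum(1 for inputs in job_inputs.values() if inputs & public_tables)
    return covered / len(job_inputs)
```

Both functions take plain lineage mappings as input, so they could be fed from whatever metadata store records table consumers. Tracking the two numbers over time would give one way to check whether targeted public‑layer governance is having an effect.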

Future Planning

Plans include further supply‑consumption efficiency improvements, stricter architectural controls, tighter integration of evaluation and governance, and leveraging emerging technologies such as large language models for metadata enrichment, automated code generation, and one‑click migration.

Q&A Highlights

The answers cover consumption relationships, decommissioning tables gracefully, identifying source imports, balancing governance with agile data usage, visual modeling beyond ER diagrams, and the technical challenges of one‑click migration.

Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
