
Taobao Data Model Governance: Challenges, Analysis, and Solutions

This article presents a comprehensive overview of Taobao's data model governance, detailing the background and problems of the current data architecture, analyzing root causes, proposing a structured governance framework with DataWorks automation, and outlining future plans to improve efficiency, standardization, and product tooling.

DataFunSummit

Overview

The session, hosted by DataFunTalk, features Guo Jinshi from Alibaba discussing the past year of data model governance within the Taobao ecosystem, summarizing key findings and future directions.

1. Model Background & Issues

Taobao's data middle platform has operated for about seven years without systematic governance. By origin, 22% of the data is created manually and 78% is machine‑generated; only 9% of the data is active, and 21% is non‑standard. Over the lifecycle, models last about 25 months, volume grows about 30% per year, and retention is 44%. Model reuse is low and cross‑market dependencies cause problems, as the following observations show:

Low reuse of public‑layer tables

Uneven distribution of public tables across teams

Excessive temporary tables, inconsistent naming, and duplicated ADS tables

Cross‑market dependencies affecting stability

2. Problem Analysis

Seven major issues were identified: excessive temporary tables, inconsistent naming, an over‑designed public layer, duplicated ADS construction, cross‑market dependencies, common logic that has not been sunk into the public layer, and direct coupling between ADS and ODS. The root causes fall into four categories: architecture standards, process mechanisms, product tools, and development capability.
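Two of these issues, ADS‑ODS coupling and cross‑market dependencies, can in principle be detected from table lineage. The following is a minimal sketch, not the method used in the talk. It assumes a hypothetical naming convention in which the first underscore‑separated token of a table name is the layer and the second is the data market, and it treats a cross‑market dependency as an ADS table in one market reading an ADS table in another. The table names and the `find_issues` helper are illustrative only.

```python
from dataclasses import dataclass

# Assumption: names look like "<layer>_<market>_<rest>", e.g. "ads_trade_order_summary".
KNOWN_LAYERS = {"ods", "dwd", "dws", "dim", "ads"}


@dataclass(frozen=True)
class Table:
    name: str

    @property
    def layer(self) -> str:
        # First token of the name, or "unknown" if it is not a known layer prefix.
        prefix = self.name.split("_", 1)[0].lower()
        return prefix if prefix in KNOWN_LAYERS else "unknown"

    @property
    def market(self) -> str:
        # Second token of the name, or "unknown" if the name has no second token.
        parts = self.name.split("_")
        return parts[1].lower() if len(parts) > 1 else "unknown"


def find_issues(edges):
    """Scan lineage edges given as (upstream_name, downstream_name) pairs.

    Returns a list of (issue_tag, upstream_name, downstream_name) tuples.
    """
    issues = []
    for up_name, down_name in edges:
        up, down = Table(up_name), Table(down_name)
        # An ADS table that reads ODS directly bypasses the public layer.
        if down.layer == "ads" and up.layer == "ods":
            issues.append(("ads_ods_coupling", up_name, down_name))
        # An ADS table that reads another market's ADS table.
        if down.layer == "ads" and up.layer == "ads" and up.market != down.market:
            issues.append(("cross_market_dependency", up_name, down_name))
    return issues


if __name__ == "__main__":
    lineage = [
        ("ods_trade_order", "ads_trade_order_summary"),
        ("dws_trade_order_1d", "ads_trade_order_summary"),
        ("ads_trade_order_summary", "ads_search_conversion"),
    ]
    for issue in find_issues(lineage):
        print(issue)
```

In a real warehouse the edges would come from a lineage or metadata service rather than a hard‑coded list, and the layer and market would more reliably come from table metadata than from name parsing.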

3. Governance Solutions (DataWorks Intelligent Data Modeling)

The proposed solution includes a four‑step approach: inventory of existing assets, standardization of incremental development, ongoing health checks, and data‑driven governance. Specific mechanisms involve layered architecture standards, market segmentation principles, and a co‑construction model for the public layer.

Define clear architecture layers (ODS, CDM, ADS)

Segment data markets by business scenario (MECE principle)

Open public‑layer co‑construction with post‑audit governance

Automate model migration and code generation via DataWorks

Integrate data maps for easier data discovery
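A layered architecture standard is easiest to enforce when table names can be checked mechanically. The sketch below shows one way to do that, under assumptions that are not from the talk: table names are lowercase and start with a layer prefix, and the CDM layer is split into `dwd`, `dws`, and `dim` prefixes (a common warehouse convention). The talk itself names only ODS, CDM, and ADS; the pattern and helper names are hypothetical.

```python
import re

# Assumed mapping from name prefix to architecture layer.
LAYER_BY_PREFIX = {
    "ods": "ODS",
    "dwd": "CDM",
    "dws": "CDM",
    "dim": "CDM",
    "ads": "ADS",
}

# A layer prefix followed by one or more lowercase alphanumeric tokens.
NAME_PATTERN = re.compile(r"^(ods|dwd|dws|dim|ads)(_[a-z0-9]+)+$")


def classify(table_name):
    """Return "ODS", "CDM", or "ADS", or None if the name is non-standard."""
    match = NAME_PATTERN.match(table_name)
    if match is None:
        return None
    return LAYER_BY_PREFIX[match.group(1)]


def non_standard_share(table_names):
    """Fraction of table names that do not match the assumed convention."""
    names = list(table_names)
    if not names:
        return 0.0
    bad = sum(1 for name in names if classify(name) is None)
    return bad / len(names)


if __name__ == "__main__":
    sample = ["ods_trade_order", "dwd_trade_order_di", "tmp_test_0425", "ADS_Report"]
    for name in sample:
        print(name, "->", classify(name))
    print("non-standard share:", non_standard_share(sample))
```

A check like this could run when a table is created (to standardize incremental development) and periodically over the whole catalog (as part of a health check).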

4. Model Governance Process

Models are scored quantitatively at the team, domain, and core levels, and each detected issue is attached to the model as a tag. The workflow combines data‑driven evaluation (model scores) with product‑driven actions (expert judgments) to prioritize remediation.
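To make the scoring idea concrete, here is a minimal sketch of a penalty‑based model score that also emits issue tags and rolls up to a team average. The metrics, penalty weights, 90‑day inactivity threshold, and tag names are all hypothetical; the talk does not give the actual scoring formula.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    name: str
    downstream_count: int        # number of tables that read this model
    follows_naming: bool         # passes the naming standard
    reads_ods_directly: bool     # an ADS model that depends on ODS
    days_since_last_access: int


def score_model(stats):
    """Return (score, tags); the score starts at 100 and loses points per issue."""
    score = 100
    tags = []
    if not stats.follows_naming:
        score -= 20
        tags.append("non_standard_name")
    if stats.reads_ods_directly:
        score -= 30
        tags.append("ads_ods_coupling")
    if stats.downstream_count == 0:
        score -= 20
        tags.append("no_reuse")
    if stats.days_since_last_access > 90:
        score -= 30
        tags.append("inactive")
    return score, tags


def team_score(models):
    """Average model score for a team, or None if the team owns no models."""
    scores = [score_model(m)[0] for m in models]
    return sum(scores) / len(scores) if scores else None


if __name__ == "__main__":
    models = [
        ModelStats("ads_trade_order_summary", 0, True, True, 10),
        ModelStats("dws_trade_order_1d", 5, True, False, 1),
    ]
    for m in models:
        print(m.name, score_model(m))
    print("team score:", team_score(models))
```

The tags are what make the score actionable: a low score says where to look, while the tags tell an expert reviewer which remediation (renaming, sinking logic to the public layer, or taking a table offline) to consider first.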

5. Future Planning

Improve application‑layer efficiency and reduce coupling

Refine architectural standards and control mechanisms

Enhance product tools: intelligent modeling, data testing, operation upgrades, real‑time governance assistants, batch deletion, and data maps

6. Q&A Highlights

Key points from the Q&A include a hybrid top‑down/bottom‑up approach to public‑layer construction, the need for unified standards across business units, criteria for sinking metrics to the public layer, and handling naming and cross‑market dependency challenges.

Overall, the presentation outlines a pragmatic, tool‑enabled roadmap to elevate data model governance, improve reuse, and sustain long‑term value for Taobao's massive data ecosystem.
