
Data Governance Practices and Model Design in Ctrip Vacation Data Warehouse

This article shares the practical experience and thinking behind Ctrip's vacation data governance project, covering team efficiency optimization, demand sorting, data domain definition, warehouse layering, unified dimension modeling, metric standardization, and the overall benefits of a centralized data governance framework.

Ctrip Technology

Leon Gu, a data warehouse expert at Ctrip, introduces the challenges faced by the vacation business lines: resources wasted on duplicated end-to-end development, unclear requirements, and inconsistent dimensions and metrics.

The article then describes how the data team reorganized from many small data warehouses to a hybrid structure that retains vertical teams while adding a centralized "public" group to handle common data domains, data assets, and data operations.

By consolidating repeatable data domains (e.g., service and traffic) into the public group, the team eliminates redundant pipelines, creates unified assets (e.g., user and supplier domains), and standardizes data synchronization and maintenance.

Demand sorting starts with an inventory of all tasks, reports, and applications. Each item is assigned an owner, and metadata and lineage are used to reduce the manual effort. Items are then classified as "can be taken offline", "can be merged", or "retain" based on usage frequency, ownership, and downstream impact.
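The classification step can be sketched as a small rule function. This is a minimal illustration, not Ctrip's actual tooling: the `InventoryItem` fields, the 90-day access window, the `min_accesses` threshold, and the rule order are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InventoryItem:
    """One task, report, or application found during the inventory."""
    name: str
    owner: Optional[str]                # None if no owner could be assigned
    accesses_last_90d: int              # usage frequency, e.g. from access metadata
    downstream_count: int               # downstream consumers, e.g. from lineage
    duplicate_of: Optional[str] = None  # name of an equivalent item, if known


def classify(item: InventoryItem, min_accesses: int = 1) -> str:
    """Return 'offline', 'merge', or 'retain' (rules are illustrative assumptions)."""
    unused = item.accesses_last_90d < min_accesses
    orphaned = item.owner is None
    # Only items that nothing depends on are candidates for removal.
    if item.downstream_count == 0 and (unused or orphaned):
        return "offline"
    if item.duplicate_of is not None:
        return "merge"
    return "retain"


# Hypothetical inventory entries.
inventory = [
    InventoryItem("legacy_report", owner=None, accesses_last_90d=0, downstream_count=0),
    InventoryItem("orders_daily_v2", "team_a", 120, 4, duplicate_of="orders_daily"),
    InventoryItem("orders_daily", "team_a", 300, 9),
]
for item in inventory:
    print(item.name, "->", classify(item))
```

In practice the output of such a rule would be a proposal that the assigned owner confirms, since usage counts alone cannot tell whether a rarely used report is still needed.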

Data domain definition follows a fourteen‑domain taxonomy (date, geography, user, transaction, resource, product, market, organization, service, finance, log, metadata, system device, personnel) to provide a clear abstraction layer for business analysis.
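One way to make the taxonomy machine-checkable is to encode the fourteen domains as an enumeration and tag each table with exactly one of them. The sketch below assumes this tagging approach; the table names are hypothetical.

```python
from collections import defaultdict
from enum import Enum
from typing import Dict, List


class DataDomain(Enum):
    """The fourteen domains listed in the article."""
    DATE = "date"
    GEOGRAPHY = "geography"
    USER = "user"
    TRANSACTION = "transaction"
    RESOURCE = "resource"
    PRODUCT = "product"
    MARKET = "market"
    ORGANIZATION = "organization"
    SERVICE = "service"
    FINANCE = "finance"
    LOG = "log"
    METADATA = "metadata"
    SYSTEM_DEVICE = "system device"
    PERSONNEL = "personnel"


def group_by_domain(assets: Dict[str, DataDomain]) -> Dict[DataDomain, List[str]]:
    """Group table names by the single domain each one is tagged with."""
    grouped: Dict[DataDomain, List[str]] = defaultdict(list)
    for table, domain in assets.items():
        grouped[domain].append(table)
    return dict(grouped)


# Hypothetical table-to-domain tags.
assets = {
    "order_detail": DataDomain.TRANSACTION,
    "refund_detail": DataDomain.TRANSACTION,
    "page_view_log": DataDomain.LOG,
}
by_domain = group_by_domain(assets)
```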

The data warehouse is layered into ODS, EDW, CDM, and ADM, each with specific responsibilities: ODS stores raw production data, EDW holds cleaned factual data, CDM contains domain‑level models and derived metrics, and ADM serves application‑specific models.
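Layer responsibilities like these can be checked automatically against lineage. The sketch below assumes two rules: a table may only read from its own layer or a lower one (the ordering ODS, EDW, CDM, ADM is inferred from the description, not stated as a rule in the source), and a CDM table may not combine facts from another domain. The `Table` structure and all table names are hypothetical, and `sources` is assumed to list upstream fact tables only.

```python
from dataclasses import dataclass, field
from typing import Dict, List

LAYERS = ("ODS", "EDW", "CDM", "ADM")          # lowest to highest
RANK = {layer: i for i, layer in enumerate(LAYERS)}


@dataclass
class Table:
    name: str
    layer: str
    domain: str
    sources: List[str] = field(default_factory=list)  # upstream fact tables


def violations(tables: Dict[str, Table]) -> List[str]:
    """Return human-readable descriptions of layering rule violations."""
    problems = []
    for t in tables.values():
        for src_name in t.sources:
            src = tables[src_name]
            if RANK[src.layer] > RANK[t.layer]:
                problems.append(f"{t.name}: reads from higher layer {src.layer}")
            if t.layer == "CDM" and src.domain != t.domain:
                problems.append(f"{t.name}: CDM table uses facts from domain {src.domain}")
    return problems


# Hypothetical lineage.
tables = {t.name: t for t in [
    Table("ods_order", "ODS", "transaction"),
    Table("edw_order", "EDW", "transaction", ["ods_order"]),
    Table("edw_service", "EDW", "service"),
    Table("cdm_order", "CDM", "transaction", ["edw_order"]),
    Table("cdm_mixed", "CDM", "transaction", ["edw_order", "edw_service"]),
    Table("adm_app", "ADM", "transaction", ["cdm_order", "edw_service"]),
    Table("edw_back", "EDW", "transaction", ["cdm_order"]),
]}
problems = violations(tables)
for p in problems:
    print(p)
```

Here `adm_app` passes even though it mixes two domains, because cross-domain facts are allowed at the ADM layer.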

Unified dimensions are introduced to resolve inconsistencies (e.g., differing destination dimensions across product lines) and to enable cross‑domain integration.
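As an illustration of what a unified dimension buys, the sketch below maps product-line-specific destination codes to one shared destination key so that orders from different lines can be counted together. The product lines, codes, and keys are all hypothetical; the source does not describe how Ctrip implements the mapping.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# (product_line, local destination code) -> unified destination key.
# Every entry here is hypothetical.
DESTINATION_MAP: Dict[Tuple[str, str], str] = {
    ("package_tour", "SHA"): "dest_shanghai",
    ("cruise", "shanghai_port"): "dest_shanghai",
    ("package_tour", "SYX"): "dest_sanya",
    ("ticket", "sanya"): "dest_sanya",
}
UNKNOWN = "dest_unknown"  # bucket for codes that have no mapping yet


def to_unified(product_line: str, local_code: str) -> str:
    """Translate a product line's own destination code to the unified key."""
    return DESTINATION_MAP.get((product_line, local_code), UNKNOWN)


def orders_by_destination(orders: Iterable[Tuple[str, str]]) -> Counter:
    """Count orders per unified destination across all product lines."""
    return Counter(to_unified(line, code) for line, code in orders)


orders = [
    ("package_tour", "SHA"),
    ("cruise", "shanghai_port"),
    ("ticket", "sanya"),
    ("cruise", "unmapped_port"),
]
counts = orders_by_destination(orders)
print(counts)
```

Routing unmapped codes to an explicit unknown bucket, rather than dropping them, keeps totals consistent and makes gaps in the mapping visible.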

A bus matrix maps business processes to dimensions, clarifying which processes rely on which data attributes.
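A bus matrix can be represented as a mapping from each business process to the set of dimensions it uses. The following sketch uses hypothetical processes and dimensions; it shows how the matrix answers two common questions: which processes rely on a given dimension, and which dimensions are shared by every process.

```python
from typing import Dict, List, Set

# Business process -> dimensions it relies on (all entries hypothetical).
BUS_MATRIX: Dict[str, Set[str]] = {
    "page_view": {"date", "user", "page"},
    "order_placed": {"date", "user", "product", "destination"},
    "service_ticket": {"date", "user", "product"},
}


def processes_using(dimension: str) -> List[str]:
    """List the processes that depend on the given dimension."""
    return sorted(p for p, dims in BUS_MATRIX.items() if dimension in dims)


def shared_dimensions() -> Set[str]:
    """Dimensions used by every process, i.e. candidates for unification first."""
    return set.intersection(*BUS_MATRIX.values())


def render() -> str:
    """Render the matrix as a plain-text grid with 'X' marking usage."""
    dims = sorted(set.union(*BUS_MATRIX.values()))
    rows = ["process | " + " | ".join(dims)]
    for process in sorted(BUS_MATRIX):
        marks = ["X" if d in BUS_MATRIX[process] else "-" for d in dims]
        rows.append(process + " | " + " | ".join(marks))
    return "\n".join(rows)


print(render())
```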

Metric standardization involves cataloguing atomic and derived metrics, defining their calculation logic, and enforcing that metric processing occurs at CDM or higher layers to ensure reuse and consistency.
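A metric catalogue with these properties might look like the sketch below. It is an assumption-laden illustration: the `Metric` and `MetricCatalog` classes are hypothetical, and the conversion-rate formula (orders divided by UV) is only an example definition, not one taken from the source. The catalogue rejects metrics registered below the CDM layer, rejects duplicate definitions, and computes derived metrics from their registered inputs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

ALLOWED_LAYERS = {"CDM", "ADM"}  # metric processing at CDM or higher


@dataclass(frozen=True)
class Metric:
    name: str
    layer: str
    inputs: Tuple[str, ...] = ()                    # empty for atomic metrics
    formula: Optional[Callable[..., float]] = None  # required for derived metrics


class MetricCatalog:
    def __init__(self) -> None:
        self._metrics: Dict[str, Metric] = {}

    def register(self, metric: Metric) -> None:
        if metric.layer not in ALLOWED_LAYERS:
            raise ValueError(f"{metric.name}: metrics must be defined at CDM or above")
        if metric.name in self._metrics:
            raise ValueError(f"{metric.name}: already defined, reuse the existing metric")
        for dep in metric.inputs:
            if dep not in self._metrics:
                raise KeyError(f"{metric.name}: unknown input metric {dep}")
        if metric.inputs and metric.formula is None:
            raise ValueError(f"{metric.name}: derived metric needs a formula")
        self._metrics[metric.name] = metric

    def compute(self, name: str, atomic_values: Dict[str, float]) -> float:
        """Evaluate a metric from measured atomic values."""
        metric = self._metrics[name]
        if not metric.inputs:
            return atomic_values[name]
        args = [self.compute(dep, atomic_values) for dep in metric.inputs]
        return metric.formula(*args)


catalog = MetricCatalog()
catalog.register(Metric("uv", "CDM"))
catalog.register(Metric("order_count", "CDM"))
# Example definition only: conversion rate = orders / unique visitors.
catalog.register(Metric("conversion_rate", "CDM", ("order_count", "uv"),
                        lambda orders, uv: orders / uv))
rate = catalog.compute("conversion_rate", {"uv": 200, "order_count": 10})
```

Because every derived metric is computed from catalogued inputs, two reports that show the same metric name necessarily share one calculation.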

The conclusion reiterates how the centralized governance model reduces duplicated effort, clarifies requirements, and lowers the cost of data understanding by standardizing dimensions and metrics.

| Report | Path | Offline? | Dimensions | Metrics |
| --- | --- | --- | --- | --- |
| Flow Report | /reportPathA | No | Date, Page, … | PV, UV, … |
| Conversion Report | /reportPathB | Needs Refactor | Date, Product, … | UV, Conversion Rate, … |

| Report | Path | Offline? | Dimensions | Metrics |
| --- | --- | --- | --- | --- |
| Product Report | /reportPathC | Yes | Date, Product, … | Metric1, Metric2, … |

| Layer | Definition |
| --- | --- |
| ODS | Raw data synced from production, kept for troubleshooting. |
| EDW | Stores detailed factual data with basic cleaning; no derived metrics. |
| CDM | Domain-level models, aggregated data, derived metrics; no cross-domain facts. |
| ADM | Application-level models; can contain cross-domain facts. |
