Data Governance Practices and Model Design in Ctrip Vacation Data Warehouse
This article shares the practical experience and thinking behind Ctrip's vacation data governance project, covering team efficiency optimization, demand sorting, data domain definition, warehouse layering, unified dimension modeling, metric standardization, and the overall benefits of a centralized data governance framework.
Leon Gu, a data warehouse expert at Ctrip, introduces the challenges faced by the vacation business lines—resource waste due to duplicated end‑to‑end development, unclear requirements, and inconsistent dimensions and metrics.
The article then describes how the data team reorganized from many small data warehouses to a hybrid structure that retains vertical teams while adding a centralized "public" group to handle common data domains, data assets, and data operations.
By consolidating repeatable data domains (e.g., service and traffic) into the public group, the team eliminates redundant pipelines, creates unified assets (e.g., user and supplier domains), and standardizes data synchronization and maintenance.
Demand sorting is performed by inventorying all tasks, reports, and applications, assigning owners, and using metadata and lineage to reduce manual effort. The process classifies items as "can be offline", "can be merged", or "retain" based on usage frequency, ownership, and downstream impact.
Data domain definition follows a fourteen‑domain taxonomy (date, geography, user, transaction, resource, product, market, organization, service, finance, log, metadata, system device, personnel) to provide a clear abstraction layer for business analysis.
The data warehouse is layered into ODS, EDW, CDM, and ADM, each with specific responsibilities: ODS stores raw production data, EDW holds cleaned factual data, CDM contains domain‑level models and derived metrics, and ADM serves application‑specific models.
Unified dimensions are introduced to resolve inconsistencies (e.g., differing destination dimensions across product lines) and to enable cross‑domain integration.
A bus matrix maps business processes to dimensions, clarifying which processes rely on which data attributes.
Metric standardization involves cataloguing atomic and derived metrics, defining their calculation logic, and enforcing that metric processing occurs at CDM or higher layers to ensure reuse and consistency.
The conclusion reiterates how the centralized governance model reduces duplicated effort, clarifies requirements, and lowers the cost of data understanding by standardizing dimensions and metrics.
Report
Path
Offline?
Dimensions
Metrics
Flow Report
/reportPathA
No
Date, Page, …
PV, UV, …
Conversion Report
/reportPathB
Needs Refactor
Date, Product, …
UV, Conversion Rate, …
Report
Path
Offline?
Dimensions
Metrics
Product Report
/reportPathC
Yes
Date, Product, …
Metric1, Metric2, …
Layer
Definition
ODS
Raw data synced from production, kept for troubleshooting.
EDW
Stores detailed factual data with basic cleaning; no derived metrics.
CDM
Domain‑level models, aggregated data, derived metrics; no cross‑domain facts.
ADM
Application‑level models; can contain cross‑domain facts.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
