Data Warehouse Architecture and Modeling with Alibaba MaxCompute and DataWorks
This tutorial explains how to select a technical architecture, design a three‑layer data warehouse (ODS, CDM, ADS), model tables and dimensions, choose storage strategies, handle slowly changing dimensions, synchronize data with DataWorks, and implement dimensional modeling and fact tables using Alibaba MaxCompute for big‑data analytics.
Technical Architecture Selection
Before designing a data model, choose the technical stack. This guide uses Alibaba Cloud MaxCompute as the offline compute engine together with DataWorks for data integration, quality, security, and management.
Data Warehouse Layering
Alibaba’s data architecture recommends three layers from bottom to top: ODS (Operation Data Store), CDM (Common Data Model), and ADS (Application Data Service).
Data Warehouse Layers
ODS (Operation Data Store) : stores raw, unprocessed data mirroring source schemas; primary responsibility is ingesting data into MaxCompute and tracking historical changes.
CDM (Common Data Model) : includes DIM, DWD, DWS layers derived from ODS; performs data cleaning, integration, and builds reusable analytical tables.
ADS (Application Data Service) : stores business‑specific statistical indicators derived from CDM.
The overall data flow moves from source systems through DataWorks integration into the ODS layer, then through transformation steps into CDM and finally into ADS for downstream consumption.
ODS Layer Table Design
The ODS layer contains six example tables (e.g., s_auction , s_sale , s_users_extra , s_biz_order_delta , s_logistics_order_delta , s_pay_order_delta ) that store raw transaction, user, and logistics data.
CREATE TABLE IF NOT EXISTS s_auction (
id STRING COMMENT '商品ID',
title STRING COMMENT '商品名',
gmt_modified STRING COMMENT '商品最后修改日期',
price DOUBLE COMMENT '商品成交价格,单位元',
starts STRING COMMENT '商品上架时间',
minimum_bid DOUBLE COMMENT '拍卖商品起拍价,单位元',
duration STRING COMMENT '有效期,销售周期,单位天',
incrementnum DOUBLE COMMENT '拍卖价格的增价幅度',
city STRING COMMENT '商品所在城市',
prov STRING COMMENT '商品所在省份',
ends STRING COMMENT '销售结束时间',
quantity BIGINT COMMENT '数量',
stuff_status BIGINT COMMENT '商品新旧程度 0 全新 1 闲置 2 二手',
auction_status BIGINT COMMENT '商品状态 0 正常 1 用户删除 2 下架 3 从未上架',
cate_id BIGINT COMMENT '商品类目ID',
cate_name STRING COMMENT '商品类目名称',
commodity_id BIGINT COMMENT '品类ID',
commodity_name STRING COMMENT '品类名称',
umid STRING COMMENT '买家umid'
) COMMENT '商品拍卖ODS' PARTITIONED BY (ds STRING COMMENT '格式:YYYYMMDD') LIFECYCLE 400;Similar CREATE statements are provided for the other ODS tables.
ODS Storage Strategies
Three storage modes are recommended:
Incremental storage : daily partitions holding only the day’s delta.
Full‑load storage : daily partitions containing a snapshot of all data up to that date.
Hybrid (slowly changing dimension) storage : adds start_dt and end_dt timestamps to record history.
Slowly Changing Dimensions
Use natural keys instead of surrogate keys to reduce ETL complexity and storage overhead. Implement snapshots (daily full copies) for dimensions that change infrequently, accepting the storage cost for faster processing.
Data Synchronization and Processing
Data is synchronized from source systems to MaxCompute via DataWorks. Full‑load data goes to the current day’s partition; real‑time incremental data should be handled by Data Transmission Service (DTS). Naming conventions and field‑level handling rules are described.
Public Dimension Layer (DIM)
DIM tables follow the naming pattern dim_{business}_pub_{dimension} . Example tables include dim_pub_area (region) and dim_asale_itm (product).
CREATE TABLE IF NOT EXISTS dim_asale_itm (
item_id BIGINT COMMENT '商品ID',
item_title STRING COMMENT '商品名称',
item_price DOUBLE COMMENT '商品成交价格_元',
item_stuff_status BIGINT COMMENT '商品新旧程度_0全新1闲置2二手',
cate_id BIGINT COMMENT '商品类目ID',
cate_name STRING COMMENT '商品类目名称',
commodity_id BIGINT COMMENT '品类ID',
commodity_name STRING COMMENT '品类名称',
umid STRING COMMENT '买家ID',
item_status BIGINT COMMENT '商品状态_0正常1用户删除2下架3未上架',
city STRING COMMENT '商品所在城市',
prov STRING COMMENT '商品所在省份'
) COMMENT '商品全量表' PARTITIONED BY (ds STRING COMMENT '日期,yyyymmdd');Detail Fact Layer (DWD)
DWD tables capture business‑process‑driven facts at the finest granularity. Example tables:
dwd_asale_trd_itm_di – transaction item facts.
ods_asale_trd_mbr_di – member facts.
dwd_asale_trd_ord_di – order facts.
CREATE TABLE IF NOT EXISTS dwd_asale_trd_itm_di (
item_id BIGINT COMMENT '商品ID',
item_title STRING COMMENT '商品名称',
item_price DOUBLE COMMENT '商品价格',
item_stuff_status BIGINT COMMENT '商品新旧程度_0全新1闲置2二手',
item_prov STRING COMMENT '商品省份',
item_city STRING COMMENT '商品城市',
cate_id BIGINT COMMENT '商品类目ID',
cate_name STRING COMMENT '商品类目名称',
commodity_id BIGINT COMMENT '品类ID',
commodity_name STRING COMMENT '品类名称',
buyer_id BIGINT COMMENT '买家ID'
) COMMENT '交易商品信息事实表' PARTITIONED BY (ds STRING COMMENT '日期') LIFECYCLE 400;Public Summary Fact Layer (DWS)
DWS tables aggregate metrics for analysis. Naming follows dws_{business}_{domain}_{granularity}_{period} . Example tables:
dws_asale_trd_byr_ord_1d – buyer‑level order summary for the latest day.
dws_asale_trd_itm_ord_1d – item‑level order summary for the latest day.
CREATE TABLE IF NOT EXISTS dws_asale_trd_byr_ord_1d (
buyer_id BIGINT COMMENT '买家ID',
buyer_nick STRING COMMENT '买家昵称',
mord_prov STRING COMMENT '收货人省份',
cate_id BIGINT COMMENT '商品类目ID',
cate_name STRING COMMENT '商品类目名称',
confirm_paid_amt_sum_1d DOUBLE COMMENT '最近一天订单已经确认收货的金额总和'
) COMMENT '买家粒度所有交易最近一天汇总事实表' PARTITIONED BY (ds STRING COMMENT '分区字段YYYYMMDD') LIFECYCLE 36000;Hierarchical Call Specification
Application (ADS) layers must call data through the CDM intermediate layer; direct ODS references are prohibited. CDM tasks should be limited to ten layers deep, each task produces a single output table, and virtual tasks can aggregate outputs from multiple upstream jobs.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.