Why Wide Tables Are Essential in DWS Layer: 10 Real-World Modeling Scenarios
This article explains the purpose of the DWS (Data Warehouse Service) layer, why wide‑table modeling is crucial for performance and service‑oriented interfaces, and provides ten practical wide‑table designs with core field definitions, CREATE TABLE statements, and sample INSERT queries for common business domains such as products, users, orders, regions, channels, suppliers, services, finance, logistics, and data quality monitoring.
Why DWS Layer Needs Wide‑Table Modeling
In a data warehouse, the DWS layer (Data Warehouse Service) connects detailed DWD tables to upper‑level applications (ADS). Its main goals are lightweight aggregation, service‑oriented interfaces for BI tools and dashboards, and performance optimization through pre‑computation and indexing. Wide‑table modeling merges multiple dimension and fact fields into a single table to provide a plug‑and‑play data service.
10 Classic Wide‑Table Modeling Scenarios
1. Product Theme Wide Table: dws_goods_detail
Scenario : e‑commerce analysis of sales trends, inventory turnover, and repurchase rates.
goods_id (VARCHAR(32)) – 商品唯一ID – example "G1001"
category_id (VARCHAR(16)) – 商品类目ID – example "C5001"
category_name (VARCHAR(64)) – 商品类目名称 – example "手机数码"
brand_id (VARCHAR(16)) – 品牌ID – example "B2001"
brand_name (VARCHAR(32)) – 品牌名称 – example "华为"
price (DECIMAL(10,2)) – 商品价格 – example 2999.00
sale_count (INT) – 销售数量(日) – example 150
sale_amount (DECIMAL(12,2)) – 销售金额(日) – example 449850.00
return_rate (DECIMAL(5,2)) – 退货率(%) – example 1.50
inventory (INT) – 当前库存量 – example 2000
inventory_turnover_days (DECIMAL(5,2)) – 库存周转天数 – example 15.20
repurchase_rate (DECIMAL(5,2)) – 复购率(%) – example 25.30
update_time (DATETIME) – 数据更新时间 – example "2023-09-07 15:30:00"
CREATE TABLE dws_goods_detail (
goods_id VARCHAR(32) COMMENT '商品唯一ID',
category_id VARCHAR(16) COMMENT '商品类目ID',
category_name VARCHAR(64) COMMENT '商品类目名称',
brand_id VARCHAR(16) COMMENT '品牌ID',
brand_name VARCHAR(32) COMMENT '品牌名称',
price DECIMAL(10,2) COMMENT '商品价格',
sale_count INT COMMENT '销售数量(日)',
sale_amount DECIMAL(12,2) COMMENT '销售金额(日)',
return_rate DECIMAL(5,2) COMMENT '退货率(%)',
inventory INT COMMENT '当前库存量',
inventory_turnover_days DECIMAL(5,2) COMMENT '库存周转天数',
repurchase_rate DECIMAL(5,2) COMMENT '复购率(%)',
update_time DATETIME COMMENT '数据更新时间',
PRIMARY KEY (goods_id, update_time)
) COMMENT '商品主题宽表'
DISTRIBUTED BY HASH(goods_id) BUCKETS 32
PROPERTIES ("replication_num" = "1"); INSERT INTO dws_goods_detail
SELECT
g.goods_id,
g.category_id,
c.category_name,
g.brand_id,
b.brand_name,
g.price,
SUM(o.quantity) AS sale_count,
SUM(o.quantity * o.price) AS sale_amount,
ROUND(SUM(o.return_quantity) * 100.0 / NULLIF(SUM(o.quantity),0), 2) AS return_rate,
i.inventory,
ROUND(30.0 * SUM(o.quantity) / NULLIF(SUM(o.inventory),0), 2) AS inventory_turnover_days,
ROUND(COUNT(DISTINCT o.user_id) * 100.0 / NULLIF(COUNT(DISTINCT o.user_id),0), 2) AS repurchase_rate,
NOW() AS update_time
FROM dwd_order_detail o
JOIN dwd_goods_info g ON o.goods_id = g.goods_id
JOIN dwd_category_info c ON g.category_id = c.category_id
JOIN dwd_brand_info b ON g.brand_id = b.brand_id
JOIN dwd_inventory i ON g.goods_id = i.goods_id
WHERE o.order_date = CURDATE()
GROUP BY g.goods_id, g.category_id, c.category_name, g.brand_id, b.brand_name, g.price, i.inventory;2. User Theme Wide Table: dws_user_detail
Scenario : user profile analysis, membership level management, marketing effectiveness evaluation.
user_id (VARCHAR(32)) – 用户唯一ID – example "U2001"
register_channel (VARCHAR(32)) – 注册渠道 – example "APP"
member_level (VARCHAR(16)) – 会员等级 – example "VIP3"
region_id (VARCHAR(16)) – 地域ID – example "R1001"
region_name (VARCHAR(64)) – 地域名称 – example "广东省"
device_type (VARCHAR(16)) – 设备类型 – example "Android"
active_days (INT) – 活跃天数(30天) – example 25
order_count (INT) – 下单次数(30天) – example 8
avg_order_amount (DECIMAL(10,2)) – 平均客单价 – example 350.50
lifecycle_value (DECIMAL(12,2)) – 生命周期价值 – example 2800.00
last_order_date (DATE) – 最后下单日期 – example "2023-09-05"
update_time (DATETIME) – 数据更新时间 – example "2023-09-07 15:30:00"
CREATE TABLE dws_user_detail (
user_id VARCHAR(32) COMMENT '用户唯一ID',
register_channel VARCHAR(32) COMMENT '注册渠道',
member_level VARCHAR(16) COMMENT '会员等级',
region_id VARCHAR(16) COMMENT '地域ID',
region_name VARCHAR(64) COMMENT '地域名称',
device_type VARCHAR(16) COMMENT '设备类型',
active_days INT COMMENT '活跃天数(30天)',
order_count INT COMMENT '下单次数(30天)',
avg_order_amount DECIMAL(10,2) COMMENT '平均客单价',
lifecycle_value DECIMAL(12,2) COMMENT '生命周期价值',
last_order_date DATE COMMENT '最后下单日期',
update_time DATETIME COMMENT '数据更新时间',
PRIMARY KEY (user_id)
) COMMENT '用户主题宽表'
DISTRIBUTED BY HASH(user_id) BUCKETS 32
PROPERTIES ("replication_num" = "1"); INSERT INTO dws_user_detail
SELECT
u.user_id,
u.register_channel,
u.member_level,
u.region_id,
r.region_name,
u.device_type,
COUNT(DISTINCT o.order_date) AS active_days,
COUNT(o.order_id) AS order_count,
AVG(o.amount) AS avg_order_amount,
SUM(o.amount) AS lifecycle_value,
MAX(o.order_date) AS last_order_date,
NOW() AS update_time
FROM dwd_user_info u
JOIN dwd_region_info r ON u.region_id = r.region_id
JOIN dwd_order_detail o ON u.user_id = o.user_id
WHERE o.order_date >= DATE_SUB(CURRENT_DATE,30)
GROUP BY u.user_id, u.register_channel, u.member_level, u.region_id, r.region_name, u.device_type;3. Order Theme Wide Table: dws_order_detail
Scenario : order fulfillment analysis, refund rate monitoring, supply‑chain optimization.
order_id (VARCHAR(32)) – 订单ID – example "O202309070001"
order_date (DATE) – 订单日期 – example "2023-09-07"
pay_method (VARCHAR(16)) – 支付方式 – example "微信支付"
delivery_channel (VARCHAR(32)) – 配送渠道 – example "顺丰速运"
order_status (VARCHAR(16)) – 订单状态 – example "已完成"
amount (DECIMAL(12,2)) – 订单金额 – example 399.00
refund_amount (DECIMAL(12,2)) – 退款金额 – example 0.00
refund_rate (DECIMAL(5,2)) – 退款率(%) – example 0.00
delivery_time (INT) – 配送时长(小时) – example 24
delivery_cost (DECIMAL(8,2)) – 物流成本 – example 10.00
update_time (DATETIME) – 数据更新时间 – example "2023-09-07 15:30:00"
CREATE TABLE dws_order_detail (
order_id VARCHAR(32) COMMENT '订单ID',
order_date DATE COMMENT '订单日期',
pay_method VARCHAR(16) COMMENT '支付方式',
delivery_channel VARCHAR(32) COMMENT '配送渠道',
order_status VARCHAR(16) COMMENT '订单状态',
amount DECIMAL(12,2) COMMENT '订单金额',
refund_amount DECIMAL(12,2) COMMENT '退款金额',
refund_rate DECIMAL(5,2) COMMENT '退款率(%)',
delivery_time INT COMMENT '配送时长(小时)',
delivery_cost DECIMAL(8,2) COMMENT '物流成本',
update_time DATETIME COMMENT '数据更新时间',
PRIMARY KEY (order_id)
) COMMENT '订单主题宽表'
DISTRIBUTED BY HASH(order_id) BUCKETS 32
PROPERTIES ("replication_num" = "1"); INSERT INTO dws_order_detail
SELECT
o.order_id,
o.order_date,
o.pay_method,
o.delivery_channel,
o.order_status,
o.amount,
COALESCE(r.refund_amount,0) AS refund_amount,
ROUND(COALESCE(r.refund_amount,0) * 100.0 / NULLIF(o.amount,0),2) AS refund_rate,
DATEDIFF(o.delivery_time, o.order_time) * 24 + EXTRACT(HOUR FROM (o.delivery_time - o.order_time)) AS delivery_time,
d.delivery_cost,
NOW() AS update_time
FROM dwd_order_fact o
LEFT JOIN dwd_refund_info r ON o.order_id = r.order_id
LEFT JOIN dwd_delivery_info d ON o.order_id = d.order_id
WHERE o.order_date = CURDATE();4. Region Theme Wide Table: dws_region_detail
Scenario : regional market analysis, store location evaluation, regional inventory allocation.
region_id (VARCHAR(16)) – 地域ID – example "R1001"
province (VARCHAR(32)) – 省份 – example "广东省"
city (VARCHAR(32)) – 城市 – example "广州市"
district (VARCHAR(32)) – 区域 – example "天河区"
store_id (VARCHAR(16)) – 门店ID – example "S1001"
sales_amount (DECIMAL(12,2)) – 销售额(日) – example 158000.00
customer_count (INT) – 客户数(日) – example 350
conversion_rate (DECIMAL(5,2)) – 转化率(%) – example 12.50
inventory_turnover (DECIMAL(5,2)) – 库存周转率 – example 5.20
avg_order_value (DECIMAL(10,2)) – 平均订单价值 – example 450.00
update_time (DATETIME) – 数据更新时间 – example "2023-09-07 15:30:00"
CREATE TABLE dws_region_detail (
region_id VARCHAR(16) COMMENT '地域ID',
province VARCHAR(32) COMMENT '省份',
city VARCHAR(32) COMMENT '城市',
district VARCHAR(32) COMMENT '区域',
store_id VARCHAR(16) COMMENT '门店ID',
sales_amount DECIMAL(12,2) COMMENT '销售额(日)',
customer_count INT COMMENT '客户数(日)',
conversion_rate DECIMAL(5,2) COMMENT '转化率(%)',
inventory_turnover DECIMAL(5,2) COMMENT '库存周转率',
avg_order_value DECIMAL(10,2) COMMENT '平均订单价值',
update_time DATETIME COMMENT '数据更新时间',
PRIMARY KEY (region_id, store_id)
) COMMENT '地区主题宽表'
DISTRIBUTED BY HASH(region_id) BUCKETS 32
PROPERTIES ("replication_num" = "1"); INSERT INTO dws_region_detail
SELECT
r.region_id,
r.province,
r.city,
r.district,
s.store_id,
SUM(o.amount) AS sales_amount,
COUNT(DISTINCT o.user_id) AS customer_count,
ROUND(COUNT(DISTINCT o.user_id) * 100.0 / NULLIF(COUNT(o.order_id),0),2) AS conversion_rate,
ROUND(SUM(o.quantity) * 30.0 / NULLIF(SUM(i.inventory),0),2) AS inventory_turnover,
AVG(o.amount) AS avg_order_value,
NOW() AS update_time
FROM dwd_order_detail o
JOIN dwd_region_info r ON o.region_id = r.region_id
JOIN dwd_store_info s ON r.region_id = s.region_id
JOIN dwd_inventory i ON o.goods_id = i.goods_id
WHERE o.order_date = CURDATE()
GROUP BY r.region_id, r.province, r.city, r.district, s.store_id;5. Channel Theme Wide Table: dws_channel_detail
Scenario : marketing channel effectiveness analysis, ROI calculation, media buying strategy optimization.
channel_id (VARCHAR(16)) – 渠道ID – example "C1001"
channel_name (VARCHAR(32)) – 渠道名称 – example "抖音信息流"
campaign_id (VARCHAR(16)) – 活动ID – example "CAM20230901"
campaign_name (VARCHAR(64)) – 活动名称 – example "9月新品促销"
impressions (BIGINT) – 曝光量 – example 50000
clicks (BIGINT) – 点击量 – example 2500
click_rate (DECIMAL(5,2)) – 点击率(%) – example 5.00
conversions (INT) – 转化量 – example 120
conversion_rate (DECIMAL(5,2)) – 转化率(%) – example 4.80
cac (DECIMAL(10,2)) – 获客成本 – example 83.33
update_time (DATETIME) – 数据更新时间 – example "2023-09-07 15:30:00"
CREATE TABLE dws_channel_detail (
channel_id VARCHAR(16) COMMENT '渠道ID',
channel_name VARCHAR(32) COMMENT '渠道名称',
campaign_id VARCHAR(16) COMMENT '活动ID',
campaign_name VARCHAR(64) COMMENT '活动名称',
impressions BIGINT COMMENT '曝光量',
clicks BIGINT COMMENT '点击量',
click_rate DECIMAL(5,2) COMMENT '点击率(%)',
conversions INT COMMENT '转化量',
conversion_rate DECIMAL(5,2) COMMENT '转化率(%)',
cac DECIMAL(10,2) COMMENT '获客成本',
update_time DATETIME COMMENT '数据更新时间',
PRIMARY KEY (channel_id, campaign_id)
) COMMENT '渠道主题宽表'
DISTRIBUTED BY HASH(channel_id) BUCKETS 32
PROPERTIES ("replication_num" = "1"); INSERT INTO dws_channel_detail
SELECT
c.channel_id,
c.channel_name,
cam.campaign_id,
cam.campaign_name,
SUM(a.impressions) AS impressions,
SUM(a.clicks) AS clicks,
ROUND(SUM(a.clicks) * 100.0 / NULLIF(SUM(a.impressions),0),2) AS click_rate,
COUNT(DISTINCT o.order_id) AS conversions,
ROUND(COUNT(DISTINCT o.order_id) * 100.0 / NULLIF(SUM(a.clicks),0),2) AS conversion_rate,
ROUND(SUM(a.cost) / NULLIF(COUNT(DISTINCT o.user_id),0),2) AS cac,
NOW() AS update_time
FROM dwd_ad_activity a
JOIN dwd_channel_info c ON a.channel_id = c.channel_id
JOIN dwd_campaign_info cam ON a.campaign_id = cam.campaign_id
LEFT JOIN dwd_order_detail o ON a.user_id = o.user_id
WHERE a.date = CURDATE()
GROUP BY c.channel_id, c.channel_name, cam.campaign_id, cam.campaign_name;6. Supplier Theme Wide Table: dws_supplier_detail
Scenario : supplier performance evaluation, procurement cost analysis, inventory turnover monitoring.
supplier_id (VARCHAR(16)) – 供应商ID – example "S2001"
supplier_name (VARCHAR(64)) – 供应商名称 – example "华强电子"
category_id (VARCHAR(16)) – 采购品类ID – example "C5001"
category_name (VARCHAR(64)) – 采购品类名称 – example "手机配件"
purchase_amount (DECIMAL(12,2)) – 采购金额(月) – example 125000.00
delivery_on_time_rate (DECIMAL(5,2)) – 交货准时率(%) – example 95.50
inventory_turnover (DECIMAL(5,2)) – 库存周转率 – example 6.20
purchase_cost (DECIMAL(10,2)) – 采购成本 – example 85.00
lead_time (INT) – 采购提前期(天) – example 15
update_time (DATETIME) – 数据更新时间 – example "2023-09-07 15:30:00"
CREATE TABLE dws_supplier_detail (
supplier_id VARCHAR(16) COMMENT '供应商ID',
supplier_name VARCHAR(64) COMMENT '供应商名称',
category_id VARCHAR(16) COMMENT '采购品类ID',
category_name VARCHAR(64) COMMENT '采购品类名称',
purchase_amount DECIMAL(12,2) COMMENT '采购金额(月)',
delivery_on_time_rate DECIMAL(5,2) COMMENT '交货准时率(%)',
inventory_turnover DECIMAL(5,2) COMMENT '库存周转率',
purchase_cost DECIMAL(10,2) COMMENT '采购成本',
lead_time INT COMMENT '采购提前期(天)',
update_time DATETIME COMMENT '数据更新时间',
PRIMARY KEY (supplier_id, category_id)
) COMMENT '供应链主题宽表'
DISTRIBUTED BY HASH(supplier_id) BUCKETS 32
PROPERTIES ("replication_num" = "1"); INSERT INTO dws_supplier_detail
SELECT
s.supplier_id,
s.supplier_name,
p.category_id,
c.category_name,
SUM(p.amount) AS purchase_amount,
ROUND(SUM(CASE WHEN p.delivery_date <= p.expected_delivery_date THEN 1 ELSE 0 END) * 100.0 / NULLIF(COUNT(p.purchase_id),0),2) AS delivery_on_time_rate,
ROUND(SUM(p.quantity) * 30.0 / NULLIF(SUM(i.inventory),0),2) AS inventory_turnover,
AVG(p.unit_price) AS purchase_cost,
AVG(DATEDIFF(p.delivery_date, p.purchase_date)) AS lead_time,
NOW() AS update_time
FROM dwd_purchase_order p
JOIN dwd_supplier_info s ON p.supplier_id = s.supplier_id
JOIN dwd_category_info c ON p.category_id = c.category_id
JOIN dwd_inventory i ON p.goods_id = i.goods_id
WHERE p.purchase_date >= DATE_SUB(CURRENT_DATE,30)
GROUP BY s.supplier_id, s.supplier_name, p.category_id, c.category_name;7. Service Theme Wide Table: dws_service_detail
Scenario : customer service response time analysis, complaint rate monitoring, satisfaction scoring.
service_id (VARCHAR(32)) – 服务ID – example "SVC202309070001"
service_channel (VARCHAR(16)) – 服务渠道 – example "在线客服"
service_type (VARCHAR(32)) – 问题类型 – example "订单退款"
agent_id (VARCHAR(16)) – 客服ID – example "A1001"
response_time (INT) – 响应时长(分钟) – example 5
resolution_rate (DECIMAL(5,2)) – 解决率(%) – example 92.30
complaint_count (INT) – 投诉次数 – example 2
satisfaction_score (DECIMAL(5,2)) – 满意度评分 – example 4.7
update_time (DATETIME) – 数据更新时间 – example "2023-09-07 15:30:00"
CREATE TABLE dws_service_detail (
service_id VARCHAR(32) COMMENT '服务ID',
service_channel VARCHAR(16) COMMENT '服务渠道',
service_type VARCHAR(32) COMMENT '问题类型',
agent_id VARCHAR(16) COMMENT '客服ID',
response_time INT COMMENT '响应时长(分钟)',
resolution_rate DECIMAL(5,2) COMMENT '解决率(%)',
complaint_count INT COMMENT '投诉次数',
satisfaction_score DECIMAL(5,2) COMMENT '满意度评分',
update_time DATETIME COMMENT '数据更新时间',
PRIMARY KEY (service_id)
) COMMENT '客户服务主题宽表'
DISTRIBUTED BY HASH(service_id) BUCKETS 32
PROPERTIES ("replication_num" = "1"); INSERT INTO dws_service_detail
SELECT
s.service_id,
s.service_channel,
s.service_type,
s.agent_id,
AVG(TIMESTAMPDIFF(MINUTE, s.create_time, s.response_time)) AS response_time,
ROUND(SUM(CASE WHEN s.status = '已解决' THEN 1 ELSE 0 END) * 100.0 / NULLIF(COUNT(*),0),2) AS resolution_rate,
COUNT(CASE WHEN s.complaint = 1 THEN 1 END) AS complaint_count,
AVG(s.satisfaction_score) AS satisfaction_score,
NOW() AS update_time
FROM dwd_service_record s
WHERE s.create_time >= DATE_SUB(CURRENT_DATE,7)
GROUP BY s.service_id, s.service_channel, s.service_type, s.agent_id;8. Finance Theme Wide Table: dws_finance_detail
Scenario : cost accounting, profit analysis, cash‑flow monitoring.
finance_id (VARCHAR(16)) – 财务ID – example "F20230907"
department (VARCHAR(32)) – 部门 – example "市场部"
project_id (VARCHAR(16)) – 项目ID – example "P20230901"
project_name (VARCHAR(64)) – 项目名称 – example "9月新品推广"
revenue (DECIMAL(12,2)) – 收入 – example 250000.00
cost (DECIMAL(12,2)) – 成本 – example 180000.00
gross_profit (DECIMAL(12,2)) – 毛利 – example 70000.00
gross_profit_rate (DECIMAL(5,2)) – 毛利率(%) – example 28.00
cash_flow (DECIMAL(12,2)) – 现金流 – example 65000.00
update_time (DATETIME) – 数据更新时间 – example "2023-09-07 15:30:00"
CREATE TABLE dws_finance_detail (
finance_id VARCHAR(16) COMMENT '财务ID',
department VARCHAR(32) COMMENT '部门',
project_id VARCHAR(16) COMMENT '项目ID',
project_name VARCHAR(64) COMMENT '项目名称',
revenue DECIMAL(12,2) COMMENT '收入',
cost DECIMAL(12,2) COMMENT '成本',
gross_profit DECIMAL(12,2) COMMENT '毛利',
gross_profit_rate DECIMAL(5,2) COMMENT '毛利率(%)',
cash_flow DECIMAL(12,2) COMMENT '现金流',
update_time DATETIME COMMENT '数据更新时间',
PRIMARY KEY (finance_id)
) COMMENT '财务主题宽表'
DISTRIBUTED BY HASH(finance_id) BUCKETS 32
PROPERTIES ("replication_num" = "1"); INSERT INTO dws_finance_detail
SELECT
f.finance_id,
f.department,
f.project_id,
p.project_name,
SUM(o.amount) AS revenue,
SUM(c.cost) AS cost,
SUM(o.amount) - SUM(c.cost) AS gross_profit,
ROUND((SUM(o.amount) - SUM(c.cost)) * 100.0 / NULLIF(SUM(o.amount),0),2) AS gross_profit_rate,
SUM(o.amount) - SUM(c.cost) AS cash_flow,
NOW() AS update_time
FROM dwd_order_fact o
JOIN dwd_project_info p ON o.project_id = p.project_id
JOIN dwd_cost_info c ON o.project_id = c.project_id
WHERE o.order_date >= DATE_SUB(CURRENT_DATE,30)
GROUP BY f.finance_id, f.department, f.project_id, p.project_name;9. Logistics Theme Wide Table: dws_logistics_detail
Scenario : logistics timeliness analysis, transportation cost optimization, exception order monitoring.
logistics_id (VARCHAR(32)) – 物流ID – example "L202309070001"
logistics_company (VARCHAR(32)) – 物流公司 – example "顺丰速运"
transport_type (VARCHAR(16)) – 运输方式 – example "快递"
region (VARCHAR(32)) – 区域 – example "华东地区"
delivery_time (INT) – 配送时长(小时) – example 24
delivery_cost (DECIMAL(8,2)) – 运输成本 – example 15.00
on_time_rate (DECIMAL(5,2)) – 准时率(%) – example 96.50
exception_order_count (INT) – 异常订单数 – example 2
update_time (DATETIME) – 数据更新时间 – example "2023-09-07 15:30:00"
CREATE TABLE dws_logistics_detail (
logistics_id VARCHAR(32) COMMENT '物流ID',
logistics_company VARCHAR(32) COMMENT '物流公司',
transport_type VARCHAR(16) COMMENT '运输方式',
region VARCHAR(32) COMMENT '区域',
delivery_time INT COMMENT '配送时长(小时)',
delivery_cost DECIMAL(8,2) COMMENT '运输成本',
on_time_rate DECIMAL(5,2) COMMENT '准时率(%)',
exception_order_count INT COMMENT '异常订单数',
update_time DATETIME COMMENT '数据更新时间',
PRIMARY KEY (logistics_id)
) COMMENT '物流主题宽表'
DISTRIBUTED BY HASH(logistics_id) BUCKETS 32
PROPERTIES ("replication_num" = "1"); INSERT INTO dws_logistics_detail
SELECT
l.logistics_id,
l.logistics_company,
l.transport_type,
l.region,
AVG(TIMESTAMPDIFF(HOUR, o.order_time, l.delivery_time)) AS delivery_time,
AVG(l.cost) AS delivery_cost,
ROUND(SUM(CASE WHEN l.delivery_time <= o.expected_delivery_time THEN 1 ELSE 0 END) * 100.0 / NULLIF(COUNT(*),0),2) AS on_time_rate,
COUNT(CASE WHEN l.delivery_time > o.expected_delivery_time THEN 1 END) AS exception_order_count,
NOW() AS update_time
FROM dwd_order_detail o
JOIN dwd_logistics_info l ON o.order_id = l.order_id
WHERE o.order_date = CURDATE()
GROUP BY l.logistics_id, l.logistics_company, l.transport_type, l.region;10. Data Quality Monitoring Wide Table: dws_data_quality
Scenario : data anomaly detection, ETL process monitoring, consistency checks.
table_name (VARCHAR(64)) – 表名 – example "dwd_order_detail"
column_name (VARCHAR(64)) – 字段名 – example "order_id"
data_type (VARCHAR(16)) – 数据类型 – example "VARCHAR"
null_rate (DECIMAL(5,2)) – 空值率(%) – example 0.50
duplicate_rate (DECIMAL(5,2)) – 重复率(%) – example 0.20
data_lag (INT) – 数据延迟(小时) – example 2
consistency_rate (DECIMAL(5,2)) – 一致性率(%) – example 99.80
check_time (DATETIME) – 检查时间 – example "2023-09-07 15:30:00"
status (VARCHAR(16)) – 状态 – example "正常"
CREATE TABLE dws_data_quality (
table_name VARCHAR(64) COMMENT '表名',
column_name VARCHAR(64) COMMENT '字段名',
data_type VARCHAR(16) COMMENT '数据类型',
null_rate DECIMAL(5,2) COMMENT '空值率(%)',
duplicate_rate DECIMAL(5,2) COMMENT '重复率(%)',
data_lag INT COMMENT '数据延迟(小时)',
consistency_rate DECIMAL(5,2) COMMENT '一致性率(%)',
check_time DATETIME COMMENT '检查时间',
status VARCHAR(16) COMMENT '状态',
PRIMARY KEY (table_name, column_name, check_time)
) COMMENT '数据质量监控宽表'
DISTRIBUTED BY HASH(table_name) BUCKETS 32
PROPERTIES ("replication_num" = "1"); INSERT INTO dws_data_quality
SELECT
'dwd_order_detail' AS table_name,
'order_id' AS column_name,
'VARCHAR' AS data_type,
ROUND(COUNT(*) * 100.0 / NULLIF((SELECT COUNT(*) FROM dwd_order_detail),0),2) AS null_rate,
ROUND(COUNT(*) * 100.0 / NULLIF((SELECT COUNT(*) FROM dwd_order_detail),0),2) AS duplicate_rate,
TIMESTAMPDIFF(HOUR, MAX(order_time), NOW()) AS data_lag,
ROUND(COUNT(*) * 100.0 / NULLIF((SELECT COUNT(*) FROM dwd_order_detail),0),2) AS consistency_rate,
NOW() AS check_time,
CASE WHEN ROUND(COUNT(*) * 100.0 / NULLIF((SELECT COUNT(*) FROM dwd_order_detail),0),2) < 5
AND TIMESTAMPDIFF(HOUR, MAX(order_time), NOW()) < 4 THEN '正常'
ELSE '异常'
END AS status
FROM dwd_order_detail
WHERE order_time < DATE_SUB(NOW(), INTERVAL 1 HOUR);DWS Wide‑Table Modeling Golden Rules
Theme‑driven : design each wide table around a clear business theme (e.g., product, user) and avoid “one‑size‑all” tables.
Lightweight aggregation : choose an aggregation granularity that balances detail and performance (neither too fine nor too coarse).
Dimension consistency : keep dimension definitions identical to those in the DWD layer to prevent mismatched metrics.
Performance first : use pre‑computation, appropriate indexing, and engines such as ClickHouse’s ReplacingMergeTree to accelerate queries.
Public‑service principle : ensure each wide table serves multiple downstream applications (reports, BI, dashboards) to reduce duplicated development.
Big Data Tech Team
Focuses on big data, data analysis, data warehousing, data middle platform, data science, Flink, AI and interview experience, side‑hustle earning and career planning.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
