Maintaining Wide Tables: Resource Impact, Evaluation, Granularity, Timeliness, and Automatic Expansion
The article explains how wide tables are maintained without excessive resource consumption, outlines criteria for deciding which metrics belong in a wide table, describes their granularity and timeliness considerations, and clarifies that they do not automatically expand when tracking points change.
Q1: How are wide tables maintained, and do they consume excessive resources? Wide tables may appear to use more resources than star-schema tables, but in practice they do not, because only necessary metrics are included; low-frequency and highly personalized metrics are left out. A wide table is a coarse-grained collection of atomic metrics: it is flexible, cheaper to maintain, and well suited to fast-changing internet businesses, but unsuitable for domains such as finance.
Q2: How do you evaluate whether a metric belongs in a wide table? The decision rests on how often and how widely the metric is used. Metrics with few users, low query frequency, or strong personalization are not suitable. Common metrics such as daily active users (DAU), homepage PV, and content exposure PV are appropriate, while niche metrics such as click PV on a floating-ball widget are excluded.
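The rule in Q2 can be sketched as a simple admission check. This is a minimal illustration, not the article's implementation: the `MetricUsage` fields, the two thresholds, and the usage figures are all hypothetical, since the article gives no concrete numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricUsage:
    """Usage statistics for one candidate metric (all fields are hypothetical)."""
    name: str
    distinct_consumers: int  # teams or reports that read the metric
    queries_per_week: int    # how often the metric is read
    personalized: bool       # True if it serves only one team's custom need


# Illustrative thresholds only; the article does not specify any numbers.
MIN_CONSUMERS = 3
MIN_QUERIES_PER_WEEK = 20


def belongs_in_wide_table(m: MetricUsage) -> bool:
    """Admit a metric only if it is widely used, frequently read, and not personalized."""
    return (
        m.distinct_consumers >= MIN_CONSUMERS
        and m.queries_per_week >= MIN_QUERIES_PER_WEEK
        and not m.personalized
    )


# Made-up usage figures for the metrics named in the text.
candidates = [
    MetricUsage("dau", 12, 500, False),
    MetricUsage("homepage_pv", 8, 300, False),
    MetricUsage("content_exposure_pv", 6, 150, False),
    MetricUsage("floating_ball_click_pv", 1, 2, True),
]

admitted = [m.name for m in candidates if belongs_in_wide_table(m)]
print(admitted)
```

In practice the thresholds would be a team decision; the point is only that admission is driven by usage scope and frequency rather than by whether the data happens to exist.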
Q3: What is the granularity of a user wide table? The wide table itself has no primary key and acts as a “bucket” of common metrics; the user-ID primary key comes from the user master table. Core metrics (e.g., DAU, retention, revenue, consumption) are aggregated in the wide table, whereas fine-grained metrics must be retrieved from the detailed user-behavior tables.
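The division of labor described in Q3 can be sketched with an in-memory SQLite database. The table names, columns, and sample rows below are assumptions made for illustration; the article does not specify a schema or an engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The user master table owns the user-ID primary key.
CREATE TABLE user_master (user_id INTEGER PRIMARY KEY, signup_date TEXT);
-- The detail table keeps one row per behavior event (fine-grained).
CREATE TABLE user_behavior_detail (user_id INTEGER, event TEXT, amount REAL, dt TEXT);
-- The wide table declares no primary key; it is a bucket of common metrics.
CREATE TABLE user_wide (user_id INTEGER, dt TEXT, is_active INTEGER, revenue REAL);

INSERT INTO user_master VALUES (1, '2024-01-01'), (2, '2024-01-02');
INSERT INTO user_behavior_detail VALUES
  (1, 'login',    0.0, '2024-01-03'),
  (1, 'purchase', 9.5, '2024-01-03'),
  (1, 'purchase', 0.5, '2024-01-03'),
  (2, 'login',    0.0, '2024-01-03');

-- Aggregate core metrics per user and day into the wide table.
INSERT INTO user_wide
SELECT m.user_id, d.dt, 1, SUM(d.amount)
FROM user_master AS m
JOIN user_behavior_detail AS d ON d.user_id = m.user_id
GROUP BY m.user_id, d.dt;
""")

# Core metrics are read straight from the wide table.
wide_rows = conn.execute(
    "SELECT user_id, dt, is_active, revenue FROM user_wide ORDER BY user_id"
).fetchall()
dau = conn.execute(
    "SELECT SUM(is_active) FROM user_wide WHERE dt = '2024-01-03'"
).fetchone()[0]

# A fine-grained question (number of purchase events) still goes to the detail table.
purchase_events = conn.execute(
    "SELECT COUNT(*) FROM user_behavior_detail WHERE user_id = 1 AND event = 'purchase'"
).fetchone()[0]

print(wide_rows, dau, purchase_events)
```

The wide table answers the DAU and revenue questions, but it cannot say how many individual purchases user 1 made; that detail survives only in the behavior table.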
Q4: How is the timeliness of a wide table ensured? Timeliness depends on how efficiently source data is ingested and on SQL performance. Complex computations can be run in parallel instead of being chained as upstream/downstream dependencies, and big-data platforms typically give these tasks a higher scheduling priority to speed up processing.
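One way to picture the "parallel rather than chained" idea in Q4 is to compute independent column groups concurrently and merge them at the end. The sketch below uses Python's standard `concurrent.futures`; the three builder functions and their constant outputs are hypothetical stand-ins for real warehouse queries, and an actual platform would do this with its own scheduler rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical builders: each computes one independent group of columns.
def build_activity(user_ids):
    return {u: {"is_active": 1} for u in user_ids}


def build_revenue(user_ids):
    return {u: {"revenue": 0.0} for u in user_ids}


def build_consumption(user_ids):
    return {u: {"content_views": 0} for u in user_ids}


def build_wide_rows(user_ids):
    """Run the independent builders concurrently, then merge their columns per user."""
    builders = [build_activity, build_revenue, build_consumption]
    with ThreadPoolExecutor(max_workers=len(builders)) as pool:
        # No builder waits on another's output, so none is an upstream dependency.
        partials = list(pool.map(lambda build: build(user_ids), builders))

    rows = {u: {} for u in user_ids}
    for partial in partials:
        for user_id, columns in partial.items():
            rows[user_id].update(columns)
    return rows


rows = build_wide_rows([1, 2])
print(rows)
```

The total wall-clock time is then bounded by the slowest builder rather than by the sum of all of them, which is the benefit the text attributes to avoiding dependency chains.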
Q5: When tracking points change, does the data-warehouse wide table automatically expand? No. A wide table's columns are determined by metric definitions, not by tracking events. Tracking events are merely inputs to those metrics, and a single metric may draw on several events, so adding or changing a tracking point does not extend the table automatically.
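The relationship in Q5 can be sketched as a mapping from metrics to the events that feed them. The event names and the `METRIC_DEFINITIONS` mapping are invented for illustration; only the metric names come from the text.

```python
# Hypothetical metric definitions: each metric lists the tracking events that feed it.
METRIC_DEFINITIONS = {
    "content_exposure_pv": {"feed_card_impression", "detail_page_impression"},
    "homepage_pv": {"homepage_view"},
}


def wide_table_columns(definitions):
    """Columns are derived from metric names, never from raw event names."""
    return sorted(definitions)


def compute_metrics(events, definitions):
    """Count each incoming event toward every metric whose definition includes it."""
    counts = {metric: 0 for metric in definitions}
    for event in events:
        for metric, source_events in definitions.items():
            if event in source_events:
                counts[metric] += 1
    return counts


# "floating_ball_click" stands in for a newly added tracking point.
events = [
    "homepage_view",
    "feed_card_impression",
    "detail_page_impression",
    "floating_ball_click",
]

columns = wide_table_columns(METRIC_DEFINITIONS)
result = compute_metrics(events, METRIC_DEFINITIONS)
print(columns, result)
```

The new event is collected but produces no new column, because no metric definition refers to it. The table changes only when someone deliberately adds or edits a metric definition.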
DataFunTalk