
Data Warehouse Data Quality Measurement Standards

The article outlines four key dimensions for evaluating data warehouse data quality—correctness, completeness, timeliness, and consistency—explains common consistency issues such as differing metric values across models, cross‑dimensional aggregations, and real‑time versus batch calculations, and proposes organizational and review mechanisms to mitigate these problems.

Big Data Technology & Architecture

Jan 18, 2022


Data quality in data warehouses is commonly assessed using four dimensions: correctness, completeness, timeliness, and consistency.

Correctness refers to whether a metric can be trusted. It is validated by comparing the metric against detail-level data, cross-checking it across dimensions, verifying real-time results against offline results, and applying DQC rules such as uniqueness or range checks.
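As an illustration of the uniqueness and range rules mentioned above, here is a minimal sketch in plain Python. The function names, the `orders` sample rows, and the allowed amount range are all hypothetical, chosen only for the example; a real DQC system would run equivalent rules against warehouse tables.

```python
from collections import Counter


def check_unique(rows, key):
    """Return the key values that appear more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def check_range(rows, field, low, high):
    """Return the rows whose field falls outside [low, high]."""
    return [row for row in rows if not (low <= row[field] <= high)]


# Hypothetical sample rows, not taken from the article.
orders = [
    {"order_id": 1, "amount": 120.0},
    {"order_id": 2, "amount": -5.0},
    {"order_id": 2, "amount": 40.0},
]

duplicate_ids = check_unique(orders, "order_id")
out_of_range = check_range(orders, "amount", 0.0, 10_000.0)
print(duplicate_ids)
print(out_of_range)
```

An empty result from either check means the rule passed; any returned values point at the records to investigate.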

Completeness covers both model‑level completeness (absence of nulls or data loss) and the richness of metrics needed for business decisions.
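Model-level completeness could be monitored with checks along the following lines. This is only a sketch: the helper names, the `user_id` field, and the row counts are assumptions made for the example.

```python
def null_rate(rows, field):
    """Fraction of rows in which the field is missing or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(field) is None)
    return missing / len(rows)


def lost_rows(source_count, target_count):
    """Number of rows present upstream but absent downstream."""
    return max(source_count - target_count, 0)


# Hypothetical sample: one of four rows has no user_id.
events = [
    {"user_id": "u1"},
    {"user_id": None},
    {"user_id": "u3"},
    {"user_id": "u4"},
]

rate = null_rate(events, "user_id")
lost = lost_rows(source_count=1000, target_count=990)
print(rate, lost)
```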

Timeliness concerns how promptly data is produced: for example, real-time data available within one minute of latency and offline data produced on a fixed daily schedule, with critical tasks scheduled at higher priority.
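The two timeliness targets can be expressed as simple checks. In the sketch below, the one-minute limit comes from the text, while the function names, the 06:00 deadline, and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# From the text: real-time data should be available within one minute.
REALTIME_MAX_LATENCY = timedelta(minutes=1)


def realtime_on_time(event_time, available_time):
    """True if the record became queryable within the latency limit."""
    return available_time - event_time <= REALTIME_MAX_LATENCY


def offline_on_time(finished_at, deadline):
    """True if the daily batch job finished by its agreed deadline."""
    return finished_at <= deadline


# Hypothetical timestamps for illustration only.
event = datetime(2022, 1, 18, 10, 0, 0)
fast = realtime_on_time(event, datetime(2022, 1, 18, 10, 0, 45))
slow = realtime_on_time(event, datetime(2022, 1, 18, 10, 1, 30))

# Assumed deadline: the daily job should finish by 06:00.
deadline = datetime(2022, 1, 18, 6, 0, 0)
batch_ok = offline_on_time(datetime(2022, 1, 18, 5, 40, 0), deadline)
print(fast, slow, batch_ok)
```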

Consistency means the same business metric should yield identical values across different scenarios (systems, models, real‑time/offline). Inconsistencies often arise from divergent calculation logic, data sources, or naming conventions.

The article further examines consistency problems in three categories: (1) inconsistent metric values across different models, (2) mismatched aggregation results across cross‑dimensional data, and (3) discrepancies between real‑time and offline metrics, especially in Lambda architectures.
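The second category, mismatched aggregations across dimensions, can be detected by checking that each breakdown of an additive metric sums to the same total. The sketch below assumes hypothetical pre-aggregated tables (`gmv_by_city`, `gmv_by_category`) with amounts in integer cents. It does not apply to non-additive metrics such as distinct counts, which are not expected to sum across dimension values.

```python
def grand_total(aggregate):
    """Sum an additive metric over all values of one dimension."""
    return sum(aggregate.values())


def inconsistent_dimensions(aggregates, expected_total):
    """Return the breakdowns whose grand total differs from the expected one."""
    return {
        name: grand_total(agg)
        for name, agg in aggregates.items()
        if grand_total(agg) != expected_total
    }


# Hypothetical aggregate tables; amounts are in integer cents.
gmv_by_city = {"Beijing": 300, "Shanghai": 200}
# Assumed failure mode: a row with a missing category was dropped upstream.
gmv_by_category = {"books": 250, "toys": 240}

mismatched = inconsistent_dimensions(
    {"by_city": gmv_by_city, "by_category": gmv_by_category},
    expected_total=500,
)
print(mismatched)
```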

To address these issues, the author suggests improving the organizational structure by establishing a data middle platform team responsible for shared metrics, and implementing rigorous requirements and model review processes to ensure clear definitions, standardized naming, and proper layering of tables.

Additional mitigation strategies include pre‑emptive real‑time vs offline metric comparison, and adopting stream‑batch unified architectures (e.g., OLAP engines like Hologres or Doris) to harmonize computation and storage.
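A real-time versus offline comparison might look like the following sketch, which flags any metric whose two values differ by more than a relative tolerance. The function names, the 1% tolerance, and the sample values are assumptions for illustration, not thresholds from the article.

```python
def relative_diff(realtime_value, offline_value):
    """Relative difference, using the offline value as the baseline."""
    if offline_value == 0:
        return 0.0 if realtime_value == 0 else float("inf")
    return abs(realtime_value - offline_value) / abs(offline_value)


def compare_metrics(realtime, offline, tolerance=0.01):
    """Return metrics that are missing on one side or differ beyond tolerance."""
    mismatches = {}
    for name in sorted(set(realtime) | set(offline)):
        rt, off = realtime.get(name), offline.get(name)
        if rt is None or off is None or relative_diff(rt, off) > tolerance:
            mismatches[name] = (rt, off)
    return mismatches


# Hypothetical metric snapshots for the same business day.
realtime_metrics = {"gmv": 1000.0, "orders": 97}
offline_metrics = {"gmv": 1005.0, "orders": 100}

mismatches = compare_metrics(realtime_metrics, offline_metrics)
print(mismatches)
```

Running such a check before metrics reach consumers surfaces divergence early. The acceptable tolerance is a business decision that would need to be agreed per metric.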



Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
