How to Build a Robust Data Quality Assurance Strategy for Large-Scale Data Platforms
This article outlines a comprehensive data quality assurance framework for a massive reporting platform, covering the data pipeline architecture, detailed testing methods for timeliness, completeness, and accuracy, as well as application‑level checks, downgrade and backup strategies, and future automation plans.
1. Data Pipeline Overview
The Youzan Data Report Center offers over 30 pages, 100+ reports, and 400+ metrics, generating thousands of daily tasks and tables. Its architecture consists of four layers: application service, data gateway, application storage, and data warehouse, with job development and metadata management providing core computation, scheduling, and query capabilities. The most critical components for quality control are the data warehouse and data application layers, which change frequently and pose higher risk.
2. Data‑Layer Testing
2.1 Timeliness
Timeliness ensures data is produced on schedule. Key factors are scheduled execution time, task priority, and data deadline. Strategies include:
Monitor offline task completion via the job development platform’s alert system (email, enterprise WeChat, phone).
Check total table rows or partition rows through an automated interface that returns zero when no data is produced.
Track failure and retry counts; repeated failures trigger additional alerts.
2.2 Completeness
Completeness evaluates whether data is neither missing nor excessive, split into table‑level and field‑level checks.
Table‑level:
Verify total row count or table size remains stable; sudden drops indicate issues.
Compare daily partition row counts/sizes; large deviations suggest problems.
Field‑level:
Uniqueness: ensure primary keys or unique fields have no duplicates (e.g.,
select count(order_no), count(distinct order_no) from ods.xx_order).
Non‑null: confirm important columns contain no nulls (e.g., select count(*) from ods.xx_order where order_no is null).
Enum validation: check that enum fields contain only expected values (e.g., select shop_type from ods.xx_order group by shop_type).
Format validation: verify data formats such as dates (e.g., yyyymmdd).
These checks are typically applied at the ODS layer to catch issues early.
2.3 Accuracy
Accuracy tests are more exploratory and include:
Self‑checks: simple rules like non‑negative payment amounts (
select count(pay_price) from dw.dws_xx_order where par = 20211025 and pay_price < 0).
Horizontal intra‑table comparison: ensure logical relationships between columns (e.g., order count ≥ distinct buyer count).
Horizontal inter‑table comparison: compare equivalent metrics across tables, storage systems, or external systems (e.g., payment amounts in MySQL vs. Kylin).
Vertical cross‑layer comparison: verify that downstream aggregates match upstream details (e.g., DW order detail vs. DM aggregated order count).
Code review focus: join types, filter conditions, metric definitions, insert idempotency, column order consistency, and overwrite handling.
3. Application‑Layer Testing
Beyond generic front‑end and back‑end tests, data‑application testing adds specific concerns:
Degradation strategy: When a new data table is added, decide whether to display a “blue bar” indicating pending offline data (current time + 2 hours).
Handle division‑by‑zero scenarios; the front‑end shows “‑” for such cases.
Primary/backup strategy: Ensure dual‑write works correctly and that data sources can be switched without loss.
Data security: Test permission controls for both horizontal (same‑level) and vertical (cross‑level) data access.
4. Future Planning
Current tools cover about 50 % of manual testing; upcoming enhancements aim to support SQL functions (SUM, COUNT, MAX, MIN) to raise coverage to 75 % and further reduce effort.
Plans include:
Integrating shape‑check and data‑comparison tools into an online inspection platform.
Automating static SQL analysis for insert‑into handling, join uniqueness, and column order checks, and exposing these capabilities to other business lines.
The article concludes with an invitation to join the Youzan data‑risk testing team to help implement these initiatives.
Youzan Coder
Official Youzan tech channel, delivering technical insights and occasional daily updates from the Youzan tech team.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
