
Data Quality Management: Expectations, Measurement, Assurance, and Operation

The article outlines a complete data‑quality‑management framework that first captures business expectations, then translates them into basic and personalized measurement rules, defines four assurance approaches for handling violations, and scales operation with indicators, tooling, and metrics to continuously improve data quality across the lifecycle.

Bilibili Tech

The article presents a comprehensive framework for data quality management, structured into four main parts: data quality expectations, measurement, assurance, and operation.

1. Data Quality Expectations – Emphasizes the need to understand the business side’s quality expectations before defining standards. It proposes three groups of questions to elicit expectations during the requirement pre‑review stage: (1) acquiring quality expectations, (2) assessing potential risks, and (3) clarifying business knowledge. The author stresses that asking directly about monitoring or guarantees can be ineffective, so a structured questionnaire is recommended.

2. Data Quality Measurement – Describes how to design measurement rules based on the previously gathered expectations. Rules are divided into basic rules (e.g., row count = 0, primary‑key duplication) that are generally provided by the platform, and personalized rules that are tailored to specific data sets. An example using a “business object exposure and click log” illustrates how to extract expectations and translate them into rule sets. Measurement timing is categorized into three forms: initialization, acceptance, and production measurement, each linked to different stages of the data‑development lifecycle.
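The split between basic and personalized rules described in part 2 can be sketched as small, composable checks. This is a minimal illustration, not the platform's actual rule engine: the column names `log_id` and `event_type`, the allowed event types, and the helper names are all hypothetical, chosen only to resemble an exposure and click log.

```python
from collections import Counter
from typing import Callable

Row = dict[str, object]
Rule = Callable[[list[Row]], bool]  # a rule returns True when it passes


def row_count_nonzero(rows: list[Row]) -> bool:
    """Basic rule: an empty table (row count = 0) is a violation."""
    return len(rows) > 0


def primary_key_unique(key: str) -> Rule:
    """Basic rule: no two rows may share the same primary-key value."""
    def check(rows: list[Row]) -> bool:
        counts = Counter(row[key] for row in rows)
        return all(n == 1 for n in counts.values())
    return check


def allowed_values(column: str, allowed: set[str]) -> Rule:
    """Personalized rule (hypothetical): a column may only hold known values."""
    def check(rows: list[Row]) -> bool:
        return all(row[column] in allowed for row in rows)
    return check


def run_rules(rows: list[Row], rules: dict[str, Rule]) -> dict[str, bool]:
    """Evaluate every rule and report pass/fail per rule name."""
    return {name: rule(rows) for name, rule in rules.items()}


# Assumed rule set for an exposure and click log.
RULES: dict[str, Rule] = {
    "row_count_nonzero": row_count_nonzero,
    "primary_key_unique": primary_key_unique("log_id"),
    "event_type_allowed": allowed_values("event_type", {"exposure", "click"}),
}

# Made-up sample rows; log_id 2 is duplicated on purpose.
sample = [
    {"log_id": 1, "event_type": "exposure"},
    {"log_id": 2, "event_type": "click"},
    {"log_id": 2, "event_type": "exposure"},
]

results = run_rules(sample, RULES)
failed = [name for name, passed in results.items() if not passed]
print(failed)
```

Keeping each rule as a plain function makes it straightforward to run the same rule set at initialization, acceptance, and production time; only the input rows change.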

3. Data Quality Assurance – Discusses how to act when measurement results reveal issues. The author outlines four assurance categories: (1) Process assurance (e.g., data admission and change‑release procedures), (2) Institutional assurance (responsibility and grading systems), (3) Monitoring assurance (continuous observation, measurement, prompting, and correction), and (4) Resource assurance (both physical resources such as CPU/memory and human‑resource allocation). Real‑world cases, such as fixing a missing‑value issue in the “f5” attribute, demonstrate how to trace the root cause and apply corrective actions.
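As one illustration of monitoring assurance, a missing-value check on an attribute such as "f5" could look like the sketch below. The 10% threshold, the sample rows, and the function names are assumptions made for the example; the source does not say how the "f5" issue was detected or what threshold applies.

```python
Row = dict[str, object]


def missing_rate(rows: list[Row], column: str) -> float:
    """Share of rows where the column is absent, None, or an empty string."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) in (None, ""))
    return missing / len(rows)


def check_missing(rows: list[Row], column: str, max_rate: float) -> tuple[bool, float]:
    """Return (passed, observed_rate) for a missing-value rule."""
    rate = missing_rate(rows, column)
    return rate <= max_rate, rate


# Made-up rows: "f5" is None in one row and absent in another.
rows = [
    {"id": 1, "f5": "a"},
    {"id": 2, "f5": None},
    {"id": 3},
    {"id": 4, "f5": "b"},
]

# Assumed threshold: at most 10% of rows may lack a value for "f5".
passed, rate = check_missing(rows, "f5", max_rate=0.10)
if not passed:
    print(f"f5 missing rate {rate:.0%} exceeds threshold; trace upstream source")
```

A failing check only signals the problem; finding the root cause still requires tracing the upstream tables or logs that feed the attribute.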

4. Data Quality Operation – Focuses on scaling assurance to large volumes of data. It defines two operational goals (reducing losses from incidents and improving assurance efficiency) and presents an indicator system that links governance objectives, strategies, and evaluations. The article details how to set standards, design execution rules, and provide tooling (monitoring, baseline management, DQC tools, etc.). It also explains how to evaluate the effectiveness of these rules through metrics such as alarm effectiveness, coverage, and response time, and shows a case where weekly alarm defects were reduced from more than 2,000 to under 100.
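The evaluation metrics named in part 4 can be made concrete with a small calculation. The definitions below are assumptions, since the summary does not give formulas: alarm effectiveness is taken as the share of alarms confirmed as real issues, coverage as the share of tables that have at least one rule, and response time as the mean minutes from alarm to response. The `Alarm` record and the sample data are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Alarm:
    table: str
    was_real_issue: bool       # confirmed during triage
    minutes_to_respond: float  # time from alarm to first response


def alarm_effectiveness(alarms: list[Alarm]) -> float:
    """Assumed definition: fraction of alarms that pointed at a real issue."""
    if not alarms:
        return 0.0
    return sum(a.was_real_issue for a in alarms) / len(alarms)


def rule_coverage(tables_with_rules: set[str], all_tables: set[str]) -> float:
    """Assumed definition: fraction of tables with at least one rule attached."""
    if not all_tables:
        return 0.0
    return len(tables_with_rules & all_tables) / len(all_tables)


def mean_response_minutes(alarms: list[Alarm]) -> float:
    """Assumed definition: average minutes between alarm and response."""
    if not alarms:
        return 0.0
    return mean(a.minutes_to_respond for a in alarms)


# Made-up alarms for one reporting period.
alarms = [
    Alarm("orders", True, 10.0),
    Alarm("orders", True, 20.0),
    Alarm("clicks", False, 30.0),
    Alarm("users", True, 60.0),
]

effectiveness = alarm_effectiveness(alarms)
coverage = rule_coverage({"orders", "clicks"}, {"orders", "clicks", "users", "items"})
response = mean_response_minutes(alarms)
print(effectiveness, coverage, response)
```

Tracking these three numbers per week gives a simple way to see whether rule tuning is reducing noisy alarms without sacrificing coverage.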

Overall, the piece offers a practical, theory‑driven guide for data engineers and platform teams to systematically improve data quality throughout the data lifecycle.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
