How to Build a Systematic Data Quality Model for Big Data Testing
This article presents a comprehensive data quality model derived from ISO 9126, maps its characteristics to data testing, outlines practical testing methods and tool requirements, and demonstrates how to integrate quality checks into the data development lifecycle for reliable, efficient big‑data pipelines.
1. Exploring a Data Quality Model
Data testing has become a critical focus for enterprises that increasingly value data quality; this article shares a systematic view of data testing, starting from a quality model and extending to practical testing methods and tool design.
1.1 ISO 9126 Transplanted to Data Quality
ISO 9126, a classic software‑quality framework, is examined as a reference for data quality. Its six primary attributes—functionality, usability, reliability, efficiency, maintainability, and portability—are mapped to data‑centric concerns such as timeliness, completeness, accuracy, and security.
Functionality: data must exist, be complete, and be accurate. These three requirements map to the sub‑attributes of timeliness, integrity, and precision:
Data existence → timeliness (produce data on schedule)
Data completeness → integrity (no missing or extra records)
Data accuracy → precision (numeric values are correct)
Usability: data should be understandable and needed, linking to product planning and communication.
Reliability, Efficiency, Maintainability, and Portability: discussed as root‑cause factors that ultimately affect functionality.
1.2 Optimising the Transplanted Model
The initial model is not orthogonal; several attributes overlap (e.g., reliability, efficiency, and maintainability all influence functionality). The model is refined by separating user‑visible qualities from developer‑visible ones and further distinguishing between symptom‑oriented (quick‑fix) and root‑cause (fundamental) traits.
2. Data‑Testing Methodology
2.1 Applying the Model to the Development Process
The quality model is overlaid on a typical development timeline, defining which attributes should be addressed at each phase (requirements, design, implementation, testing, release). User‑visible traits (usability) are driven by product management, while the remaining traits are handled during development and testing.
2.2 Test‑Focus Areas (Key Levers)
Functional testing is the primary entry point because it reflects user‑visible quality. Additional focus includes disaster‑recovery capability, which acts as a safety net for functional failures.
The overall testing formula is presented as:
Data Testing = Basic Tests (Functionality + Disaster‑Recovery) + Selective Evaluation (Efficiency || Reliability & Maintainability || Security)
2.3 Functional‑Testing Methods
Functional testing mirrors traditional UI or API testing: construct input data, verify output data, and decide execution timing. Three key aspects are covered:
Input data construction: purpose‑built input is required for highly sensitive online scenarios; otherwise a snapshot of existing data may suffice.
Output data verification: checks timeliness, completeness, and accuracy using methods such as row counts, uniqueness checks, range validation, business‑logic bounds, and distribution analysis.
Test execution timing: tests run at self‑test, after submission, when online data is modified, and when online data is added, each with an appropriate trigger mechanism (condition‑based or scheduled).
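Three of the output checks named above (row count, uniqueness, and range validation) can be sketched as follows. This is a minimal illustration, not the article's tooling: the function `run_checks`, the table `orders`, and its columns are hypothetical, and SQLite is used only so the example is self-contained.

```python
import sqlite3


def run_checks(conn, table, key_col, value_col, min_rows, lo, hi):
    """Return {check name: passed} for one output table.

    Table and column names are interpolated directly, so they must come
    from trusted test definitions, never from user input.
    """
    cur = conn.cursor()
    # Completeness: the table holds at least the expected number of rows.
    (row_count,) = cur.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    # Uniqueness: every row has a distinct, non-NULL key.
    (distinct_keys,) = cur.execute(
        f"SELECT COUNT(DISTINCT {key_col}) FROM {table}"
    ).fetchone()
    # Range validation: NULLs and values outside [lo, hi] are violations.
    (out_of_range,) = cur.execute(
        f"SELECT COUNT(*) FROM {table} "
        f"WHERE {value_col} IS NULL OR {value_col} < ? OR {value_col} > ?",
        (lo, hi),
    ).fetchone()
    return {
        "row_count": row_count >= min_rows,
        "unique_key": distinct_keys == row_count,
        "value_range": out_of_range == 0,
    }


# Hypothetical sample output with a duplicate key and a negative amount.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, 25.5), (2, -3.0)],
)
results = run_checks(conn, "orders", "order_id", "amount",
                     min_rows=3, lo=0, hi=1000)
print(results)
```

On this sample the row‑count check passes while the uniqueness and range checks fail. Business‑logic bounds and distribution analysis would follow the same pattern, with one query per check and a boolean verdict.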
2.4 Disaster‑Recovery Evaluation
When data fails to generate, a quick fallback (e.g., switch to previous day’s data) is recommended, typically requiring server‑side support.
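The fallback idea can be sketched as a small partition resolver. The function name `resolve_partition`, the date‑keyed partitions, and the `max_fallback_days` limit are assumptions made for illustration; the source only states that switching to the previous day's data is a recommended fallback.

```python
from datetime import date, timedelta


def resolve_partition(available, target, max_fallback_days=1):
    """Pick the partition to serve for `target`.

    Returns (partition_date, is_fallback). Walks back one day at a time,
    up to `max_fallback_days`, and raises LookupError if nothing is found.
    """
    for offset in range(max_fallback_days + 1):
        candidate = target - timedelta(days=offset)
        if candidate in available:
            return candidate, offset > 0
    raise LookupError(
        f"no partition within {max_fallback_days} day(s) of {target}"
    )


# Hypothetical state: only the 2024-01-01 partition was generated.
available = {date(2024, 1, 1)}
chosen, is_fallback = resolve_partition(available, date(2024, 1, 2))
print(chosen, is_fallback)
```

Returning the `is_fallback` flag alongside the date lets the caller raise an alert or mark the served data as stale, so the fallback does not silently hide the failed run.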
2.5 Other Attribute Evaluations
Efficiency is assessed via load testing and resource‑usage monitoring; reliability and maintainability focus on dependency checks and platform‑wide integration; security is noted as an open discussion topic.
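One possible form of a dependency check is to list every upstream task that has not finished before a task is allowed to run. The sketch below assumes the task graph is available as a plain dictionary; `missing_upstreams` and the task names are hypothetical and do not come from the source.

```python
def missing_upstreams(dependencies, finished, task):
    """Return the sorted transitive upstreams of `task` that have not finished.

    `dependencies` maps a task name to the list of tasks it reads from;
    `finished` is the set of tasks whose output is already available.
    """
    missing = set()
    seen = set()
    stack = list(dependencies.get(task, []))
    while stack:
        upstream = stack.pop()
        if upstream in seen:
            continue  # already visited; also guards against cycles
        seen.add(upstream)
        if upstream not in finished:
            missing.add(upstream)
        stack.extend(dependencies.get(upstream, []))
    return sorted(missing)


# Hypothetical pipeline: report <- agg <- (raw_a, raw_b)
dependencies = {"report": ["agg"], "agg": ["raw_a", "raw_b"]}
blocked_by = missing_upstreams(dependencies, {"raw_a"}, "report")
print(blocked_by)
```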
3. Building a Data‑Testing Toolchain
Based on the methodology, the tool should support:
Input data construction and CR (code review) capabilities.
Functional‑testing of output data.
Low‑coupling extensions for other attributes.
Flexible trigger mechanisms (API, cron, message‑driven).
Rapid test‑case authoring, reuse, and orchestration.
End‑to‑end task management (generation, splitting, execution, result analysis).
Experience capture and reuse (knowledge‑base, recommendation).
The framework emphasizes API‑driven triggers, task orchestration, and indicator‑based test cases that can be translated into platform‑agnostic SQL, enabling both functional verification and broader quality monitoring.
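An indicator‑based test case might be represented as a small record that is rendered into SQL. The sketch below is an assumption about what such a translation could look like, not the framework's actual design: the `Indicator` class, its fields, and `to_sql` are hypothetical, and the generated query uses only a `CASE` expression and an aggregate so that it stays close to standard SQL.

```python
import sqlite3
from dataclasses import dataclass

_OPERATORS = {">=", "<=", "=", ">", "<"}


@dataclass(frozen=True)
class Indicator:
    name: str          # label reported with the result
    table: str         # table under test
    expression: str    # aggregate expression, e.g. "COUNT(*)"
    operator: str      # one of _OPERATORS
    threshold: float   # value the expression is compared against


def to_sql(ind):
    """Render an indicator as a one-row query: (indicator, passed)."""
    if ind.operator not in _OPERATORS:
        raise ValueError(f"unsupported operator: {ind.operator!r}")
    return (
        f"SELECT '{ind.name}' AS indicator, "
        f"CASE WHEN {ind.expression} {ind.operator} {ind.threshold} "
        f"THEN 1 ELSE 0 END AS passed "
        f"FROM {ind.table}"
    )


# Usage against an in-memory SQLite table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_orders (order_id INTEGER)")
conn.executemany("INSERT INTO daily_orders VALUES (?)", [(1,), (2,), (3,)])

check = Indicator("row_count", "daily_orders", "COUNT(*)", ">=", 2)
row = conn.execute(to_sql(check)).fetchone()
print(row)
```

Keeping the indicator as data and generating the SQL at run time is what would allow the same test case to be reused for one‑off functional verification and for scheduled quality monitoring.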
4. Conclusion
The author reflects on systematizing personal experience into a reusable model, reports progress on prototype implementations, and invites collaboration with the broader community to further mature data‑testing practices.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.