
How to Build a Systematic Data Quality Model for Big Data Testing

This article presents a comprehensive data quality model derived from ISO 9126, maps its characteristics to data testing, outlines practical testing methods and tool requirements, and demonstrates how to integrate quality checks into the data development lifecycle for reliable, efficient big‑data pipelines.

Alibaba Cloud Developer

1. Exploring a Data Quality Model

As enterprises place more value on data quality, data testing has become a critical focus. This article takes a systematic view of data testing, starting from a quality model and extending to practical testing methods and tool design.

1.1 ISO 9126 Transplanted to Data Quality

ISO 9126, a classic software‑quality framework, serves as the reference for data quality. Its six primary attributes (functionality, usability, reliability, efficiency, maintainability, and portability) are mapped to data‑centric concerns such as timeliness, completeness, accuracy, and security.

Functionality: data must exist, be complete, and be accurate. The corresponding sub‑attributes are timeliness, integrity, and precision.

Data existence → timeliness (produce data on schedule)

Data completeness → integrity (no missing or extra records)

Data accuracy → precision (numeric values are correct)

Usability: data should be understandable and actually needed, which ties this attribute to product planning and communication.

Reliability, efficiency, maintainability, and portability are treated as root‑cause factors that ultimately affect functionality.
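The timeliness sub‑attribute of functionality (data produced on schedule) lends itself to a very small check. The sketch below is illustrative only: the function name `check_timeliness` and the idea of comparing a production timestamp against a deadline are assumptions, not something the original article specifies.

```python
from datetime import datetime

def check_timeliness(produced_at, deadline):
    """Return (on_time, delay_seconds) for one data delivery.

    produced_at is None when the data was never produced (hypothetical
    convention for this sketch); in that case the delay is unknown.
    """
    if produced_at is None:
        return False, None
    delay = (produced_at - deadline).total_seconds()
    # A negative delay means the data arrived early; report it as 0.
    return delay <= 0, max(delay, 0.0)

deadline = datetime(2024, 1, 2, 6, 0)
early = check_timeliness(datetime(2024, 1, 2, 5, 30), deadline)
late = check_timeliness(datetime(2024, 1, 2, 6, 15), deadline)
missing = check_timeliness(None, deadline)
```

Returning the delay alongside the boolean keeps the check usable both as a pass/fail gate and as a monitored metric.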

Data quality model overview

1.2 Optimising the Transplanted Model

The initial model is not orthogonal; several attributes overlap (e.g., reliability, efficiency, and maintainability all influence functionality). The model is refined by separating user‑visible qualities from developer‑visible ones and further distinguishing between symptom‑oriented (quick‑fix) and root‑cause (fundamental) traits.

Optimised data quality model

2. Data‑Testing Methodology

2.1 Applying the Model to the Development Process

The quality model is overlaid on a typical development timeline, defining which attributes should be addressed at each phase (requirements, design, implementation, testing, release). User‑visible traits (usability) are driven by product management, while the remaining traits are handled during development and testing.

Quality model mapped to development phases

2.2 Test‑Focus Areas (Key Levers)

Functional testing is the primary entry point because it reflects user‑visible quality. Additional focus includes disaster‑recovery capability, which acts as a safety net for functional failures.

The overall testing formula is presented as:

Data Testing = Basic Tests (Functionality + Disaster‑Recovery) + Selective Evaluation (Efficiency || Reliability & Maintainability || Security)
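One way to read this formula is that the basic tests always run, while the selective evaluations are picked as needed. The sketch below encodes that reading; treating `||` as "choose any subset" is an assumption, and the names `BASIC_TESTS`, `SELECTIVE_EVALUATIONS`, and `build_test_plan` are hypothetical.

```python
# Always-on tests, per the formula's "Basic Tests" term.
BASIC_TESTS = ("functionality", "disaster_recovery")

# Optional evaluations, per the "Selective Evaluation" term.
SELECTIVE_EVALUATIONS = ("efficiency", "reliability_maintainability", "security")

def build_test_plan(selected=()):
    """Return the ordered list of test areas for one data product."""
    unknown = set(selected) - set(SELECTIVE_EVALUATIONS)
    if unknown:
        raise ValueError(f"unknown evaluations: {sorted(unknown)}")
    # Keep the canonical order regardless of the order the caller used.
    chosen = [e for e in SELECTIVE_EVALUATIONS if e in selected]
    return list(BASIC_TESTS) + chosen

minimal_plan = build_test_plan()
extended_plan = build_test_plan(["security", "efficiency"])
```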

2.3 Functional‑Testing Methods

Functional testing mirrors traditional UI or API testing: construct input data, verify output data, and decide execution timing. Three key aspects are covered:

Input data construction: required for highly sensitive online scenarios; otherwise snapshot data may suffice.

Output data verification: checks timeliness, completeness, and accuracy using methods such as row counts, uniqueness checks, range validation, business‑logic bounds, and distribution analysis.

Test execution timing: tests can run at self‑test, after code submission, when online data is modified, and when online data is added, each with an appropriate trigger mechanism (condition‑based or scheduled).
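Three of the output verification methods named above (row counts, uniqueness checks, and range validation) can be sketched in a few lines of plain Python. This is a minimal illustration over in‑memory rows, not the article's implementation; the function names, the sample table, and the thresholds are all assumptions.

```python
from collections import Counter

def check_row_count(rows, min_rows, max_rows):
    """Completeness: the row count falls inside an expected band."""
    return min_rows <= len(rows) <= max_rows

def check_unique(rows, key):
    """Completeness: return the key values that appear more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)

def check_range(rows, column, low, high):
    """Accuracy: return the rows whose value lies outside [low, high]."""
    return [row for row in rows if not (low <= row[column] <= high)]

# Hypothetical output table with one duplicate key and one bad amount.
sample = [
    {"order_id": 1, "amount": 25.0},
    {"order_id": 2, "amount": 310.5},
    {"order_id": 2, "amount": 310.5},
    {"order_id": 3, "amount": -4.0},
]

count_ok = check_row_count(sample, 1, 10)
duplicates = check_unique(sample, "order_id")
out_of_range = check_range(sample, "amount", 0, 10000)
```

Returning the offending keys or rows, rather than a bare boolean, makes a failed check easier to investigate.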

2.4 Disaster‑Recovery Evaluation

When data fails to generate, a quick fallback (for example, switching to the previous day's data) is recommended; this typically requires server‑side support.
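The fallback idea can be sketched as a partition resolver that walks backwards from the target date. The sketch assumes data is partitioned by day and that the serving side can ask which partitions exist; `resolve_partition` and `max_fallback_days` are hypothetical names.

```python
from datetime import date, timedelta

def resolve_partition(available, target, max_fallback_days=1):
    """Return the newest available partition within the fallback window.

    available: set of dates for which data was generated successfully.
    Returns None when nothing usable exists, so the caller can alert.
    """
    for offset in range(max_fallback_days + 1):
        candidate = target - timedelta(days=offset)
        if candidate in available:
            return candidate
    return None

available = {date(2024, 1, 1), date(2024, 1, 2)}
# The 2024-01-03 partition is missing, so the previous day is served.
served = resolve_partition(available, date(2024, 1, 3))
```

Capping the window with `max_fallback_days` keeps the fallback from silently serving very stale data.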

2.5 Other Attribute Evaluations

Efficiency is assessed via load testing and resource‑usage monitoring; reliability and maintainability focus on dependency checks and platform‑wide integration; security is noted as an open discussion topic.
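As one possible form of the dependency checks mentioned for reliability and maintainability, the sketch below lists the upstream tasks of a job that are not yet ready. The dependency map format and the names `upstream_of` and `unready_upstreams` are assumptions made for illustration.

```python
def upstream_of(task, deps):
    """Return all transitive upstream tasks of `task`.

    deps maps a task name to the list of tasks it reads from directly.
    """
    seen = set()
    stack = list(deps.get(task, ()))
    while stack:
        node = stack.pop()
        if node not in seen:  # the seen set also guards against cycles
            seen.add(node)
            stack.extend(deps.get(node, ()))
    return seen

def unready_upstreams(task, deps, ready):
    """Return the sorted upstream tasks that have not finished yet."""
    return sorted(upstream_of(task, deps) - set(ready))

# Hypothetical pipeline: report <- orders_clean <- orders_raw, report <- users
deps = {
    "report": ["orders_clean", "users"],
    "orders_clean": ["orders_raw"],
}
blocked_by = unready_upstreams("report", deps, {"users", "orders_raw"})
```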

3. Building a Data‑Testing Toolchain

Based on the methodology, the tool should support:

Input data construction and CR (code review) capabilities.

Functional testing of output data.

Low‑coupling extensions for other attributes.

Flexible trigger mechanisms (API, cron, message‑driven).

Rapid test‑case authoring, reuse, and orchestration.

End‑to‑end task management (generation, splitting, execution, result analysis).

Experience capture and reuse (knowledge‑base, recommendation).
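The requirements for flexible triggers and reusable test cases can be illustrated with a small registry that maps a trigger type to the cases it should run. This is only a sketch under assumptions: the class `CaseRegistry`, the trigger labels, and the payload fields are hypothetical and not taken from the article's tool.

```python
from collections import defaultdict

class CaseRegistry:
    """Holds reusable test cases grouped by the trigger that runs them."""

    def __init__(self):
        self._cases = defaultdict(list)

    def register(self, trigger, name, check):
        """Attach a check (a callable taking a payload dict) to a trigger."""
        self._cases[trigger].append((name, check))

    def fire(self, trigger, payload):
        """Run every case registered for the trigger; return name -> passed."""
        return {name: bool(check(payload)) for name, check in self._cases[trigger]}

registry = CaseRegistry()
registry.register("cron", "non_empty", lambda p: p["rows"] > 0)
registry.register("message", "fresh", lambda p: p["lag_minutes"] <= 30)

cron_results = registry.fire("cron", {"rows": 12})
message_results = registry.fire("message", {"lag_minutes": 45})
```

Because a case is just a named callable, the same check can be registered under several triggers without duplication.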

Proposed data‑testing framework architecture

The framework emphasizes API‑driven triggers, task orchestration, and indicator‑based test cases that can be translated into platform‑agnostic SQL, enabling both functional verification and broader quality monitoring.
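To make the idea of indicator‑based test cases translated into SQL more concrete, here is a minimal sketch. The `Indicator` type, the three indicator kinds, and the generated queries are assumptions chosen for illustration; SQLite stands in for whatever engine a real platform would use.

```python
import sqlite3
from dataclasses import dataclass

@dataclass(frozen=True)
class Indicator:
    kind: str          # "row_count", "null_count", or "duplicate_count"
    table: str
    column: str = ""

def to_sql(ind):
    """Translate an indicator into plain SQL using only common aggregates.

    Identifiers are interpolated directly, so this sketch assumes table
    and column names come from trusted test definitions.
    """
    if ind.kind == "row_count":
        return f"SELECT COUNT(*) FROM {ind.table}"
    if ind.kind == "null_count":
        return f"SELECT COUNT(*) FROM {ind.table} WHERE {ind.column} IS NULL"
    if ind.kind == "duplicate_count":
        # Both counts ignore NULLs, so the difference is the surplus rows.
        return (f"SELECT COUNT({ind.column}) - COUNT(DISTINCT {ind.column}) "
                f"FROM {ind.table}")
    raise ValueError(f"unsupported indicator kind: {ind.kind}")

def measure(conn, ind):
    """Run the indicator's query and return its single numeric value."""
    return conn.execute(to_sql(ind)).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 10.0), (2, None), (2, 5.0)])

rows = measure(conn, Indicator("row_count", "orders"))                      # 3
nulls = measure(conn, Indicator("null_count", "orders", "amount"))          # 1
dupes = measure(conn, Indicator("duplicate_count", "orders", "order_id"))   # 1
```

Keeping the indicator as data and the SQL as a rendering step means the same test case can be compared against thresholds for functional verification or recorded over time for monitoring.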

4. Conclusion

The author reflects on systematizing personal experience into a reusable model, reports progress on prototype implementations, and invites collaboration with the broader community to further mature data‑testing practices.
