
How Qualitis Ensures High‑Availability Data Quality Monitoring on Big Data Platforms

Qualitis is a big‑data‑platform‑based data‑quality‑management service that defines, detects, and reports data‑set quality issues, featuring idempotent backend services, load‑balanced high‑availability, Zookeeper‑coordinated process synchronization, thread‑pool throttling, and clearly separated internal and external APIs.

IT Architects Alliance

Introduction

Data quality monitoring is a critical step in big‑data processing, providing the necessary support for data services, analytics and mining.

Project Overview

Qualitis is a data‑quality‑management service built on a big‑data platform. It offers a unified workflow to define, detect, and promptly report data‑set quality issues.

Glossary

Project: a collection of rules that determines alert recipients and severity; it is a unit of task scheduling.

Rule: the definition of a data‑quality model for a data source; it decides whether an alert is triggered and serves as the basic unit of task scheduling.

Application: a data‑quality‑checking task; executing the task yields quality verification results.

Overall Design

Architecture

Gray‑Release Design

Because each Qualitis backend service is idempotent, gray‑release is achieved by isolating a single backend instance so that it no longer receives user requests.

High‑Availability and Performance

Qualitis services are idempotent and can be deployed in multiple instances behind a load balancer to achieve both high availability and performance improvement.
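The following is a minimal sketch of how a load balancer could spread requests across several instances and withhold traffic from one isolated instance during a gray release. The `RoundRobinBalancer` class, the round-robin policy, and the instance names are assumptions made for illustration; the article does not say which load balancer or policy Qualitis deployments use.

```python
from itertools import count


class RoundRobinBalancer:
    """Toy round-robin balancer; class and instance names are hypothetical."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.isolated = set()      # instances withheld from user traffic
        self._counter = count()

    def isolate(self, instance):
        # Gray release: stop routing user requests to this instance.
        self.isolated.add(instance)

    def restore(self, instance):
        # Put the instance back into rotation.
        self.isolated.discard(instance)

    def pick(self):
        # Choose the next active instance in round-robin order.
        active = [i for i in self.instances if i not in self.isolated]
        if not active:
            raise RuntimeError("no active instance available")
        return active[next(self._counter) % len(active)]


balancer = RoundRobinBalancer(["qualitis-1", "qualitis-2", "qualitis-3"])
balancer.isolate("qualitis-3")
picks = [balancer.pick() for _ in range(4)]
```

Because the services are idempotent, a request retried on a different instance has the same effect as the original, which is what makes removing one instance from rotation safe.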

Additional performance ideas (not yet implemented) include query caching using a distributed cache to reduce database load and accelerate response times.

Multi‑Process Synchronization

Process synchronization is required because multiple Qualitis instances may simultaneously refresh monitoring task states. Qualitis uses Zookeeper to coordinate processes; instances compete to create an ephemeral node, and the winner becomes the Monitor responsible for task status updates.
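The election pattern described above can be sketched as follows. To keep the example self-contained, `FakeCoordinator` is an in-memory stand-in and not a real Zookeeper client; the node path `/qualitis/monitor`, the instance ids, and all function names are hypothetical.

```python
import threading


class NodeExistsError(Exception):
    """Raised when the ephemeral node has already been created."""


class FakeCoordinator:
    """In-memory stand-in for a Zookeeper ensemble (not a real client)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._nodes = {}   # path -> id of the session that owns the node

    def create_ephemeral(self, path, owner):
        # Creation is atomic: exactly one caller can succeed per path.
        with self._lock:
            if path in self._nodes:
                raise NodeExistsError(path)
            self._nodes[path] = owner

    def close_session(self, owner):
        # Ephemeral nodes vanish when the session that created them ends.
        with self._lock:
            self._nodes = {p: o for p, o in self._nodes.items() if o != owner}


def try_become_monitor(coordinator, instance_id, path="/qualitis/monitor"):
    """Return True if this instance won the race and is now the Monitor."""
    try:
        coordinator.create_ephemeral(path, instance_id)
        return True
    except NodeExistsError:
        return False


coordinator = FakeCoordinator()
results = {}


def run(instance_id):
    results[instance_id] = try_become_monitor(coordinator, instance_id)


threads = [threading.Thread(target=run, args=(f"instance-{n}",)) for n in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

winners = [instance for instance, won in results.items() if won]
```

Only one instance ends up in `winners`. If that instance's session ends, the node disappears and another instance can win the next attempt, which is the property that makes an ephemeral node suitable for this role.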

Thread Throttling

When monitoring tasks send requests to the Hive Metastore, a high request volume can overload it. Qualitis therefore throttles access with a thread pool: a task that finds no free thread waits until one becomes available, and only then connects to the metastore.

Module Design

Module Diagram

Use‑Case Diagram

API Design

Internal APIs

Two categories: Administrator APIs (/qualitis/api/v1/admin/*) and User APIs (/qualitis/api/v1/projector/*), separating permissions.

External APIs

External endpoint pattern: /qualitis/outer/api/v1/*. Calls must include the following query parameters:

app_id (string): system‑assigned application identifier.

timestamp (string): millisecond‑level timestamp, valid for 7 days.

nonce (string): random string of length 5.

signature (string): MD5(md5(appId + nonce + timestamp) + appToken), 32‑character lowercase hash.

app_id and appToken must be granted by an administrator.
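The signature formula above could be computed as in the sketch below. The UTF-8 encoding, the digits-only nonce, and the helper names `md5_hex`, `sign`, and `build_query` are assumptions; the credentials shown are placeholders, not real values.

```python
import hashlib
import secrets
import string
import time
from urllib.parse import urlencode


def md5_hex(text):
    """Return the 32-character lowercase MD5 hex digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()   # encoding is assumed


def sign(app_id, nonce, timestamp, app_token):
    """signature = MD5(md5(appId + nonce + timestamp) + appToken)."""
    return md5_hex(md5_hex(app_id + nonce + timestamp) + app_token)


def build_query(app_id, app_token, nonce=None, timestamp=None):
    """Build the query string expected by /qualitis/outer/api/v1/* calls."""
    # Assumption: a nonce made of 5 random digits satisfies "random string of length 5".
    nonce = nonce or "".join(secrets.choice(string.digits) for _ in range(5))
    timestamp = timestamp or str(int(time.time() * 1000))   # milliseconds
    return urlencode({
        "app_id": app_id,
        "timestamp": timestamp,
        "nonce": nonce,
        "signature": sign(app_id, nonce, timestamp, app_token),
    })


# Placeholder credentials for illustration only.
query = build_query("demo-app", "demo-token", nonce="12345", timestamp="1700000000000")
```

Note that `appToken` is used only to compute the signature and is never sent as a query parameter, so the server can verify the caller without the token crossing the wire.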

System Engineering Structure

The system consists of two layers: a Web layer (Controller and Service) that exposes services, and a Core layer containing core business logic and storage components.
