
How to Build a Robust Event Logging Quality System with Real‑Time Validation

This article outlines common event‑logging quality problems and the framework built to address them: systematic event registration, real‑time validation built on Flink with a configurable rule syntax, explainable results, continuous monitoring, targeted optimization, and an evaluation model. Together these form a comprehensive quality center for big‑data platforms.

Youzan Coder

Aug 23, 2019


Common Quality Issues

Event logs are the basis for user‑behavior analysis, so their data quality is essential. Typical problems are:

Event duplication or loss caused by SDK bugs, network failures, or missing instrumentation.

Incorrect event parameters: missing required fields, null values, wrong data types, or invalid content.

Frontend coding errors that send undefined or null as parameter values.

Event pipeline breakage during upgrades or refactoring.

Quality Assurance Mechanisms

Accurate Registration

Developers define pages, components, and events according to a logging specification and register them in the logging platform. Registration captures:

Page/component type, business identifier, optional business ID.

Event name, identifier, description, associated page/component, and state.

Event parameters: name, type (int/string/float/list/map), required flag, and structural definition for complex types.
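The Python sketch below makes the registration fields concrete. It models a registered event definition and uses it to catch the parameter problems listed earlier: missing required fields, null values, and wrong types. The class names, the owner field, and the mapping from registered types to Python types are assumptions for illustration. They are not the platform's actual schema. Structural definitions for list/map types are stored but not checked here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class ParamType(Enum):
    """Parameter types named in the registration spec."""
    INT = "int"
    STRING = "string"
    FLOAT = "float"
    LIST = "list"
    MAP = "map"


# Python types accepted for each registered type (an assumption of this sketch).
ACCEPTED: Dict[ParamType, tuple] = {
    ParamType.INT: (int,),
    ParamType.STRING: (str,),
    ParamType.FLOAT: (int, float),
    ParamType.LIST: (list,),
    ParamType.MAP: (dict,),
}


@dataclass
class EventParam:
    name: str
    type: ParamType
    required: bool = False
    # Structural definition for LIST/MAP types; stored but not checked here.
    structure: Optional[Any] = None


@dataclass
class EventDefinition:
    identifier: str
    name: str
    description: str
    owner: str  # hypothetical: identifier of the associated page or component
    state: str
    params: List[EventParam] = field(default_factory=list)


def check_params(defn: EventDefinition, payload: Dict[str, Any]) -> List[str]:
    """Report missing required fields, null values, and type mismatches."""
    problems = []
    for p in defn.params:
        if p.name not in payload:
            if p.required:
                problems.append(f"{p.name}: missing required field")
            continue
        value = payload[p.name]
        if value is None:
            problems.append(f"{p.name}: null value")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(value, bool) or not isinstance(value, ACCEPTED[p.type]):
            problems.append(f"{p.name}: expected {p.type.value}, got {type(value).__name__}")
    return problems
```

Take a definition with a required int goods_id and an optional list tags. Checking the payload {"tags": "a"} against it yields two problems: goods_id is missing, and tags is a string rather than a list.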

Real‑Time Validation

After registration, developers implement logging as specified. Real‑time validation checks log correctness and surfaces issues quickly.

Validation runs as a Flink job with latency on the order of seconds.

Validation rules are synchronized from the logging platform every minute.

Results are pushed to testing tools and persisted in Druid for downstream analysis.
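The one‑minute rule synchronization can be pictured as a cache. The cache reloads rules once they are older than the sync interval. If the platform is unreachable, it falls back to the last good rule set. This is a plain‑Python sketch. Here `loader` stands in for a hypothetical call to the logging platform, and the clock is injectable so the behavior can be tested. In a real Flink job, a similar effect is often achieved with the broadcast state pattern or with a periodic refresh inside a rich function. The source does not say which approach was used.

```python
import time
from typing import Callable, List, Optional


class RuleCache:
    """Holds validation rules and reloads them at most once per interval.

    `loader` is a hypothetical callable that fetches the current rule set
    from the logging platform.
    """

    def __init__(self, loader: Callable[[], List[dict]], interval_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader
        self._interval = interval_s
        self._clock = clock
        self._rules: List[dict] = []
        self._loaded_at: Optional[float] = None

    def rules(self) -> List[dict]:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._interval:
            try:
                self._rules = self._loader()
            except Exception:
                # Keep validating with the last good rule set if the sync fails.
                pass
            # Success or failure, wait a full interval before the next attempt.
            self._loaded_at = now
        return self._rules
```

The timestamp is set even after a failed load. As a result, a platform outage causes at most one retry per interval instead of one per event.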

Completeness & Extensibility

Validation rules must cover both the baseline logging specification and business‑specific constraints, while remaining easy to extend. Example rule syntax (JSON):

{
  "compare": "length",
  "condition": ["sdk_type"],
  "in": ["iOS", "Android", "js"],
  "assert": true,
  "assert_fail": "ERROR",
  "value": 36,
  "key": "uuid",
  "fail_msg": "did/uuid invalid",
  "require": 1
}

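This rule applies only to events whose sdk_type is iOS, Android, or js (the JavaScript SDK). For those events, the uuid parameter is required and must be a string of length 36. If it is missing or has the wrong length, the validator reports an ERROR with the message "did/uuid invalid".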
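Below is a minimal Python evaluator for rules of this shape. The field semantics are inferred from the example and its description:

- condition plus in scopes the rule;
- require marks the key as mandatory;
- compare and value define the check;
- assert states whether the check must hold;
- assert_fail and fail_msg describe the failure.

Treat the function name, the failure‑record layout, and the handling of multiple condition fields as assumptions. Only the length comparator appears in the source.

```python
import json
from typing import Any, Callable, Dict, Optional

# The example rule from the text, parsed as it might arrive from the logging platform.
UUID_RULE = json.loads("""
{
  "compare": "length",
  "condition": ["sdk_type"],
  "in": ["iOS", "Android", "js"],
  "assert": true,
  "assert_fail": "ERROR",
  "value": 36,
  "key": "uuid",
  "fail_msg": "did/uuid invalid",
  "require": 1
}
""")

# Comparison operators by name. Only "length" appears in the source.
COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "length": lambda actual, expected: isinstance(actual, str) and len(actual) == expected,
}


def evaluate_rule(rule: Dict[str, Any], log: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return None if the log passes or the rule does not apply, else a failure record."""
    # Scope (assumed): the rule applies only when each condition field has a value listed in "in".
    allowed = rule.get("in")
    if allowed is not None:
        for cond_field in rule.get("condition", []):
            if log.get(cond_field) not in allowed:
                return None

    key = rule["key"]
    failure = {
        "key": key,
        "severity": rule.get("assert_fail", "ERROR"),
        "msg": rule.get("fail_msg", ""),
    }
    value = log.get(key)
    if value is None:
        # Absent or null: a failure only if the rule marks the field as required.
        return failure if rule.get("require") == 1 else None

    holds = COMPARATORS[rule["compare"]](value, rule["value"])
    # "assert": true means the comparison must hold; false means it must not.
    return None if holds == rule.get("assert", True) else failure
```

A new comparator, such as a hypothetical regex or range check, would be one more entry in COMPARATORS. This is one way to keep the rule syntax extensible without touching the evaluator.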

Switches & Configurability

Validation severity can be adjusted through feature toggles or configuration files, without code changes. In early stages, alerts for unregistered events can be suppressed; in later stages, the same condition raises a warning or an error. A new check can be introduced at WARNING, TEST_WARNING, or ERROR severity.
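One way to implement such switches is a per‑check override table read from configuration. With it, a check can be turned off, downgraded, or promoted without redeploying the validator. In this Python sketch, the following are all hypothetical:

- the check identifiers;
- the OFF keyword;
- the assumption that TEST_WARNING findings are shown only in testing tools.

```python
from enum import Enum
from typing import Dict, Optional


class Severity(Enum):
    TEST_WARNING = "TEST_WARNING"  # assumed: surfaced only in testing tools
    WARNING = "WARNING"
    ERROR = "ERROR"


def effective_severity(check_id: str, default: Severity,
                       overrides: Dict[str, str]) -> Optional[Severity]:
    """Resolve a check's severity from configuration.

    "OFF" suppresses the check, a severity name replaces the default, and
    checks without an override keep their default. Unknown names raise KeyError.
    """
    mode = overrides.get(check_id)
    if mode is None:
        return default
    if mode == "OFF":
        return None
    return Severity[mode]


# Hypothetical early-stage configuration: unregistered events are not reported
# yet, and a newly added uuid check starts out as a TEST_WARNING.
EARLY_STAGE = {"unregistered_event": "OFF", "uuid_length": "TEST_WARNING"}
# Hypothetical later stage: the same conditions now surface as real findings.
LATER_STAGE = {"unregistered_event": "WARNING", "uuid_length": "ERROR"}
```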

Explainable & Analyzable Results

Each validation outcome includes layer, parameter, error type, and cause, enabling root‑cause analysis. Aggregated results can be examined across dimensions (severity, time, event type) to identify hotspots.

Sample validation result format:

{
  "log_id": "571531737e29586094318d3bf64e9407",
  "timestamp": 1556174577000,
  "event_type": "click",
  "sdk_version": "0.7.7",
  "sdk_type": "js",
  "display_url": "url",
  "scope": "OVERALL",
  "field1": "",
  "field2": "",
  "status": "SUCCESS",
  "value": ""
}
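Every result carries fields such as event_type, sdk_type, status, and timestamp, so hotspots can be found by grouping on those dimensions. The following Python sketch does three things:

- counts results by arbitrary fields;
- computes a per‑dimension failure rate;
- truncates millisecond timestamps to hourly buckets.

It assumes that any status other than SUCCESS counts as a failure, because the source sample only shows SUCCESS. In the described system, this analysis would run over the results stored in Druid. The in‑memory version only illustrates the logic.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_by(results: Iterable[dict], dims: List[str]) -> Counter:
    """Count validation results grouped by the given fields, e.g. ["event_type", "status"]."""
    return Counter(tuple(r.get(d) for d in dims) for r in results)


def failure_rate_by(results: Iterable[dict], dim: str) -> Dict[object, float]:
    """Share of non-SUCCESS results per value of one dimension (e.g. sdk_type)."""
    totals: Counter = Counter()
    failures: Counter = Counter()
    for r in results:
        totals[r.get(dim)] += 1
        if r.get("status") != "SUCCESS":
            failures[r.get(dim)] += 1
    return {k: failures[k] / totals[k] for k in totals}


def hour_bucket(ts_ms: int) -> int:
    """Truncate an epoch-millisecond timestamp to the start of its hour."""
    return ts_ms - ts_ms % 3_600_000
```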

Timed Monitoring

Beyond pre‑deployment checks, continuous monitoring tracks:

Compliance with logging standards.

Event loss and abnormal traffic spikes.

Business‑specific constraints.

Real‑time validation feeds minute‑level quality metrics, while business dashboards provide hourly custom monitoring. To keep false positives low, alerting relies on strategies such as traffic thresholds and analysis of historical saturation.
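The idea behind traffic thresholds can be sketched as follows. Current event volume is compared against a historical baseline for the same time slot, and events whose baseline volume is too low to judge are skipped. This Python function is a simple stand‑in, not the article's saturation analysis, whose exact method is not described. The 1,000‑event floor and the 0.5 and 2.0 ratios are illustrative defaults.

```python
from statistics import mean
from typing import List, Optional


def traffic_alert(current: int, history: List[int], min_traffic: int = 1000,
                  drop_ratio: float = 0.5, spike_ratio: float = 2.0) -> Optional[str]:
    """Compare current event volume with a historical baseline for the same slot.

    Returns "drop", "spike", or None. All thresholds are illustrative.
    """
    if not history:
        return None  # no baseline yet
    baseline = mean(history)
    if baseline < min_traffic:
        # Low-volume events fluctuate too much to alert on reliably.
        return None
    ratio = current / baseline
    if ratio < drop_ratio:
        return "drop"
    if ratio > spike_ratio:
        return "spike"
    return None
```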

Targeted Optimization

Identified quality issues are fed back to product teams for remediation. Systemic SDK problems (e.g., high duplication or loss rates) trigger dedicated analysis and focused optimization projects that follow an “analysis → root cause → solution → tracking” workflow.

Evaluation Model

A weighted scoring model aggregates dimensions such as duplication rate, loss rate, and rule coverage. Weights are adjustable based on current priorities, and new dimensions can be added as the system evolves.
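A minimal version of such a weighted model is sketched below in Python. Lower‑is‑better rates are inverted so that every dimension contributes a value in [0, 1]. The weighted average is then scaled to 0 to 100. The following are assumptions:

- the dimension names;
- the duplication‑rate definition (extra copies of an already‑seen log_id divided by all received events);
- the weights.

Measuring the loss rate requires a reference count, which is outside this sketch.

```python
from collections import Counter
from typing import Dict, Iterable


def duplication_rate(log_ids: Iterable[str]) -> float:
    """Fraction of received events that are extra copies of an already-seen log_id."""
    counts = Counter(log_ids)
    total = sum(counts.values())
    return (total - len(counts)) / total if total else 0.0


def quality_score(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-dimension scores in [0, 1], scaled to 0-100.

    Rates where lower is better (duplication, loss) are inverted first.
    Dimension names and weights are illustrative.
    """
    lower_is_better = {"duplication_rate", "loss_rate"}
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    score = 0.0
    for name, weight in weights.items():
        value = metrics[name]
        goodness = 1.0 - value if name in lower_is_better else value
        score += weight * goodness
    return 100.0 * score / total_weight
```

Consider a duplication rate of 0.25, a loss rate of 0.1, and rule coverage of 0.8, weighted 1:1:2. These produce a score of 81.25. Adding a dimension means adding one metric and one weight.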

Quality Center

A unified dashboard presents daily and weekly quality summaries, so stakeholders can see overall health at a glance and prioritize fixes. Alerts are also delivered through reports.

Current Status & Future Plans

After deploying the end‑to‑end quality framework, the platform achieved:

Quantifiable quality metrics, where quality had previously been unknown.

Centralized management and visibility of logging issues.

Resolution of the majority of low‑quality problems, turning logging into a reliable analysis foundation.

Planned improvements:

Richer validation configuration, managed through a visual interface.

Integration of traffic prediction for smarter alerting and reduced false positives.

Refined evaluation model with dynamic weighting.

More comprehensive quality center with one‑click remediation actions.

Clear incentive/penalty mechanisms to motivate business owners to maintain high logging quality.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Youzan Coder

Official Youzan tech channel, delivering technical insights and occasional daily updates from the Youzan tech team.
