
iQIYI Data Link Governance: Offline and Real‑time Pipeline Management and Exploration

This article presents iQIYI’s comprehensive data link governance practice, covering the motivations, offline and real‑time pipeline governance strategies, monitoring mechanisms, data lineage, and exploratory work such as intelligent attribution and field‑level lineage to improve data accuracy, timeliness, and reliability.

DataFunSummit

Introduction

Data has become a new factor of production and a critical enterprise asset; managing its full lifecycle is essential for data applications, security, and privacy. This article introduces iQIYI’s data link governance practice: offline link governance for accuracy and timeliness, real‑time link governance for rapid anomaly detection, and exploratory work such as intelligent attribution and field‑level lineage.

Problem and Goals

iQIYI’s data chains are long and span multiple business lines, so a single fault can have a broad impact. Typical issues are that downstream nodes cannot locate the source of a fault, the scope of impact is hard to assess, and data repairs are time‑consuming and involve many teams. The goal is to ensure data accuracy and timeliness and to isolate faults quickly.

Offline Link Governance

The offline link focuses on two objectives: accuracy (ensuring data correctness and availability) and timeliness (producing data before agreed deadlines). Common problems are data delay, anomalies, and inconsistency. Governance measures include dual‑cluster HA for stability, task monitoring (delay and failure), comprehensive data monitoring (source to report layers), and a data‑link dashboard showing lineage and node status.
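As a rough illustration of how lineage plus node status can help locate the origin of a delay, the sketch below walks a table‑level lineage graph upstream from a delayed table and returns the delayed tables that have no delayed parent. The table names, the `UPSTREAM` and `DELAYED` dictionaries, and the function `find_delay_sources` are all hypothetical and are not taken from iQIYI’s system.

```python
from collections import deque

# Hypothetical table-level lineage: each table maps to the tables it reads from.
UPSTREAM = {
    "report_daily_play": ["dws_play_summary"],
    "dws_play_summary": ["dwd_play_detail", "dim_channel"],
    "dwd_play_detail": ["ods_pingback"],
    "dim_channel": [],
    "ods_pingback": [],
}

# Hypothetical node status: True means the table missed its agreed deadline.
DELAYED = {
    "report_daily_play": True,
    "dws_play_summary": True,
    "dwd_play_detail": True,
    "dim_channel": False,
    "ods_pingback": False,
}


def find_delay_sources(table, upstream, delayed):
    """Return delayed tables reachable upstream that have no delayed parent.

    These are the most likely origins of the delay seen at `table`.
    """
    sources = []
    seen = {table}
    queue = deque([table])
    while queue:
        current = queue.popleft()
        if not delayed.get(current, False):
            continue  # an on-time table cannot be the origin of the delay
        delayed_parents = [
            parent for parent in upstream.get(current, [])
            if delayed.get(parent, False)
        ]
        if not delayed_parents:
            sources.append(current)  # delayed, but all inputs arrived on time
        for parent in delayed_parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sources


if __name__ == "__main__":
    print(find_delay_sources("report_daily_play", UPSTREAM, DELAYED))
```

In this toy graph the delay at the report layer traces back to `dwd_play_detail`, because its only input arrived on time. A real system would read status from the scheduler rather than from a static dictionary.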

Key Offline Practices

1. Task Stability (Dual‑cluster HA): Deploy two independent clusters for core tasks; results are written to an HA service before final output, dramatically reducing latency and fault rates.
2. Data Monitoring: Monitor core metrics (Pingback layer, report layer) using anomaly‑detection rules; generate tickets for anomalies; models include threshold detection, correlation analysis, Prophet, box‑plot, Gaussian, and period‑over‑period analysis.
3. Offline Data Lineage: Maintain table‑level and task‑level lineage to locate upstream sources of downstream delays and to automate data re‑run and notification.
4. Data Link Dashboard: Visualize core business nodes, their health status (green: on‑time, red: delayed, etc.), and allow drill‑down into node details.
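Two of the detection models named above, the box‑plot rule and period‑over‑period comparison, are simple enough to sketch with the Python standard library. This is a minimal illustration only: the function names, the `k=1.5` fence multiplier, and the 30% change threshold are assumptions, not values from the talk.

```python
import statistics


def boxplot_outlier(history, value, k=1.5):
    """Box-plot rule: flag `value` outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(history, n=4)
    iqr = q3 - q1
    return value < q1 - k * iqr or value > q3 + k * iqr


def period_over_period(previous, current, max_change=0.3):
    """Flag when the relative change versus the previous period is too large."""
    if previous == 0:
        return current != 0
    return abs(current - previous) / abs(previous) > max_change


def check_metric(history, value):
    """Return the names of the rules that `value` violates (assumed ticket input)."""
    reasons = []
    if boxplot_outlier(history, value):
        reasons.append("boxplot")
    if period_over_period(history[-1], value):
        reasons.append("period_over_period")
    return reasons


if __name__ == "__main__":
    daily_counts = [100, 102, 98, 101, 99, 103, 97]  # made-up metric history
    print(check_metric(daily_counts, 150))
    print(check_metric(daily_counts, 104))
```

A non‑empty result could then be turned into an anomaly ticket. Combining several cheap rules this way keeps each one easy to tune per metric.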

Real‑time Link Governance

Real‑time governance also adopts dual‑cluster HA for stability and adds three monitoring dimensions: traffic monitoring (flow breaks, spikes, consumption delay, message backlog), business metric monitoring (anti‑fraud, hotness, user growth), and service monitoring (identifying underlying service issues).

Real‑time Monitoring Details

Each node has primary/backup monitoring covering Kafka clusters (flow breaks, spikes), main tasks (message backlog, consumption delay), and the business layer (primary‑backup differences). The monitoring dashboard highlights three states: critical (red) for flow breaks or spikes, warning (yellow) for task‑level delays, and normal (green).
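The three dashboard states can be read as a small classification rule over per‑node metrics. The sketch below is one possible reading, not iQIYI’s implementation: the `NodeMetrics` fields, the `classify` function, and every threshold (spike ratio, backlog size, delay in seconds) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    NORMAL = "green"
    WARNING = "yellow"
    CRITICAL = "red"


@dataclass
class NodeMetrics:
    msgs_per_min: float           # current inbound traffic
    baseline_msgs_per_min: float  # expected traffic for this time window
    backlog: int                  # messages not yet consumed by the main task
    consume_delay_s: float        # consumption delay in seconds


def classify(m, spike_ratio=3.0, max_backlog=100_000, max_delay_s=300.0):
    """Map node metrics to a dashboard state; all thresholds are assumptions."""
    if m.msgs_per_min == 0:
        return Status.CRITICAL  # flow break: no traffic at all
    if (m.baseline_msgs_per_min > 0
            and m.msgs_per_min / m.baseline_msgs_per_min >= spike_ratio):
        return Status.CRITICAL  # traffic spike relative to the baseline
    if m.backlog > max_backlog or m.consume_delay_s > max_delay_s:
        return Status.WARNING   # task-level delay: backlog or slow consumption
    return Status.NORMAL


if __name__ == "__main__":
    print(classify(NodeMetrics(0, 1000, 0, 0)))
    print(classify(NodeMetrics(1000, 1000, 200_000, 10)))
    print(classify(NodeMetrics(1000, 1000, 10, 1)))
```

Checking the critical conditions first means a node that has both a flow break and a backlog is shown as red rather than yellow.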

Exploration

Future work includes two directions. Intelligent attribution uses data graphs, anomaly reports, and expert knowledge to automatically pinpoint root causes (e.g., channel or version issues). Field‑level lineage aims to map each downstream field to its upstream source for precise impact assessment and automated notifications.
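To make the value of field‑level lineage concrete, the sketch below shows impact assessment as a downstream traversal: given one faulty upstream field, it lists every field, and therefore every table, derived from it. The field names, the `DOWNSTREAM` mapping, and both functions are hypothetical examples, not iQIYI’s schema or tooling.

```python
from collections import deque

# Hypothetical field-level lineage: upstream field -> fields derived from it.
DOWNSTREAM = {
    "ods_pingback.device_id": ["dwd_play_detail.user_key"],
    "ods_pingback.play_ms": ["dwd_play_detail.play_seconds"],
    "dwd_play_detail.play_seconds": ["dws_play_summary.total_play_seconds"],
    "dwd_play_detail.user_key": ["dws_play_summary.active_users"],
    "dws_play_summary.total_play_seconds": ["report_daily_play.avg_play_seconds"],
    "dws_play_summary.active_users": [
        "report_daily_play.avg_play_seconds",
        "report_daily_play.dau",
    ],
}


def impacted_fields(field, downstream):
    """Return every field derived, directly or indirectly, from `field`."""
    seen = set()
    queue = deque([field])
    while queue:
        current = queue.popleft()
        for child in downstream.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


def impacted_tables(field, downstream):
    """Collapse impacted fields to table names, e.g. for owner notification."""
    return sorted({f.split(".", 1)[0] for f in impacted_fields(field, downstream)})


if __name__ == "__main__":
    print(impacted_fields("ods_pingback.play_ms", DOWNSTREAM))
    print(impacted_tables("ods_pingback.play_ms", DOWNSTREAM))
```

In this example a fault in `ods_pingback.play_ms` reaches `report_daily_play.avg_play_seconds` but not `report_daily_play.dau`. Table‑level lineage alone would flag the whole report table, which is the precision gain the field‑level approach is after.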

Conclusion

The presentation summarizes iQIYI’s data link governance framework, demonstrating how offline and real‑time strategies, comprehensive monitoring, and exploratory techniques together enhance data quality, reduce fault resolution time, and support reliable data‑driven decision making.
