Evolution of iQIYI's Real-Time Big Data Ecosystem
iQIYI transformed its data infrastructure from a traditional offline T+1 model into a comprehensive real-time ecosystem. Built on Kafka, Flink, a three-layer Stream Data Service Platform, the Talos drag-and-drop pipeline, and a Druid-based analytics platform, the ecosystem enables low-latency monitoring, personalized recommendations, ad targeting, and continuous machine-learning workflows. iQIYI plans further work on stream-batch integration and lake-warehouse convergence.
Data is a fundamental factor of production in the Internet era, and its value to companies falls into three main areas: guiding decisions through BI reports; optimizing user experience and monetization through personalization and ads; and supporting business monitoring, such as dashboards and risk control.
The value of data decays over time, so the traditional offline T+1 model, in which data becomes available only the day after it is produced, cannot meet emerging business needs. iQIYI began adopting real-time technologies (Kafka, Storm, Spark Streaming) in 2014 and introduced Apache Flink in 2017, eventually building a complete real-time data pipeline covering ingestion, processing, distribution, analysis, and application. The pipeline supports peaks of over 30 million QPS during events such as live broadcasts.
Early real-time ETL relied on Flink jobs that parsed logs, binlogs, and other sources into JSON key-value records and wrote them to Kafka. Although flexible, this approach led to large amounts of duplicated data production, siloed development, poor data governance, and instability whenever Flink or Kafka failed.
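A minimal sketch of the kind of parsing step such an ETL job performed, written in plain Python rather than as a Flink operator. The log layout (`ts<TAB>uid<TAB>event<TAB>k1=v1&k2=v2`) and the field names are hypothetical; the point is that free-form parameters become arbitrary JSON keys, which is flexible but makes governance hard.

```python
import json
from typing import Optional
from urllib.parse import parse_qsl


def parse_log_line(line: str) -> Optional[str]:
    """Turn one raw log line into a JSON key-value record, or None if malformed.

    Hypothetical layout: ts<TAB>uid<TAB>event<TAB>k1=v1&k2=v2
    """
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 4 or not parts[0].isdigit():
        return None  # drop malformed lines instead of failing the whole job
    ts, uid, event, query = parts
    # Free-form parameters become extra keys; every producer can add its own.
    record = dict(parse_qsl(query))
    # Core fields are set last so a stray parameter cannot overwrite them.
    record.update({"ts": int(ts), "uid": uid, "event": event})
    return json.dumps(record, sort_keys=True)


print(parse_log_line("1700000000\tu42\tplay\tvideo_id=v9&dur=30"))
# {"dur": "30", "event": "play", "ts": 1700000000, "uid": "u42", "video_id": "v9"}
```

In the production pipeline, the resulting string would be written to a Kafka topic; because each team wrote its own variant of this function, the same raw data was often parsed and stored several times.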
To address these issues, iQIYI built a Stream Data Service Platform organized into three layers: an operations layer that manages Kafka, Pulsar, and RocketMQ clusters; a data-management layer that handles metadata, lineage, quality monitoring, and high-availability (HA) switching; and a client-SDK layer that hides Kafka details from Flink, Spark, and Java users and provides automatic registration, heartbeat-driven address discovery, and seamless failover.
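The client-SDK behavior can be illustrated with a small sketch, assuming a hypothetical `Registry` that maps a logical stream name to the cluster currently serving it. This is not iQIYI's SDK; `Registry`, `StreamClient`, and the in-memory "clusters" are illustrative stand-ins, and a real SDK would run the heartbeat on a background timer instead of calling it by hand.

```python
from typing import Callable, Dict, List


class Registry:
    """Hypothetical metadata service: logical stream -> current cluster address,
    plus a record of which clients use which stream (for lineage)."""

    def __init__(self) -> None:
        self._routes: Dict[str, str] = {}
        self.users: Dict[str, List[str]] = {}

    def register(self, stream: str, address: str) -> None:
        self._routes[stream] = address

    def attach(self, stream: str, client_id: str) -> None:
        self.users.setdefault(stream, []).append(client_id)

    def lookup(self, stream: str) -> str:
        return self._routes[stream]


class StreamClient:
    """Callers name only the logical stream; the physical address is
    re-resolved on each heartbeat and the connection swapped on HA switches."""

    def __init__(self, registry: Registry, stream: str, client_id: str,
                 connect: Callable[[str], List[str]]) -> None:
        self.registry, self.stream, self.connect = registry, stream, connect
        registry.attach(stream, client_id)      # automatic registration
        self.address = registry.lookup(stream)  # initial address discovery
        self.conn = connect(self.address)
        self.switches = 0

    def heartbeat(self) -> bool:
        """Return True if the client failed over to a new address."""
        latest = self.registry.lookup(self.stream)
        if latest == self.address:
            return False
        self.address, self.conn = latest, self.connect(latest)
        self.switches += 1
        return True

    def send(self, value: str) -> None:
        self.conn.append(value)


clusters: Dict[str, List[str]] = {"kafka-a:9092": [], "kafka-b:9092": []}
registry = Registry()
registry.register("play_events", "kafka-a:9092")
client = StreamClient(registry, "play_events", "job-1", lambda addr: clusters[addr])
client.send("e1")
registry.register("play_events", "kafka-b:9092")  # operator-triggered HA switch
client.heartbeat()
client.send("e2")
print(clusters)  # {'kafka-a:9092': ['e1'], 'kafka-b:9092': ['e2']}
```

Because application code never holds a broker address, the operations layer can move a stream between clusters without redeploying the jobs that read or write it.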
The real-time data warehouse differs from a traditional offline warehouse in three key respects. First, streams are split horizontally, so each consumer reads only the sub-stream it needs instead of wasting Kafka bandwidth and Flink resources on the full stream. Second, dimensions are degenerated: frequently used dimensions are embedded directly in the fact stream, avoiding costly stream joins. Third, the processing chain is kept short, with no more than four Kafka hops, often by pushing aggregation down to OLAP stores such as Druid or ClickHouse.
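The first two techniques can be sketched together in plain Python. The dimension table `VIDEO_DIM`, the event fields, and the `dwd_<event>` topic naming are hypothetical; in production the split would write to separate Kafka topics and the dimension lookup would hit a dimension store rather than an in-memory dict.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

Event = Dict[str, str]

# Hypothetical dimension data that downstream jobs would otherwise join on.
VIDEO_DIM: Dict[str, Dict[str, str]] = {
    "v1": {"channel": "drama", "region": "CN"},
    "v2": {"channel": "variety", "region": "TW"},
}


def degenerate(event: Event) -> Event:
    """Dimension degeneration: copy frequently used dimension fields into the
    fact record once, at ETL time, so no consumer needs a stream join."""
    dims = VIDEO_DIM.get(event.get("video_id", ""), {})
    return {**event, **{k: dims.get(k, "unknown") for k in ("channel", "region")}}


def split_horizontally(events: Iterable[Event]) -> Dict[str, List[Event]]:
    """Horizontal splitting: route each enriched event to a per-type sub-stream
    (one topic per event type) so a consumer subscribes only to what it needs."""
    topics: Dict[str, List[Event]] = defaultdict(list)
    for e in events:
        topics[f"dwd_{e['event']}"].append(degenerate(e))
    return dict(topics)


out = split_horizontally([
    {"event": "play", "video_id": "v1"},
    {"event": "click", "video_id": "v2"},
    {"event": "play", "video_id": "v9"},
])
print(sorted(out))         # ['dwd_click', 'dwd_play']
print(out["dwd_play"][1])  # {'event': 'play', 'video_id': 'v9', 'channel': 'unknown', 'region': 'unknown'}
```

A job that only analyzes playback now consumes `dwd_play` and already has `channel` and `region` on every record, which removes both a filter over the full stream and a join from its plan.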
iQIYI also built Talos, a real-time data production and distribution platform that lets users define processing logic as drag-and-drop DAGs or with SQL operators and debug intermediate data visually. As a result, users no longer need to write Flink code or manage job deployment.
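The idea behind a DAG-based pipeline with inspectable intermediate results can be sketched as follows. This is an illustration, not Talos's engine: the operators are ordinary Python functions, the node names are hypothetical, and execution order comes from the standard library's `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Tuple

Node = Tuple[Callable[..., Any], List[str]]  # (operator, upstream node names)


def run_dag(nodes: Dict[str, Node], source: Any) -> Dict[str, Any]:
    """Execute operators in dependency order and return every node's output.

    A node with no upstreams receives `source`. Keeping each intermediate
    result is what allows a UI to show the data flowing out of every step.
    """
    graph = {name: set(ups) for name, (_, ups) in nodes.items()}
    results: Dict[str, Any] = {}
    for name in TopologicalSorter(graph).static_order():
        fn, ups = nodes[name]
        inputs = [results[u] for u in ups] if ups else [source]
        results[name] = fn(*inputs)
    return results


dag: Dict[str, Node] = {
    "parse": (lambda rows: [r.split(",") for r in rows], []),
    "filter": (lambda rows: [r for r in rows if r[1] == "play"], ["parse"]),
    "count": (lambda rows: len(rows), ["filter"]),
}
out = run_dag(dag, ["u1,play", "u2,click", "u3,play"])
print(out["filter"])  # [['u1', 'play'], ['u3', 'play']]
print(out["count"])   # 2
```

In a platform like the one described, the graph would come from the drag-and-drop editor and be compiled into a Flink job; the sketch only shows why a DAG model makes per-step debugging natural.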
For downstream analytics, iQIYI developed the Real-Time Analytics Platform (RAP), based on Druid and Spark/Flink. RAP ingests Kafka streams directly through Druid's Kafka indexing service and offers web-guided OLAP model creation, ad-hoc analysis, alerting, and Grafana visualizations.
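For reference, direct Kafka ingestion in Druid is configured with a Kafka supervisor spec, which is submitted to the Overlord at `/druid/indexer/v1/supervisor`. The sketch below builds such a spec in Python; the datasource, topic, broker address, dimensions, and metric field `dur` are hypothetical, timestamps are assumed to be epoch milliseconds, and this is not RAP's actual configuration, which presumably generates specs from its web-guided model definitions.

```python
import json
from typing import Any, Dict, List


def kafka_supervisor_spec(datasource: str, topic: str, bootstrap: str,
                          dimensions: List[str]) -> Dict[str, Any]:
    """Build a Druid Kafka supervisor spec with ingestion-time rollup."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": "ts", "format": "millis"},
                "dimensionsSpec": {"dimensions": dimensions},
                "metricsSpec": [
                    {"type": "count", "name": "events"},
                    {"type": "longSum", "name": "play_seconds", "fieldName": "dur"},
                ],
                # Rollup to one-minute buckets: aggregation happens in Druid,
                # not in an extra Flink job and Kafka hop.
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "HOUR",
                    "queryGranularity": "MINUTE",
                    "rollup": True,
                },
            },
            "ioConfig": {
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap},
                "taskCount": 2,
                "replicas": 1,
                "taskDuration": "PT1H",
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


spec = kafka_supervisor_spec("play_metrics", "dwd_play", "kafka-a:9092",
                             ["channel", "region"])
print(json.dumps(spec, indent=2))
```

The rollup setting illustrates how pushing aggregation into the OLAP store keeps the streaming chain short.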
Real-time capabilities are applied across iQIYI's business lines in three typical scenarios: real-time monitoring (dashboards, alerts, and log search), real-time data analysis (operational metrics, content recommendation, and advertising), and online learning and training of machine-learning models.
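To make "online learning" concrete, here is a minimal sketch of a model that updates on every labeled event instead of being retrained in batch. It uses logistic regression with one stochastic-gradient step per event; the sparse feature names are hypothetical, and nothing here describes iQIYI's actual models or training stack.

```python
import math
from typing import Dict


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticRegression:
    """One SGD step per streamed sample, on sparse name -> value features."""

    def __init__(self, lr: float = 0.1) -> None:
        self.lr = lr
        self.w: Dict[str, float] = {}
        self.b = 0.0

    def predict(self, x: Dict[str, float]) -> float:
        z = self.b + sum(self.w.get(k, 0.0) * v for k, v in x.items())
        return sigmoid(z)

    def update(self, x: Dict[str, float], y: int) -> None:
        # For log loss, the gradient with respect to z is (p - y).
        g = self.predict(x) - y
        self.b -= self.lr * g
        for k, v in x.items():
            self.w[k] = self.w.get(k, 0.0) - self.lr * g * v


model = OnlineLogisticRegression(lr=0.1)
for _ in range(200):  # stand-in for a stream of click / no-click events
    model.update({"channel=drama": 1.0}, 1)
    model.update({"channel=news": 1.0}, 0)
print(round(model.predict({"channel=drama": 1.0}), 3))
print(round(model.predict({"channel=news": 1.0}), 3))
```

Because each event adjusts the weights immediately, a shift in user behavior shows up in predictions within seconds, which is what connecting a real-time pipeline to training buys.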
Future directions include stream-batch integration to replace traditional MapReduce/Spark batch pipelines; lake-warehouse convergence based on Iceberg; moving some ETL logic downstream into ELT on OLAP engines; and a seamless BI+AI pipeline that connects real-time data production to feature generation, model training, and online inference.
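The core idea of stream-batch integration, writing a business rule once and running it over both bounded and unbounded input, can be illustrated with a plain-Python analogy. This is not Flink's API; `play_counts` and `run_batch` are hypothetical helpers, and a Python iterator stands in for an unbounded stream.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def play_counts(events: Iterable[Dict[str, str]]) -> Iterator[Dict[str, int]]:
    """One rule, written once: count 'play' events per channel.

    Emits the running result after each event (what a streaming consumer
    needs); a batch consumer simply keeps the last emission.
    """
    counts: Counter = Counter()
    for e in events:
        if e["event"] == "play":
            counts[e["channel"]] += 1
        yield dict(counts)


def run_batch(events: Iterable[Dict[str, str]]) -> Dict[str, int]:
    """Bounded input: drain the same logic and return the final result."""
    last: Dict[str, int] = {}
    for last in play_counts(events):
        pass
    return last


history = [
    {"event": "play", "channel": "drama"},
    {"event": "click", "channel": "drama"},
    {"event": "play", "channel": "news"},
]
print(run_batch(history))  # {'drama': 1, 'news': 1}
for snapshot in play_counts(iter(history)):  # "streaming" over the same data
    print(snapshot)
# {'drama': 1}
# {'drama': 1}
# {'drama': 1, 'news': 1}
```

Since both modes share one definition, the batch result and the final streaming result cannot drift apart, which is the consistency problem that maintaining separate MapReduce/Spark and Flink code paths creates.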
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.