
Design and Evolution of iQIYI's Real-Time Analytics Platform (RAP)

This article describes iQIYI's Real-Time Analytics Platform (RAP): its motivation, the architecture evolution from RAP 1.x to 2.x, OLAP engine selection, the product design workflow, integration of Druid's Kafka Indexing Service (KIS) and Flink, enhanced diagnostics, and real-world applications in membership monitoring, recommendation evaluation, and smart TV alerting.

DataFunTalk

Facing rapidly growing data volumes, iQIYI needed a high-performance, accurate, and agile platform for real-time big-data analysis. This led to the development of RAP (Realtime Analysis Platform), built on Apache Druid combined with Spark and Flink.

RAP provides OLAP capabilities with minute-level latency. A web wizard automatically builds multidimensional models and generates visual reports, and APIs allow integration with business lines such as membership, recommendation, and BI. The platform supports hundreds of streaming tasks and thousands of analytical reports.

The first generation (RAP 1.x) used Druid as the underlying OLAP engine to meet the low-latency, high-throughput requirements of time-series data. The second generation (RAP 2.x) added support for Druid's Kafka Indexing Service (KIS), Flink processing, and advanced Druid features such as HLL Sketch for approximate distinct counting.
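To make the KIS and HLL Sketch combination more concrete, the following is a minimal sketch of a Druid Kafka supervisor spec built as a Python dictionary. It is not RAP's actual configuration: the datasource, topic, broker address, and column names are hypothetical, the exact spec layout varies between Druid versions, and the `HLLSketchBuild` aggregator assumes the `druid-datasketches` extension is loaded.

```python
import json


def build_kis_supervisor_spec(datasource, topic, bootstrap_servers,
                              dimensions, user_id_field):
    """Return a Druid Kafka supervisor spec as a plain dict.

    All argument values are illustrative placeholders, not RAP settings.
    """
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {"dimensions": list(dimensions)},
                "metricsSpec": [
                    # Plain event counter, rolled up per query granularity.
                    {"type": "count", "name": "events"},
                    # HLL sketch built at ingestion time for approximate
                    # distinct counts (needs the druid-datasketches extension).
                    {
                        "type": "HLLSketchBuild",
                        "name": "unique_users",
                        "fieldName": user_id_field,
                        "lgK": 12,
                    },
                ],
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "HOUR",
                    "queryGranularity": "MINUTE",
                    "rollup": True,
                },
            },
            "ioConfig": {
                "type": "kafka",
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "taskCount": 1,
                "useEarliestOffset": False,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


# Hypothetical example values.
spec = build_kis_supervisor_spec(
    datasource="member_events",
    topic="member-log",
    bootstrap_servers="kafka-1.example:9092",
    dimensions=["platform", "event_type"],
    user_id_field="user_id",
)
print(json.dumps(spec, indent=2))
```

A spec of this shape is normally submitted to the Druid Overlord, which then manages the Kafka indexing tasks; with rollup enabled, the sketch column stores one mergeable summary per minute and dimension combination instead of raw user IDs.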

RAP abstracts the real-time analysis workflow into five steps: data ingestion, data processing, aggregation analysis, report configuration, and real-time alerting. Each step is guided by a web-based wizard, which eliminates the need to write Spark or Flink code by hand.

Data ingestion supports four Kafka-based source types. Data processing translates user-defined rules into iQIYI's StreamingSQL, which generates Spark Streaming jobs (and optionally Flink jobs) without any user-written code. Aggregation lets users define dimensions and measures, and RAP automatically optimizes the corresponding Druid query parameters.

Report configuration automatically creates Druid queries and visualizations, while the alerting module allows threshold‑based and comparative alerts with configurable delay handling to reduce false positives.
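A minimal sketch of the two alert types with delay handling is shown below. The article does not describe how RAP implements the delay, so this example assumes one plausible interpretation: the newest minutes are skipped because late-arriving data may not yet be fully ingested. The `AlertRule` fields and their defaults are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    """Hypothetical alert definition; field names are illustrative."""
    kind: str                    # "threshold" or "compare"
    limit: float                 # minimum value, or maximum relative drop
    delay_minutes: int = 2       # newest minutes to skip (may be incomplete)
    compare_offset: int = 1440   # minutes back for the comparison baseline


def evaluate(rule, series, now_minute):
    """Check one rule against a {minute_index: value} series.

    Returns an alert message, or None when nothing should fire.
    """
    # Delay handling: evaluate a minute that has had time to fill in.
    minute = now_minute - rule.delay_minutes
    current = series.get(minute)
    if current is None:
        return None  # no data yet for the evaluated minute

    if rule.kind == "threshold":
        if current < rule.limit:
            return f"minute {minute}: value {current} below {rule.limit}"
        return None

    if rule.kind == "compare":
        baseline = series.get(minute - rule.compare_offset)
        if not baseline:
            return None  # missing or zero baseline, nothing to compare
        drop = (baseline - current) / baseline
        if drop > rule.limit:
            return f"minute {minute}: dropped {drop:.0%} versus baseline"
        return None

    raise ValueError(f"unknown alert kind: {rule.kind}")


# Minute 10 is still being ingested, so its value looks artificially low.
series = {4: 1000.0, 9: 600.0, 10: 120.0}

no_delay = evaluate(AlertRule("threshold", limit=500, delay_minutes=0), series, 10)
with_delay = evaluate(AlertRule("threshold", limit=500, delay_minutes=1), series, 10)
compared = evaluate(
    AlertRule("compare", limit=0.3, delay_minutes=1, compare_offset=5), series, 10
)
print(no_delay, with_delay, compared, sep="\n")
```

In this example the threshold rule fires on the incomplete minute when the delay is zero and stays silent with a one-minute delay, while the comparative rule still detects the genuine 40% drop between minute 4 and minute 9.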

RAP 2.x also introduces comprehensive diagnostics, including stream‑task latency charts, real‑time ingestion monitoring, and error sampling, enabling rapid troubleshooting of report or data‑pipeline issues.
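The data behind a stream-task latency chart can be approximated by comparing each record's event time with the time it was processed. The following sketch assumes records are available as `(event_ts, processed_ts)` pairs in epoch seconds; the function name and input shape are hypothetical and not taken from RAP.

```python
from collections import defaultdict


def latency_by_minute(records):
    """Summarize processing latency per minute of processing time.

    records: iterable of (event_ts, processed_ts) pairs in epoch seconds.
    Returns {minute_start_ts: {"count": n, "avg": seconds, "max": seconds}}.
    """
    buckets = defaultdict(list)
    for event_ts, processed_ts in records:
        # Align the processing timestamp to the start of its minute.
        minute = int(processed_ts // 60) * 60
        buckets[minute].append(processed_ts - event_ts)
    return {
        minute: {
            "count": len(lags),
            "avg": sum(lags) / len(lags),
            "max": max(lags),
        }
        for minute, lags in sorted(buckets.items())
    }


# Two records processed promptly, one processed 75 seconds late.
records = [(0, 5), (10, 20), (50, 125)]
summary = latency_by_minute(records)
print(summary)
```

Plotting `avg` and `max` per minute makes a delayed stream task visible as a rising curve, which helps distinguish a slow pipeline from a report that is simply missing data.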

Business applications include: (1) Membership log monitoring processing billions of events daily with minute‑level alerts, (2) Real‑time recommendation algorithm effectiveness tracking, reducing algorithm rollout verification from a day to 30 minutes, and (3) Smart‑TV playback fault detection with 5‑minute alerts and multidimensional root‑cause analysis.
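One way to approach multidimensional root-cause analysis on Druid is to issue a native topN query per dimension over the alert window and look for the values contributing the most errors. This is a sketch under that assumption, not a description of RAP's implementation; the datasource, dimension, and metric names are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone


def build_topn_queries(datasource, dimensions, error_metric, end,
                       window_minutes=5, top=5):
    """Build one Druid native topN query per dimension.

    `end` must be a timezone-aware datetime marking the end of the window.
    """
    end_utc = end.astimezone(timezone.utc)
    start_utc = end_utc - timedelta(minutes=window_minutes)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    interval = f"{start_utc.strftime(fmt)}/{end_utc.strftime(fmt)}"
    return [
        {
            "queryType": "topN",
            "dataSource": datasource,
            "intervals": [interval],
            "granularity": "all",
            "dimension": dimension,
            # Rank dimension values by the summed error metric.
            "metric": "errors",
            "threshold": top,
            "aggregations": [
                {"type": "longSum", "name": "errors", "fieldName": error_metric}
            ],
        }
        for dimension in dimensions
    ]


# Hypothetical smart-TV playback datasource and dimensions.
queries = build_topn_queries(
    datasource="tv_playback",
    dimensions=["app_version", "device_model"],
    error_metric="error_count",
    end=datetime(2024, 1, 1, 0, 5, tzinfo=timezone.utc),
)
print(json.dumps(queries[0], indent=2))
```

Each query would be posted to the Druid Broker's native query endpoint; comparing the top values across dimensions shows whether a fault is concentrated in, for example, one app version or one device model.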

Future plans aim to deepen monitoring granularity, improve resource utilization, and enhance Exactly‑Once guarantees across the entire real‑time analysis chain.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Flink, OLAP, Druid, Spark, iQIYI