
Building RAP: iQIYI’s Real‑Time Big Data Analytics Platform with Druid, Spark & Flink

This article describes iQIYI’s RAP platform: its real‑time analytics requirements, the architectural evolution from RAP 1.x to 2.x, the core design steps, and the integration of Druid, Spark, Flink, and KIS. It also presents business use cases such as membership monitoring, recommendation evaluation, and smart‑TV alerting.

dbaplus Community

Real‑Time Analysis Requirements

Since 2010, iQIYI’s data volume has grown beyond the capabilities of its Hive + MySQL OLAP warehouse, creating the need for a platform that delivers minute‑level latency, high‑throughput queries, and flexible multidimensional analysis.

Choosing the right OLAP engine was difficult because many options existed, each with trade‑offs.

Development cost was high: users had to write Spark or Flink jobs and build front‑end reports.

Data latency was poor, ranging from tens of minutes to days.

Maintenance was cumbersome; any change in data sources required reworking the entire pipeline.

RAP (Realtime Analysis Platform) was created to address these pain points by providing a web‑wizard that configures data ingestion, processing, aggregation, reporting, and alerting with only a few clicks.

Architecture Evolution

RAP 1.x

OLAP Selection – After evaluating several engines, Druid was chosen as the underlying OLAP store because it is open‑source, optimized for time‑series data, and offers sub‑second query latency.

Product Design – RAP abstracts the real‑time analytics workflow into five steps: data ingestion, data processing, aggregation, report configuration, and real‑time alerting. Each step is guided by a web wizard.

Data Ingestion – Users select from four source types (user data, service logs, monitoring data, other Kafka sources) via a dropdown; no cluster IPs are required.

Data Processing – RAP translates user‑defined rules into a proprietary StreamingSQL, generating Spark Streaming jobs automatically. Built‑in functions include IP‑to‑province/city/operator conversion.
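
As a rough illustration of the rule‑to‑SQL translation described above, the sketch below renders a wizard‑style rule as a SELECT statement. RAP’s StreamingSQL dialect is proprietary and not shown in the article, so the `ProcessingRule` shape, the `ip_to_province` function name, and the generated SQL are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProcessingRule:
    """Hypothetical wizard output: projected columns plus optional filters."""
    source: str
    fields: Dict[str, str]                 # output column -> expression
    filters: List[str] = field(default_factory=list)


def to_sql(rule: ProcessingRule) -> str:
    """Render the rule as a plain SELECT statement (illustrative dialect only)."""
    select = ", ".join(f"{expr} AS {name}" for name, expr in rule.fields.items())
    sql = f"SELECT {select} FROM {rule.source}"
    if rule.filters:
        sql += " WHERE " + " AND ".join(f"({cond})" for cond in rule.filters)
    return sql


rule = ProcessingRule(
    source="playback_log",
    # ip_to_province is an assumed name for the built-in IP conversion function.
    fields={"province": "ip_to_province(client_ip)", "version": "app_version"},
    filters=["event = 'play_error'"],
)
print(to_sql(rule))
```

In a real system the generated statement would be handed to the streaming engine; here it is only printed.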

Aggregation – Users define dimensions and measures (count, distinct count, sum, etc.) through a UI; RAP automatically generates optimized Druid queries and tunes parameters such as task.partition, windowPeriod, and queryGranularity.
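
To make the dimension and measure model concrete, here is a minimal sketch that turns a wizard‑style definition into a Druid native groupBy query. The aggregator types (`count`, `longSum`, `cardinality`) are standard Druid ones, but the mapping from RAP’s measure names to them, and the datasource and column names, are assumptions rather than RAP’s actual behavior.

```python
import json
from typing import Dict, List, Optional


def aggregator(kind: str, name: str, field: Optional[str] = None) -> Dict:
    """Map an assumed wizard measure type to a Druid native aggregator."""
    if kind == "count":
        return {"type": "count", "name": name}
    if kind == "sum":
        return {"type": "longSum", "name": name, "fieldName": field}
    if kind == "distinct":
        # Approximate distinct count; one of several possible choices in Druid.
        return {"type": "cardinality", "name": name, "fields": [field]}
    raise ValueError(f"unsupported measure type: {kind}")


def build_groupby(datasource: str, dimensions: List[str], measures: List[Dict],
                  interval: str, granularity: str = "minute") -> Dict:
    """Assemble a native groupBy query body as a plain dict."""
    return {
        "queryType": "groupBy",
        "dataSource": datasource,
        "granularity": granularity,
        "dimensions": dimensions,
        "aggregations": [aggregator(**m) for m in measures],
        "intervals": [interval],
    }


query = build_groupby(
    "playback_errors",                      # hypothetical datasource
    ["city", "client_version"],
    [{"kind": "count", "name": "events"},
     {"kind": "distinct", "name": "users", "field": "user_id"}],
    "2024-01-01T00:00:00Z/2024-01-01T01:00:00Z",
)
print(json.dumps(query, indent=2))
```

Note that on a datasource with rollup enabled, the `count` aggregator counts stored Druid rows rather than raw events, which is one reason a platform would generate these definitions instead of leaving them to users.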

Report Configuration – Based on the OLAP model, RAP creates Druid queries, renders visual reports, and suggests appropriate query granularity.
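
The article does not say how RAP picks a suggested granularity. One plausible heuristic, sketched below, is to choose the finest granularity that keeps the number of points per chart under a limit. The candidate names are standard Druid simple granularities; the `max_points` limit of 500 is an arbitrary assumption.

```python
from datetime import timedelta

# Finest to coarsest; names are Druid simple granularities.
_CANDIDATES = [
    ("minute", timedelta(minutes=1)),
    ("fifteen_minute", timedelta(minutes=15)),
    ("hour", timedelta(hours=1)),
    ("day", timedelta(days=1)),
]


def suggest_granularity(span: timedelta, max_points: int = 500) -> str:
    """Return the finest granularity whose point count fits within max_points."""
    for name, step in _CANDIDATES:
        if span / step <= max_points:
            return name
    return _CANDIDATES[-1][0]   # fall back to the coarsest option


print(suggest_granularity(timedelta(hours=1)))
print(suggest_granularity(timedelta(days=7)))
```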

Real‑Time Alerting – Thresholds, YoY/MoM comparisons, and delayed evaluation windows can be configured to reduce false alarms.
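
The alerting options above can be sketched as a small rule evaluator. This is an illustration, not RAP’s implementation: the `AlertRule` fields are hypothetical, and the "delayed evaluation window" is interpreted here as skipping the newest data points, which may still be incomplete because of late‑arriving events.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlertRule:
    max_value: Optional[float] = None       # absolute upper threshold
    max_drop_ratio: Optional[float] = None  # 0.3 = alert when 30% below baseline
    delay_points: int = 0                   # newest points to skip as incomplete


def evaluate(rule: AlertRule, series: List[float], baseline: List[float]) -> List[str]:
    """Check one point of `series` against the rule.

    `series` and `baseline` are aligned, oldest first; `baseline` holds the
    same metric from the comparison period (for example, one week earlier).
    """
    idx = len(series) - 1 - rule.delay_points
    if idx < 0:
        return []                            # not enough data yet
    current, previous = series[idx], baseline[idx]
    reasons = []
    if rule.max_value is not None and current > rule.max_value:
        reasons.append("threshold")
    if rule.max_drop_ratio is not None and previous > 0:
        if (previous - current) / previous > rule.max_drop_ratio:
            reasons.append("drop_vs_baseline")
    return reasons


series = [100.0, 98.0, 40.0]    # last point is low only because it is incomplete
baseline = [100.0, 100.0, 100.0]
print(evaluate(AlertRule(max_drop_ratio=0.3), series, baseline))
print(evaluate(AlertRule(max_drop_ratio=0.3, delay_points=1), series, baseline))
```

Without a delay the partially filled last minute triggers an alert; with `delay_points=1` the evaluator looks at the previous, complete point and stays quiet.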

These steps reduce end‑to‑end latency from days to about 30 minutes.

RAP 2.x Enhancements

Feedback from RAP 1.x revealed three problems: data loss during task restarts, limited Kafka version support, and reliance on Tranquility ingestion, which had been deprecated.

Kafka Indexing Service (KIS) Integration – KIS provides exactly‑once ingestion from Kafka to Druid. A Supervisor process on the Overlord node manages task lifecycles. RAP 2.x can automatically configure KIS when the Kafka source version is ≥ 0.10.x, otherwise it routes processed data through a shared Kafka cluster.
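
The version‑based routing can be sketched as follows. The 0.10 cutoff comes from the text above; everything else is an assumption, including the function names, the shared‑cluster address, the intermediate topic name, and the idea that the supervisor reads the processed topic from the shared cluster in the fallback case. The supervisor dict is a deliberately incomplete fragment: a real Druid Kafka supervisor spec also needs timestamp, dimension, and metric definitions.

```python
from typing import Dict, Tuple

KIS_MIN_VERSION = (0, 10)   # per the article: KIS requires Kafka >= 0.10.x


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Turn '0.10.2' or '0.8.2.2' into (0, 10) or (0, 8)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def supervisor_fragment(datasource: str, topic: str, brokers: str) -> Dict:
    """Partial Kafka supervisor spec: only the fields relevant to routing."""
    return {
        "type": "kafka",
        "dataSchema": {"dataSource": datasource},
        "ioConfig": {
            "topic": topic,
            "consumerProperties": {"bootstrap.servers": brokers},
        },
    }


def plan_ingestion(source_version: str, datasource: str, topic: str, brokers: str,
                   shared_brokers: str = "shared-kafka:9092") -> Dict:
    """Decide which Kafka cluster the supervisor should read from."""
    if parse_major_minor(source_version) >= KIS_MIN_VERSION:
        return {"route": "direct",
                "supervisor": supervisor_fragment(datasource, topic, brokers)}
    # Older source cluster: assume the stream job writes processed rows to a
    # topic on the shared cluster, and the supervisor consumes that topic.
    return {"route": "via_shared_kafka",
            "supervisor": supervisor_fragment(datasource, f"rap_{datasource}", shared_brokers)}


print(plan_ingestion("0.8.2.2", "member_log", "raw_member_log", "old-kafka:9092")["route"])
```

In Druid, a complete spec of this kind is submitted to the Overlord’s `/druid/indexer/v1/supervisor` endpoint, after which the Supervisor creates and replaces indexing tasks on its own.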

Flink Compute Engine – Adding Flink as a compute engine brings processing latency down to the order of seconds while maintaining exactly‑once semantics.

Enhanced Diagnostics – RAP 2.x introduces monitoring for stream task latency, real‑time ingestion, and error sampling, allowing operators to pinpoint bottlenecks and data quality issues quickly.

Business Applications

Membership Log Monitoring – Processes hundreds of billions of log events daily, delivering minute‑level alerts and generating over 700 real‑time reports, improving fault‑resolution speed by 80%.

Recommendation Algorithm Monitoring – RAP tracks clicks, views, and watch time to evaluate bucketed recommendation algorithms, enabling a switch within 30 minutes to improve user experience.

Smart‑TV Real‑Time Alerting – Multidimensional analysis of playback errors (client version, server IP, city, etc.) supports a 5‑minute alert loop and root‑cause tracing.

Future Directions

RAP will continue to deepen monitoring of the analysis pipeline, add finer‑grained diagnostics, improve resource utilization, and strengthen exactly‑once guarantees, with the goal of delivering richer analytics such as retention analysis and intelligent analysis.
