Real-Time OLAP with Apache Druid at iQiyi: Architecture, Optimizations, and Platform Practices
iQiyi replaced its offline OLAP stack with Apache Druid, leveraging its real‑time, multi‑dimensional engine and a five‑component architecture, while solving coordinator and overlord bottlenecks, optimizing indexing resources, adopting KIS mode, and building the self‑service RAP platform that now powers thousands of low‑latency dashboards.
In recent years, big‑data technologies have been widely adopted across industries, but the growing volume and timeliness requirements of data pose significant challenges for real‑time analytics. Numerous OLAP engines have emerged, each with its own strengths and weaknesses, making the selection of a suitable engine critical for faster and more efficient data analysis.
iQiyi’s big‑data OLAP service originally relied on offline solutions (Hive + MySQL, HBase). In 2016, Kylin and Impala were introduced for fixed reports and ad‑hoc queries, and from 2018 Kudu and Apache Druid were added to meet real‑time analysis needs.
Before adopting Druid, several business scenarios could not be satisfied by offline analysis, such as advertisers needing instant feedback to adjust campaigns and model engineers requiring near‑real‑time A/B test results. Traditional approaches included:
Offline analysis with Hive/Impala/Kylin (limited timeliness, dimension explosion in Kylin).
Real‑time analysis using Elasticsearch or OpenTSDB (slow aggregation).
Streaming jobs (Spark/Flink) that required new code for every dimension change.
Kudu + Impala (high memory and partition costs).
Lambda architecture, which required maintaining both batch and stream pipelines.
After evaluating mainstream OLAP engines, iQiyi selected Apache Druid for its real‑time, multi‑dimensional analytics capabilities.
Apache Druid Overview
Druid is an open‑source system designed for massive event‑stream storage and low‑latency analytics. Its key features are:
Real‑time visibility: newly ingested data becomes queryable within minutes.
Interactive queries: sub‑second latency using in‑memory and parallel computation.
Flexible dimensions: dozens of dimensions can be combined arbitrarily.
Easy changes: index configuration updates take effect immediately.
Streaming‑batch integration: the KIS mode provides exactly‑once semantics.
The Druid architecture consists of five main components:
MiddleManager: indexing node that ingests messages, converts them to columnar format, performs roll‑up, and persists immutable segments to HDFS.
Historical: loads segments locally and handles the majority of query processing.
Broker: query router that splits queries into real‑time and offline parts and merges results.
Overlord: manages indexing tasks.
Coordinator: balances segment distribution across Historical nodes.
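The Broker's split‑and‑merge behavior can be sketched conceptually (this is an illustration, not Druid's actual code): a query interval is partitioned at the hand‑off boundary, with the older part served by Historicals and the newer part by the realtime indexing side.

```python
# Conceptual sketch of Broker-style query splitting -- illustrative only,
# not Apache Druid's real implementation. Times are plain numbers here;
# in Druid they would be timestamps.
def split_interval(start, end, handoff_boundary):
    """Split a query interval into (historical_part, realtime_part).

    Segments older than the hand-off boundary have been published to deep
    storage and loaded by Historicals; newer data still lives on the
    indexing nodes. Either part may be None.
    """
    if end <= handoff_boundary:          # entirely historical
        return (start, end), None
    if start >= handoff_boundary:        # entirely realtime
        return None, (start, end)
    return (start, handoff_boundary), (handoff_boundary, end)

# A query covering hours 0-10 with hand-off at hour 6 fans out to
# Historicals for [0, 6) and to realtime tasks for [6, 10); the Broker
# then merges the two partial result sets.
historical, realtime = split_interval(0, 10, 6)
```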
Druid in iQiyi Production
Today the Druid cluster runs on hundreds of nodes, processes billions of events daily, achieves a roll‑up factor of >10×, handles ~6,000 queries per minute (P99 latency < 1 s, P90 < 200 ms), and continues to scale.
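The >10× roll‑up factor comes from pre‑aggregating raw events that share a truncated timestamp and identical dimension values. A toy sketch of the idea (the event shape and dimension values are hypothetical, not iQiyi's schema):

```python
from collections import defaultdict

# Toy roll-up sketch: events sharing (minute-truncated timestamp, dims)
# collapse into one stored row whose metrics are summed.
def rollup(events, granularity_s=60):
    rows = defaultdict(int)
    for ts, dims, count in events:
        bucket = ts - ts % granularity_s   # truncate timestamp to the minute
        rows[(bucket, dims)] += count      # aggregate the metric
    return rows

# 300 per-second events with identical dimensions collapse to 5 minute rows.
events = [(t, ("android", "CN"), 1) for t in range(0, 300)]
rows = rollup(events)
factor = len(events) / len(rows)  # 300 events -> 5 rows = 60x roll-up
```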
1. Coordinator Bottleneck
The original hand‑off process was single‑threaded: after a MiddleManager persisted a segment to HDFS, the Coordinator generated a plan (stored in ZooKeeper) for Historical nodes to load the segment. When many segments were triggered simultaneously, planning became O(N²) (N = number of segments, evaluated across every Historical node), causing prolonged hand‑offs and slot exhaustion.
The issue was resolved by switching to the new CachingCostBalancerStrategy, which improved scheduling performance by roughly 10,000×, reducing a one‑hour Coordinator block to about 30 seconds.
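On a standard Druid deployment this switch is a Coordinator dynamic‑config change; a minimal sketch of the payload one would POST to /druid/coordinator/v1/config (field names follow the Druid docs, values are illustrative):

```python
import json

# Hedged sketch: Coordinator dynamic config enabling the caching cost
# balancer. "balancerStrategy" and "maxSegmentsToMove" are documented
# Druid fields; the values shown here are illustrative, not iQiyi's.
coordinator_dynamic_config = {
    "balancerStrategy": "cachingCost",  # default is the uncached "cost" strategy
    "maxSegmentsToMove": 100,           # cap on segments rebalanced per run
}
payload = json.dumps(coordinator_dynamic_config)
```

The cachingCost strategy pre-computes and incrementally maintains per-server cost sums, so evaluating a candidate placement no longer rescans every segment on every Historical.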
2. Overlord Bottleneck
Overlord API latency was high because the default DB connection pool was too small (druid.metadata.storage.connector.dbcp.maxTotal = 64). Increasing the pool size eliminated the API timeout failures. However, the higher API throughput caused CPU spikes; the root cause was frequent getRunningTasks calls from Tranquility. Lengthening druidBeam.overlordPollPeriod mitigated the issue, and ultimately moving to KIS mode reduced the load.
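The two knobs involved might look like this in configuration (a sketch only; the values shown are illustrative, not iQiyi's production settings):

```python
# Hedged sketch of the two settings discussed above. Both property names
# come from the article; the values here are illustrative placeholders.
overlord_runtime_properties = {
    # Overlord runtime.properties: enlarge the metadata-store (DBCP) pool
    # beyond the default that caused API timeouts.
    "druid.metadata.storage.connector.dbcp.maxTotal": "256",
}
tranquility_properties = {
    # Tranquility side: poll the Overlord for running tasks less often,
    # reducing getRunningTasks pressure (ISO-8601 duration).
    "druidBeam.overlordPollPeriod": "PT60S",
}
```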
3. Indexing Cost Optimization
A default Druid indexing task consumes 3 CPU cores (one for ingestion, one for query, one for hand‑off). Most tasks did not need this capacity, leading to resource waste. iQiyi introduced “Tiny” nodes (1 core, 3 GB RAM) for ~80 % of small tasks while keeping the default configuration for larger tasks, effectively halving per‑task resource usage and doubling slot availability without adding hardware.
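The saving is straightforward arithmetic. Assuming a hypothetical 32‑core MiddleManager host (the host size is my example; the 3‑core default, 1‑core Tiny size, and ~80% small‑task share are from the article):

```python
# Back-of-envelope sketch of the Tiny-node saving. The 32-core host is a
# hypothetical example for illustration.
HOST_CORES = 32
DEFAULT_TASK_CORES = 3   # ingestion + query + hand-off
TINY_TASK_CORES = 1
TINY_SHARE = 0.8         # fraction of tasks small enough for a Tiny slot

slots_before = HOST_CORES // DEFAULT_TASK_CORES
avg_cores = TINY_SHARE * TINY_TASK_CORES + (1 - TINY_SHARE) * DEFAULT_TASK_CORES
slots_after = int(HOST_CORES // avg_cores)

# avg_cores works out to 1.4, so per-task usage is cut by more than half
# and the same host offers roughly twice the slots (10 -> 22 here).
```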
4. DataSource‑to‑Worker Mapping
Earlier Druid versions required an N × M matrix to bind each DataSource to every worker, which was cumbersome. Druid 0.17 introduced the “Category” concept, reducing configuration complexity to 2 × N + 2 × M.
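With categories, each DataSource maps to a category and each worker declares one, so configuration grows linearly rather than as a full matrix. A hedged sketch of the Overlord dynamic‑config shape (structure as I understand the Druid 0.17+ docs; the datasource and category names are placeholders):

```python
# Hedged sketch: category-based worker selection in the Overlord dynamic
# config. Structure follows the Druid 0.17+ docs; names are placeholders.
worker_behavior_config = {
    "selectStrategy": {
        "type": "equalDistributionWithCategorySpec",
        "workerCategorySpec": {
            "strong": True,  # only run tasks on workers of the mapped category
            "categoryMap": {
                "index_kafka": {                    # task type
                    "defaultCategory": "default",
                    "categoryAffinity": {
                        "some_datasource": "tiny",  # DataSource -> category
                    },
                },
            },
        },
    },
}
```

On the worker side, each MiddleManager declares its own category once (druid.worker.category in its runtime.properties), which is what brings the total configuration down to a linear number of entries.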
5. Tranquility vs. KIS
Initially iQiyi used Tranquility (push mode), which duplicated indexing tasks for redundancy, wasting resources. Since Druid 0.14, the recommended KIS (Kafka Indexing Service, pull mode) provides exactly‑once semantics by tracking Kafka offsets in the segment metadata. KIS handles node failures gracefully and avoids duplicate processing.
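Under KIS, ingestion is declared once as a supervisor spec that the Overlord manages; a trimmed, hedged sketch of such a spec (topic, datasource, broker address, and columns are placeholders), submitted via POST to /druid/indexer/v1/supervisor:

```python
import json

# Hedged sketch of a Kafka Indexing Service supervisor spec. Field names
# follow the Druid docs; topic, datasource, and columns are placeholders.
supervisor_spec = {
    "type": "kafka",
    "spec": {
        "dataSchema": {
            "dataSource": "example_events",
            "timestampSpec": {"column": "ts", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["platform", "region"]},
            "metricsSpec": [{"type": "count", "name": "count"}],
            "granularitySpec": {
                "segmentGranularity": "HOUR",
                "queryGranularity": "MINUTE",  # roll-up granularity
            },
        },
        "ioConfig": {
            "topic": "example_topic",
            "consumerProperties": {"bootstrap.servers": "kafka:9092"},
            "useEarliestOffset": False,
        },
        "tuningConfig": {"type": "kafka"},
    },
}
payload = json.dumps(supervisor_spec)
```

Because the consumed Kafka offsets are committed atomically with the segment metadata, a restarted task resumes from exactly where the published data ends, which is what yields the exactly‑once guarantee.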
Realtime Analysis Platform (RAP)
To address Druid’s usability challenges, iQiyi built RAP, a self‑service platform that abstracts Kafka, Druid, and query details. RAP offers wizard‑driven configuration, transparent compute/storage, rich chart types, sub‑5‑minute end‑to‑end latency, second‑level query response, and instant dimension changes. Over a thousand dashboards now run on RAP across membership, recommendation, and BI services.
Future Outlook
Integrate the in‑house Pilot intelligent SQL engine for anomaly detection and rate‑limiting.
Develop an operations platform for metadata, task management, and health monitoring.
Support direct indexing of Parquet files with additional roll‑up.
Enable JOIN capabilities for richer query semantics.
iQIYI Technical Product Team