
Exploring iQIYI’s Unified Big Data + AI Architecture: Challenges, Solutions, and Future Directions

iQIYI’s unified big‑data + AI platform combines a hybrid‑cloud model, storage‑compute separation via its QBFS virtual file system, a reusable feature‑store and operator DAGs, and multi‑tenant YARN scheduling to overcome legacy Hive/Spark bottlenecks, accelerate large‑scale model training, improve data quality, and prepare for future real‑time, privacy‑preserving AI workloads.

iQIYI Technical Product Team

Big data is the foundation of artificial intelligence. Building on its accumulated data platform, iQIYI has created a unified big data + AI architecture that integrates data, algorithm training, and business scenarios.

The company adopts a hybrid‑cloud deployment model, with on‑premises private cloud as the core and selective use of public‑cloud services for exploratory workloads. This approach balances cost, scalability, and operational stability.

Running machine‑learning tasks on the legacy big‑data platform presents several challenges. Feature processing relies on Hive and Spark, where both engineering efficiency and performance become problematic at massive data scales. Model training is hampered by limited support on Hadoop for frameworks such as TensorFlow and PyTorch, by complex scheduling of CPU, memory, and GPU resources, and by insufficient Docker support in older Hadoop versions.

The core integration problem is the unification of data and compute. Traditional ML workflows keep data isolated on a single machine, whereas a true AI‑enabled big‑data platform must ingest AI‑related data, leverage distributed storage and compute, and streamline feature production, sample generation, model training, and model management.

Unifying metadata further reduces siloed development, which saves development effort and improves data and model quality.

iQIYI achieves storage‑compute separation through its self‑developed QBFS (iQIYI Big Data File System), a virtual file system that maps virtual paths to various back‑ends (HDFS, public‑cloud object storage, private‑cloud object storage), allowing compute jobs to access data regardless of the underlying storage.
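
The path mapping described above can be illustrated with a minimal sketch. This is not QBFS's actual implementation or API: the `qbfs://` scheme, the `Mount` and `VirtualFS` names, and the example back‑end URIs are all hypothetical, chosen only to show how a virtual path might be resolved to a physical location by longest‑prefix match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mount:
    virtual_prefix: str  # path prefix that compute jobs see (hypothetical scheme)
    backend_uri: str     # physical location behind that prefix


class VirtualFS:
    """Resolves virtual paths to back-end URIs by longest-prefix match."""

    def __init__(self, mounts):
        # Check longer prefixes first so that nested mounts take precedence.
        self._mounts = sorted(
            mounts, key=lambda m: len(m.virtual_prefix), reverse=True
        )

    def resolve(self, virtual_path: str) -> str:
        for mount in self._mounts:
            prefix = mount.virtual_prefix.rstrip("/")
            if virtual_path == prefix or virtual_path.startswith(prefix + "/"):
                suffix = virtual_path[len(prefix):]
                return mount.backend_uri.rstrip("/") + suffix
        raise FileNotFoundError(f"no mount covers {virtual_path}")


# Hypothetical mount table: hot data on HDFS, archived data on object storage.
vfs = VirtualFS([
    Mount("qbfs://warehouse", "hdfs://cluster-a/warehouse"),
    Mount("qbfs://warehouse/archive", "s3a://archive-bucket/warehouse"),
])

hot_path = vfs.resolve("qbfs://warehouse/features/part-0")
cold_path = vfs.resolve("qbfs://warehouse/archive/2023/part-0")
```

In this sketch a job only ever refers to the virtual path, so moving a directory between back‑ends would mean changing the mount table rather than the job.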

To mitigate the performance impact of storage‑compute separation, iQIYI employs advanced compression and erasure coding, columnar storage formats, distributed caching (e.g., Alluxio), and data‑compute pipelining such as TensorFlow’s Dataset API to overlap I/O with computation.
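
The idea of overlapping I/O with computation can be sketched with the standard library alone. The example below is an illustration of the general technique, the same idea as `prefetch` in TensorFlow's Dataset API, and not iQIYI's pipeline; `read_batches` and `train_step` are hypothetical stand‑ins for a slow remote read and a training step.

```python
import queue
import threading


def prefetch(source, buffer_size=2):
    """Yield items from `source` while a background thread loads the next ones."""
    buf = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the stream

    def producer():
        for item in source:
            buf.put(item)  # blocks when the buffer is full
        buf.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()
        if item is done:
            return
        yield item


def read_batches(n):
    # Hypothetical stand-in for a slow read from remote storage.
    for i in range(n):
        yield list(range(i * 4, i * 4 + 4))


def train_step(batch):
    # Hypothetical stand-in for a compute-heavy training step.
    return sum(batch)


# While train_step works on one batch, the producer thread fetches the next.
results = [train_step(batch) for batch in prefetch(read_batches(3))]
```

This sketch keeps error handling out of scope: an exception raised inside the producer thread is not propagated to the consumer, and the producer stays blocked if the consumer stops early.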

The platform supports multi‑tenant usage. Tenant isolation is realized via Hadoop proxy‑user mechanisms for task submission and YARN scheduler queues (configured with min/max resources, weights, etc.) for fine‑grained resource isolation.
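
To make the interaction of minimum resources, maximum resources, and weights concrete, here is a simplified sketch of weighted sharing among queues. It is not YARN's scheduler code and does not reproduce its exact algorithm; the function name, queue names, and numbers are hypothetical. Each queue first receives its minimum, and the remainder is split by weight without exceeding any queue's maximum.

```python
def fair_shares(total, queues):
    """Split `total` capacity among queues.

    `queues` maps a queue name to (min_share, max_share, weight).
    """
    shares = {name: q_min for name, (q_min, _, _) in queues.items()}
    remaining = total - sum(shares.values())
    if remaining < 0:
        raise ValueError("minimum shares exceed total capacity")

    # Queues that can still grow beyond their current share.
    active = {n for n, (q_min, q_max, _) in queues.items() if q_max > q_min}
    eps = 1e-9
    while remaining > eps and active:
        total_weight = sum(queues[n][2] for n in active)
        capped = set()
        distributed = 0.0
        for name in active:
            offer = remaining * queues[name][2] / total_weight
            room = queues[name][1] - shares[name]
            grant = min(offer, room)
            shares[name] += grant
            distributed += grant
            if grant >= room - eps:
                capped.add(name)  # queue reached its maximum
        remaining -= distributed
        active -= capped
        if not capped:
            break  # everything offered was accepted
    return shares


# Hypothetical tenants sharing 100 units of capacity.
shares = fair_shares(100, {
    "ml_training": (20, 50, 2),
    "etl": (10, 100, 1),
    "adhoc": (0, 10, 1),
})
```

Here `ml_training` and `adhoc` hit their maximums, and the capacity they cannot use flows to `etl`, which is the kind of isolation with elasticity that min/max limits and weights are meant to provide.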

The most common bottlenecks are heavy feature‑processing workloads and large‑scale distributed training that consume extensive CPU, memory, and network resources. Solutions include a unified feature store with configuration‑driven feature computation, feature reuse to avoid duplicate work, and proactive resource monitoring with balanced placement of heterogeneous tasks.
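
A configuration‑driven feature store with reuse might look roughly like the following sketch. The config schema, the `FeatureStore` class, and the transform names are assumptions made for illustration and are not taken from iQIYI's platform. A feature is declared as data, computed on first request, and served from a cache afterwards so that a second consumer does not repeat the work.

```python
from typing import Callable, Dict, List

# Hypothetical library of transforms that a config can refer to by name.
TRANSFORMS: Dict[str, Callable[[List[float]], float]] = {
    "sum": sum,
    "max": max,
    "mean": lambda xs: sum(xs) / len(xs),
}


class FeatureStore:
    """Computes features from declarative configs and caches the results."""

    def __init__(self, configs, rows):
        self._configs = {c["name"]: c for c in configs}
        self._rows = rows
        self._cache = {}
        self.compute_count = 0  # counts real computations, to show reuse

    def get(self, name):
        if name not in self._cache:
            cfg = self._configs[name]
            values = [row[cfg["column"]] for row in self._rows]
            self._cache[name] = TRANSFORMS[cfg["transform"]](values)
            self.compute_count += 1
        return self._cache[name]


# Hypothetical feature definitions and input rows.
configs = [
    {"name": "total_watch_minutes", "column": "watch_minutes", "transform": "sum"},
    {"name": "avg_watch_minutes", "column": "watch_minutes", "transform": "mean"},
]
rows = [{"watch_minutes": 30}, {"watch_minutes": 45}, {"watch_minutes": 15}]

store = FeatureStore(configs, rows)
first = store.get("total_watch_minutes")   # computed
second = store.get("total_watch_minutes")  # served from the cache
average = store.get("avg_watch_minutes")
```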

iQIYI’s customizations focus on feature operators. Over ten common feature‑calculation patterns were abstracted into reusable operators that form a DAG, replacing long, hard‑to‑read SQL scripts and improving readability and optimization.
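
The operator DAG can be illustrated with a small sketch built on Python's standard `graphlib` module (Python 3.9+). The operators shown (`clipped`, `bucketed`, `crossed`) and the `run_dag` helper are hypothetical examples, not iQIYI's actual operator set. Each operator declares its inputs, and the runner executes operators in dependency order.

```python
from graphlib import TopologicalSorter


def run_dag(operators, inputs):
    """Run operators in dependency order.

    `operators` maps a name to (function, [names of its inputs]); an input
    is either a raw column from `inputs` or the output of another operator.
    """
    # Only operator-to-operator edges matter for ordering.
    graph = {
        name: {dep for dep in deps if dep in operators}
        for name, (_, deps) in operators.items()
    }
    results = dict(inputs)
    for name in TopologicalSorter(graph).static_order():
        fn, deps = operators[name]
        results[name] = fn(*(results[dep] for dep in deps))
    return results


# Hypothetical operators, deliberately declared out of execution order.
operators = {
    "crossed": (
        lambda buckets, devices: [f"{d}_{b}" for b, d in zip(buckets, devices)],
        ["bucketed", "device"],
    ),
    "bucketed": (lambda xs: [x // 30 for x in xs], ["clipped"]),
    "clipped": (lambda xs: [min(x, 100) for x in xs], ["watch_minutes"]),
}

features = run_dag(
    operators,
    {"watch_minutes": [12, 95, 240], "device": ["tv", "phone", "tv"]},
)
```

Because each step is a named node with explicit inputs, such a pipeline is easier to read than one long SQL script, and intermediate results such as `clipped` are available to any later operator that needs them.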

Since the platform was deployed, distributed model training has become significantly faster and more scalable, which allows larger datasets to be used and models to be delivered earlier. Standardized feature management has also reduced siloed development and improved data quality.

Looking ahead, AI workloads will occupy a larger share of the big‑data platform, with real‑time online learning becoming more prevalent. Privacy‑preserving technologies such as federated learning and secure multi‑party computation are expected to gain broader adoption due to regulatory requirements.
