
Online Learning for Large‑Scale DNN Ranking Models in iQIYI Feed Recommendation

iQIYI’s feed recommendation system adopts an online‑learning framework that continuously trains a massive Wide & Deep (W&D) DNN on billions of streaming samples. The framework handles dynamic user interests, out‑of‑vocabulary (OOV) embeddings, delayed labels, and non‑convex optimization, enabling hourly model refreshes and delivering up to 3.8 % higher consumption versus offline baselines.

iQIYI Technical Product Team

iQIYI’s feed recommendation system generates billions of impressions daily, creating massive challenges for model training and rapid model updates. Because user interests shift quickly, a ranking model that is not updated in time soon degrades in performance.

Online learning is introduced as the primary solution. It captures dynamic user behavior and enables fast model adaptation, but it imposes strict requirements on data pipelines, streaming sample‑distribution correction, training stability, and deployment performance.

The article describes iQIYI’s successful practice of an online‑learning paradigm for a Wide & Deep (W&D) ranking model, achieving real‑time consumption of streaming data and timely updates of a DNN trained on billions of samples with hundreds of billions of parameters.

Key challenges of current feed ranking models include:

Model size: DNNs with wide and deep architectures can reach trillions of parameters.

Training cost: Large datasets and many iterations require GPU‑heavy training, making frequent updates expensive.

Convergence of non‑convex models: Hyper‑parameter tuning (batch size, learning rate, optimizer) becomes critical.

For DNN online learning, additional difficulties arise:

OOV handling: Rapidly adding new IDs to the embedding dictionary while removing stale ones.

Convergence of non‑convex online gradient descent.

Delayed positive samples: In video ads, conversion labels arrive later than impressions, requiring label correction.
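The OOV difficulty above — rapidly adding new IDs while evicting stale ones — can be illustrated with a minimal sketch. This is a hypothetical structure, not iQIYI’s actual implementation; class and parameter names (`DynamicEmbeddingTable`, `ttl_steps`) are illustrative.

```python
import numpy as np

class DynamicEmbeddingTable:
    """Hypothetical sketch of an online embedding dictionary: new IDs get a
    freshly initialized vector on first lookup; IDs unseen for `ttl_steps`
    training steps are evicted to bound memory."""

    def __init__(self, dim, ttl_steps=10000, seed=0):
        self.dim = dim
        self.ttl_steps = ttl_steps
        self.rng = np.random.default_rng(seed)
        self.table = {}        # id -> embedding vector
        self.last_seen = {}    # id -> training step of last lookup
        self.step = 0

    def lookup(self, feature_id):
        # OOV handling: lazily create an embedding for unseen IDs.
        if feature_id not in self.table:
            self.table[feature_id] = self.rng.normal(0, 0.01, self.dim)
        self.last_seen[feature_id] = self.step
        return self.table[feature_id]

    def advance_and_evict(self):
        # Called once per training step; drop IDs not seen within the TTL.
        self.step += 1
        stale = [i for i, s in self.last_seen.items()
                 if self.step - s > self.ttl_steps]
        for i in stale:
            del self.table[i], self.last_seen[i]
        return len(stale)
```

A production system would additionally shard this table across parameter servers and apply frequency filtering on admission, as discussed later in the article.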

Industry solutions are surveyed (e.g., Alibaba XPS platform, Ant Financial). Alibaba’s C++‑based dynamic embedding lookup and XNN algorithm support streaming data training.

The model architecture is a variant of the Wide & Deep model: both the Wide and Deep parts pass through an FM (factorization machine) layer before fusion. The Deep side consumes embedding and dense features; the Wide side consumes GBDT leaf nodes, sparse ID features, embeddings, and dense features. Adam is used as the optimizer for the Deep (DNN) part and FTRL for the Wide part.
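A toy forward pass can make the branch structure concrete. This is a simplified sketch under stated assumptions — real shapes, feature handling, and fusion details differ; all function and parameter names here are illustrative, not iQIYI’s.

```python
import numpy as np

def fm_second_order(V):
    """FM pairwise-interaction pooling over field embeddings V (n_fields, k):
    0.5 * ((sum_i v_i)^2 - sum_i v_i^2), summed over the k dimensions."""
    sum_sq = np.sum(V, axis=0) ** 2
    sq_sum = np.sum(V ** 2, axis=0)
    return 0.5 * np.sum(sum_sq - sq_sum)

def wide_deep_fm_score(wide_V, deep_V, deep_dense, W1, b1, w2, w_wide):
    """Toy forward pass: both branches pass their field embeddings through an
    FM layer; the deep side adds a one-hidden-layer MLP over
    [embeddings, dense]; the two logits are fused by summation before the
    sigmoid."""
    # Wide branch: FM interactions plus a linear term over its inputs.
    wide_logit = fm_second_order(wide_V) + w_wide @ wide_V.ravel()
    # Deep branch: ReLU MLP over concatenated embedding + dense features.
    h = np.maximum(0, W1 @ np.concatenate([deep_V.ravel(), deep_dense]) + b1)
    deep_logit = fm_second_order(deep_V) + w2 @ h
    return 1.0 / (1.0 + np.exp(-(wide_logit + deep_logit)))
```

The FM pooling identity used here (half the squared sum minus the sum of squares) is the standard O(n·k) rewrite of the pairwise dot-product sum.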

Training paradigm: an online‑learning framework combined with offline model hot‑start. Offline training on 7‑day historical data provides a checkpoint for daily hot‑start; online learning consumes Kafka streams in a one‑pass fashion, updating the model continuously.
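The hot‑start plus one‑pass loop reduces to a simple pattern: load an offline checkpoint, then apply one update per streamed sample, never revisiting data. The sketch below assumes a plain logistic‑regression update to keep it self‑contained; the stream stands in for a Kafka consumer, and all names are illustrative.

```python
import numpy as np

def one_pass_online_train(model_w, sample_stream, lr=0.05):
    """Hypothetical one-pass loop: start from an offline (hot-start)
    checkpoint `model_w` and apply one SGD step per streamed sample.
    Each sample is seen exactly once, mirroring Kafka consumption."""
    w = model_w.copy()
    for x, y in sample_stream:          # in production: a Kafka consumer
        p = 1.0 / (1.0 + np.exp(-x @ w))
        w -= lr * (p - y) * x           # logistic-loss gradient step
    return w
```

In the article’s setup, the checkpoint comes from 7‑day offline training and is refreshed daily, while this loop runs continuously in between.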

Data pipeline challenges include multi‑source real‑time joins, sample attribution (handling delayed labels), and cache strategies for negative samples (10‑minute cache window).

Sample attribution methods compared: Facebook’s cache‑until‑positive‑sample approach vs. Twitter’s keep‑both‑samples approach; the latter was adopted.
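The adopted keep‑both scheme can be sketched as a small stream processor: every impression is emitted as a negative immediately, and if its conversion arrives later, a positive for the same impression is emitted in addition (downstream loss correction compensates for the duplicated negative). Event and function names here are illustrative.

```python
def keep_both_attribution(events):
    """Sketch of the keep-both attribution scheme: emit a negative at
    impression time; if a delayed conversion arrives, also emit a positive
    for the same impression. `events` is a time-ordered list of
    ("imp", id) / ("conv", id) tuples."""
    samples = []
    seen = set()
    for kind, imp_id in events:
        if kind == "imp":
            seen.add(imp_id)
            samples.append((imp_id, 0))   # emit negative right away
        elif kind == "conv" and imp_id in seen:
            samples.append((imp_id, 1))   # delayed positive, kept as extra
    return samples
```

By contrast, the cache‑until‑positive approach would hold each impression in a buffer (e.g. the 10‑minute negative cache mentioned above) and emit a single sample only once its final label is known.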

Practical insights:

One‑pass online learning for DNN requires solving non‑convex online optimization; FTRL works well for sparse linear parts but not for DNN.

Hourly model updates outperform performance‑triggered updates.

Long periods without hot‑start lead to performance decay.

Multiple passes over streaming data can improve results, depending on data distribution.
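Since FTRL is singled out as working well for the sparse linear part, a per‑coordinate FTRL‑Proximal update (McMahan et al.) is worth sketching. Hyper‑parameter defaults below are illustrative, not iQIYI’s settings.

```python
import numpy as np

class FTRLProximal:
    """Per-coordinate FTRL-Proximal for a sparse linear model: z accumulates
    adjusted gradients, n accumulates squared gradients, and weights are
    recovered in closed form with L1-induced sparsity."""

    def __init__(self, dim, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = np.zeros(dim)
        self.n = np.zeros(dim)

    def weights(self):
        # Closed-form proximal solution; coordinates with |z| <= l1 stay 0.
        w = np.zeros_like(self.z)
        active = np.abs(self.z) > self.l1
        w[active] = -(self.z[active] - np.sign(self.z[active]) * self.l1) / (
            (self.beta + np.sqrt(self.n[active])) / self.alpha + self.l2)
        return w

    def update(self, x, y):
        w = self.weights()
        p = 1.0 / (1.0 + np.exp(-x @ w))   # logistic prediction
        g = (p - y) * x                    # logistic-loss gradient
        sigma = (np.sqrt(self.n + g * g) - np.sqrt(self.n)) / self.alpha
        self.z += g - sigma * w
        self.n += g * g
```

The L1 term is what keeps the wide part sparse under streaming updates; the insight in the article is that this recipe does not transfer to the non‑convex DNN part.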

Experimental results show significant gains: short‑video feed consumption +1.5%, image‑text feed consumption +3.8%, and higher AUC/W‑AUC compared to offline models. Real‑time feature + online learning consistently outperforms baseline and real‑time feature + offline training.

Future optimization directions include replacing GBDT with end‑to‑end deep models, exploring adaptive deep architectures and optimizers for online learning, applying frequency‑based ID filtering (e.g., Poisson or Bloom filter), and increasing model update frequency to sub‑hourly intervals.
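One of the directions above, frequency‑based ID filtering, can be sketched with a counting Bloom filter that admits an ID into the embedding table only after it has (approximately) been seen a minimum number of times. This is a minimal illustration with made‑up sizes, not a production design.

```python
import hashlib

class BloomAdmission:
    """Sketch of frequency-based ID admission: an ID is admitted to the
    embedding table only after being seen `threshold` times, tracked
    approximately by a counting Bloom filter (sizes are illustrative)."""

    def __init__(self, n_bits=2 ** 16, n_hashes=3, threshold=2):
        self.counts = [0] * n_bits
        self.n_bits, self.n_hashes, self.threshold = n_bits, n_hashes, threshold

    def _positions(self, feature_id):
        # Derive n_hashes counter positions from salted SHA-1 digests.
        for i in range(self.n_hashes):
            h = hashlib.sha1(f"{i}:{feature_id}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.n_bits

    def observe(self, feature_id):
        """Record one occurrence; return True once the ID should be admitted."""
        pos = set(self._positions(feature_id))
        for p in pos:
            self.counts[p] += 1
        # The min over an ID's counters upper-bounds its true count,
        # so admission may fire slightly early on collisions, never late.
        return min(self.counts[p] for p in pos) >= self.threshold
```

The Poisson‑filter alternative mentioned in the article would instead admit each new ID probabilistically, trading the counter memory for a tunable admission rate.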

Tags: Deep Learning, online learning, real-time training, iQIYI, DNN, recommendation system