
Optimization of Coarse Ranking Models for Short‑Video Recommendation at iQIYI

iQIYI’s short‑video recommendation team replaced a GBDT coarse‑ranking model with a lightweight dual‑tower DNN, applying knowledge distillation, sparse‑aware embedding optimization, and inference merging, and later introduced a cascade MMOE architecture. The result: comparable accuracy with half the memory, a ~19 ms latency reduction, and measurable gains in watch time, CTR, and engagement.

iQIYI Technical Product Team

Industrial recommendation systems usually contain four stages—recall, coarse ranking, fine ranking, and re‑ranking—each acting like a funnel that filters a massive item pool to the items most likely to interest the user. The coarse‑ranking stage is responsible for unifying the computation and filtering of recall results, reducing the computational load of the fine‑ranking stage while preserving recommendation accuracy.

The iQIYI SuiKe basic recommendation team introduced a series of practical improvements to the coarse‑ranking model used in short‑video recommendation.

Figure 1: Overall recommendation workflow diagram.

Background

When selecting coarse‑ranking models, industry practitioners prioritize serving performance (latency and throughput). Historically, coarse‑ranking models fall into three major categories:

Simple cutoff based on recall scores or global CTR statistics.

Machine‑learning models such as LR or decision trees that provide modest personalization.

The most widely used today: dual‑tower DNN models that compute user and item embeddings via deep networks and rank by vector similarity.

iQIYI’s short‑video recommendation initially used a GBDT model (the second category) built on a variety of statistical features:

User‑group consumption statistics for different video types.

Video‑level statistics such as CTR, median/mean watch duration, creator statistics, and tag statistics.

User historical consumption statistics (e.g., type‑tag distribution, creator consumption).

After the fine‑ranking model was upgraded to a Wide & Deep architecture, a large gap appeared between the top‑ranked videos predicted by the coarse‑ranking GBDT model and those predicted by the fine‑ranking model. The gap had two sources:

Feature‑set differences: the GBDT model relied on dense statistical features, while the Wide & Deep model leveraged sparse features such as video ID, tag, creator ID, and short‑term user behavior.

Model‑structure differences: tree‑based models and DNNs focus on different aspects during optimization.

To narrow this gap and reduce feature‑engineering effort, the team performed a series of upgrades.

Dual‑Tower DNN Coarse‑Ranking Model

Considering both computational efficiency and experimental results, the team selected the mainstream dual‑tower DNN model. The architecture (Figure 2) consists of three fully‑connected layers on both the user side and the item side, producing 512‑dimensional embeddings for scoring.

Figure 2: Dual‑tower DNN coarse‑ranking model structure.
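A minimal sketch of the structure above, with two small MLP towers compared by dot product. Only the three fully‑connected layers per side and the 512‑dimensional output are from the article; the input widths and hidden sizes are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_tower(dims):
    """A tower is a stack of fully-connected layers: (weight, bias) pairs."""
    return [(rng.standard_normal((d_in, d_out)) * 0.01, np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def forward(tower, x):
    """Apply the layers with ReLU between them; linear final layer."""
    for i, (w, b) in enumerate(tower):
        x = x @ w + b
        if i < len(tower) - 1:
            x = np.maximum(x, 0.0)
    return x

# Three fully-connected layers per side; 512-dim output embeddings.
# Input widths (64 for users, 32 for items) are illustrative only.
user_tower = make_tower([64, 256, 256, 512])
item_tower = make_tower([32, 256, 256, 512])

user_emb = forward(user_tower, rng.standard_normal((1, 64)))      # (1, 512)
item_embs = forward(item_tower, rng.standard_normal((1000, 32)))  # (1000, 512)

# Rank candidates by vector similarity (dot product here).
scores = (item_embs @ user_emb.T).ravel()
top100 = np.argsort(-scores)[:100]
```

Because scoring reduces to a dot product, the item side can be precomputed offline while only the user tower runs per request.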

The coarse model’s feature set was heavily trimmed to keep the parameter count low. The user side kept only a few features from the fine‑ranking model:

Basic user profile and context features (e.g., device OS, model, region).

Historical behavior features such as watched video IDs, creator IDs, and tag keywords, plus session‑level behavior.

The item side retained just three sparse features:

Video ID

Creator (up‑owner) ID

Video tag

Figure 3: Coarse‑ranking dual‑tower DNN model diagram.

Optimization Measures

To maintain accuracy while meeting online latency constraints, the team applied three major optimizations:

Knowledge Distillation: Using a teacher‑student framework, the fine‑ranking model served as the teacher guiding the coarse‑ranking model. The loss combined the student loss, the teacher loss, and a distillation loss (MSE between the two models’ logits), weighted by a hyper‑parameter λ that increases over training steps (Figure 4).

Embedding Parameter Optimization: The optimizer for embedding parameters was switched to the sparsity‑inducing FTRL optimizer (AdaGrad was kept for the other layers). Under FTRL’s L1 regularization, 49.7% of embedding entries went to exactly zero; pruning these zero embeddings shrank the model by 46.8% and halved memory consumption during online loading.

Online Inference Optimizations: User‑side embedding computation was merged across the thousands of candidate items scored for the same user, cutting redundant forward passes and reducing p99 latency by ~19 ms. Static embeddings of high‑frequency videos were cached, further accelerating scoring (Figure 5).
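A minimal sketch of the distillation loss described above. The warm‑up length and the exact shape of the λ schedule are assumptions; the article only states that λ increases over training steps.

```python
import numpy as np

def bce(logits, labels):
    """Binary cross-entropy on raw logits."""
    p = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-7
    return -np.mean(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))

def lam(step, warmup_steps=10_000, max_lam=1.0):
    """Distillation weight that ramps up linearly with training steps."""
    return max_lam * min(step / warmup_steps, 1.0)

def coarse_ranking_loss(student_logits, teacher_logits, labels, step):
    student_loss = bce(student_logits, labels)
    teacher_loss = bce(teacher_logits, labels)
    distill_loss = np.mean((student_logits - teacher_logits) ** 2)  # MSE on logits
    return student_loss + teacher_loss + lam(step) * distill_loss
```

Early in training λ is near zero, so the student mostly fits the hard labels; as training progresses, the MSE term pulls its logits toward the teacher’s.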

Figure 4: Distillation loss λ schedule.
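The pruning step above can be sketched as dropping all‑zero embedding rows and keeping an id‑to‑row index; serving pruned ids as zero vectors is an assumption about how lookups behave afterwards.

```python
import numpy as np

def prune_zero_embeddings(table, ids):
    """Drop embedding rows that FTRL's L1 regularization drove to exactly
    zero, returning a compact table plus an id -> row index for lookup."""
    keep = ~np.all(table == 0.0, axis=1)
    compact = table[keep]
    index = {i: row for row, i in enumerate(np.asarray(ids)[keep])}
    return compact, index

def lookup(compact, index, i, dim):
    """Pruned (all-zero) ids fall back to a zero vector at serving time."""
    row = index.get(i)
    return compact[row] if row is not None else np.zeros(dim)

# Toy table with half its rows zeroed out, mirroring the ~50% zero rate.
table = np.array([[0.0, 0.0], [1.0, 2.0], [0.0, 0.0], [3.0, 4.0]])
compact, index = prune_zero_embeddings(table, ids=[10, 11, 12, 13])
```

With roughly half the rows zeroed out, pruning roughly halves the table, which matches the reported 46.8% size reduction.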

Figure 5: Optimized coarse‑ranking inference service architecture.
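A serving‑side sketch of both inference optimizations, using a hypothetical `CoarseScorer` wrapper (not from the article): the user embedding is computed once per request and all candidates are scored with one matrix multiply, while item embeddings, which depend only on static video features, are cached across requests.

```python
import numpy as np

class CoarseScorer:
    def __init__(self, user_fn, item_fn):
        self.user_fn = user_fn          # user-tower forward pass
        self.item_fn = item_fn          # item-tower forward pass
        self.item_cache = {}            # video id -> cached embedding

    def item_embedding(self, vid):
        emb = self.item_cache.get(vid)
        if emb is None:
            emb = self.item_fn(vid)     # computed once, then reused
            self.item_cache[vid] = emb
        return emb

    def score(self, user_features, candidate_ids):
        user_emb = self.user_fn(user_features)  # one user-tower pass per request
        items = np.stack([self.item_embedding(v) for v in candidate_ids])
        return items @ user_emb                  # one matmul scores all candidates
```

Merging the user‑side computation is what removes the per‑candidate forward passes; the cache additionally skips the item tower for hot videos.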

These optimizations let the dual‑tower DNN coarse model match the previous GBDT model’s accuracy, and it was deployed in iQIYI’s Hotspot channel and the SuiKe homepage feed. Online metrics improved: average per‑user watch time rose by roughly 3% in the feed and 1% in the Hotspot channel, while CTR and the average number of videos watched rose by about 2%.

Cascade Model

To align coarse‑ranking objectives with the evolving fine‑ranking goals, the team upgraded the coarse model to a cascade architecture using a Multi‑Gate Mixture‑of‑Experts (MMOE) multi‑task model, the same architecture as the fine‑ranking model. Instead of learning from raw exposure/click data, the coarse model now directly learns the fine‑ranking model’s predictions as soft labels (Figure 6). This eliminated the distillation step, reducing training resources and time, and yielded a ~3 % lift in exposure‑click rate and a 12 % increase in average comments.

Figure 6: Cascade model training data flow.
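The cascade setup can be sketched as training the coarse model directly against the fine‑ranking model’s scores. Cross‑entropy against the soft labels is shown here as one plausible choice; the article does not name the exact loss.

```python
import numpy as np

def soft_label_loss(coarse_logits, fine_scores):
    """Fit the coarse model's predictions to the fine-ranking model's
    output probabilities (soft labels) instead of raw click labels."""
    p = 1.0 / (1.0 + np.exp(-coarse_logits))
    eps = 1e-7
    return -np.mean(fine_scores * np.log(p + eps)
                    + (1.0 - fine_scores) * np.log(1.0 - p + eps))
```

The loss is minimized when the coarse model’s probabilities match the fine‑ranking scores, which is exactly the alignment the cascade is after.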

Future Plans

Explore the next‑generation coarse‑ranking system “COLD”.

Further improve online performance to allow a larger recall set and incorporate more fine‑ranking‑validated features, boosting accuracy.

Enhance user‑item embedding similarity computation by adding a shallow network to replace the current cosine similarity.

References

J. Ma, Z. Zhao, X. Yi, J. Chen, L. Hong, and E. H. Chi, “Modeling Task Relationships in Multi-Task Learning with Multi-Gate Mixture-of-Experts,” KDD 2018.

H. B. McMahan, “Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization,” AISTATS 2011.

G. Hinton, O. Vinyals, and J. Dean, “Distilling the Knowledge in a Neural Network,” arXiv:1503.02531, 2015.

Z. Wang, L. Zhao, B. Jiang, G. Zhou, X. Zhu, and K. Gai, “COLD: Towards the Next Generation of Pre-Ranking System,” arXiv:2007.16122, 2020.
