iQIYI Dual‑DNN Ranking Model with Online Knowledge Distillation
This article describes iQIYI’s dual‑DNN ranking architecture that combines a high‑capacity teacher network with a lightweight student network via online knowledge distillation, addressing the trade‑off between model effectiveness and inference efficiency in large‑scale recommendation systems.
With the rapid development of deep learning, recommendation ranking models have evolved from shallow machine‑learning approaches to deep neural networks that can automatically learn feature interactions. However, larger models often suffer from increased inference latency, creating a conflict between prediction quality and serving efficiency.
iQIYI introduced an online knowledge‑distillation framework to balance this trade‑off. The proposed dual‑DNN ranking model pairs a complex teacher DNN with a simpler student DNN. Both networks share the same feature embedding layer, while the teacher includes an additional feature‑interaction layer to capture richer high‑order relationships.
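The two-tower layout can be sketched as follows. This is a minimal NumPy illustration, not iQIYI's implementation: the field count, dimensions, and the choice of pairwise inner products as the teacher's feature-interaction layer are all assumptions made here for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Shared embedding layer: F categorical fields, each mapped to a d-dim vector.
# Both towers read from this same table (the "feature transfer" part).
F, d, hidden = 8, 16, 32
emb_table = rng.normal(size=(1000, d)) * 0.1      # toy vocabulary of 1000 ids

def shared_embedding(feature_ids):
    # feature_ids: (F,) integer ids -> (F, d) embeddings shared by both towers
    return emb_table[feature_ids]

def teacher_forward(E, W1, w2):
    # Teacher adds an explicit feature-interaction layer: pairwise inner
    # products of field embeddings (one common way to model high-order crosses).
    inter = np.array([E[i] @ E[j] for i in range(F) for j in range(i + 1, F)])
    x = np.concatenate([E.ravel(), inter])
    h = relu(x @ W1)                               # hidden activations
    return sigmoid(h @ w2), h

def student_forward(E, W1, w2):
    # Student consumes the shared embeddings directly -- no interaction layer,
    # so it stays small and fast at serving time.
    x = E.ravel()
    h = relu(x @ W1)
    return sigmoid(h @ w2), h

ids = rng.integers(0, 1000, size=F)
E = shared_embedding(ids)
n_inter = F * (F - 1) // 2
Wt1 = rng.normal(size=(F * d + n_inter, hidden)) * 0.05
wt2 = rng.normal(size=hidden) * 0.05
Ws1 = rng.normal(size=(F * d, hidden)) * 0.05
ws2 = rng.normal(size=hidden) * 0.05
p_teacher, h_teacher = teacher_forward(E, Wt1, wt2)
p_student, h_student = student_forward(E, Ws1, ws2)
```

Because the interaction layer lives only in the teacher tower, the student's parameter count and forward-pass cost are dominated by the plain MLP, which is what makes the serving-time savings possible.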
The training pipeline follows three key steps: (1) Feature Transfer – the student reuses the teacher’s input‑representation layer; (2) Knowledge Distillation on the Fly – the teacher’s predictions guide the student during joint training, eliminating the need for a separate distillation stage; (3) Classifier Transfer – hidden‑layer activations from the teacher supervise the student’s hidden layers, narrowing the performance gap.
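The three steps above combine into a single joint training objective. The sketch below shows one plausible form of that loss; the specific weights (`alpha`, `beta`), the binary-cross-entropy soft-target term, and the L2 hint loss are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def bce(p, y, eps=1e-7):
    # Binary cross-entropy between a prediction p and a (possibly soft) target y.
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def joint_loss(y, p_teacher, p_student, h_teacher, h_student,
               alpha=0.5, beta=0.1):
    # (1) Both towers fit the true click label.
    hard = bce(p_teacher, y) + bce(p_student, y)
    # (2) Distillation on the fly: the teacher's *current* prediction is a
    #     soft target for the student, so no separate distillation stage is
    #     needed (in training, gradients would flow into the student only).
    distill = bce(p_student, p_teacher)
    # (3) Classifier transfer: pull the student's hidden activations toward
    #     the teacher's with an L2 hint loss (equal dims assumed here).
    hint = np.mean((h_student - h_teacher) ** 2)
    return hard + alpha * distill + beta * hint

loss = joint_loss(y=1.0, p_teacher=0.8, p_student=0.6,
                  h_teacher=np.ones(4), h_student=np.zeros(4))
```

Note that when the student's prediction and hidden activations match the teacher's, the distillation and hint terms reach their minimum, so the objective rewards exactly the gap-narrowing behavior described above.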
After joint training, the student model is fine‑tuned daily with a 30‑day offline window and then updated online using real‑time data, ensuring the model captures the latest user behavior patterns while maintaining low latency.
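The refresh cycle amounts to a simple daily schedule. The sketch below only captures the scheduling logic; `fine_tune` and `online_update` are hypothetical hooks standing in for the actual offline and streaming training jobs.

```python
from datetime import date, timedelta

def daily_refresh(today, fine_tune, online_update, window_days=30):
    # One serving-day cycle: offline fine-tune on a rolling 30-day window
    # ending today, then keep updating on the real-time stream.
    start = today - timedelta(days=window_days)
    fine_tune(start, today)      # daily offline pass over the window
    online_update()              # streaming updates after deployment

calls = []
daily_refresh(date(2024, 1, 31),
              fine_tune=lambda s, e: calls.append(("offline", s, e)),
              online_update=lambda: calls.append(("online",)))
```

The rolling window keeps the offline pass bounded in cost while the streaming stage absorbs the freshest behavior, which is the latency/freshness balance the article describes.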
Experimental results on iQIYI’s short‑video and image‑text feed show that the student model achieves comparable click‑through‑rate improvements with roughly one‑third of the parameters and five‑fold faster inference compared to the teacher, delivering higher ROI under identical resource constraints.
The paper also surveys related industry practices, such as Baidu’s CTR‑X and CTR‑3.0 models and Alibaba’s Rocket Launching method, highlighting differences in feature sharing and knowledge‑transfer strategies.
In conclusion, the dual‑DNN architecture with online knowledge distillation provides an effective solution for deploying high‑performance, low‑latency ranking models in production, and future work will explore wider and deeper teacher networks to further boost recommendation accuracy.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.