iQIYI Dual‑DNN Ranking Model with Online Knowledge Distillation
This article describes iQIYI’s dual‑DNN ranking architecture that combines a high‑capacity teacher network with a lightweight student network via online knowledge distillation, addressing the trade‑off between model effectiveness and inference efficiency in large‑scale recommendation systems.
With the rapid development of deep learning, recommendation ranking models have evolved from shallow machine‑learning approaches to deep neural networks that can automatically learn feature interactions. However, larger models often suffer from increased inference latency, creating a conflict between prediction quality and serving efficiency.
iQIYI introduced an online knowledge‑distillation framework to balance this trade‑off. The proposed dual‑DNN ranking model pairs a complex teacher DNN with a simpler student DNN. Both networks share the same feature embedding layer, while the teacher includes an additional feature‑interaction layer to capture richer high‑order relationships.
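The two-tower layout can be sketched as follows. This is a minimal NumPy illustration, not iQIYI's implementation: the field count, dimensions, and the choice of pairwise inner products as the teacher's feature-interaction layer are all assumptions made here for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Shared embedding layer: F categorical fields, each mapped to a d-dim vector.
# Both towers read from this same table (the "feature transfer" part).
F, d, hidden = 8, 16, 32
emb_table = rng.normal(size=(1000, d)) * 0.1      # toy vocabulary of 1000 ids

def shared_embedding(feature_ids):
    # feature_ids: (F,) integer ids -> (F, d) embeddings shared by both towers
    return emb_table[feature_ids]

def teacher_forward(E, W1, w2):
    # Teacher adds an explicit feature-interaction layer: pairwise inner
    # products of field embeddings (one common way to model high-order crosses).
    inter = np.array([E[i] @ E[j] for i in range(F) for j in range(i + 1, F)])
    x = np.concatenate([E.ravel(), inter])
    h = relu(x @ W1)                               # hidden activations
    return sigmoid(h @ w2), h

def student_forward(E, W1, w2):
    # Student consumes the shared embeddings directly -- no interaction layer,
    # so it stays small and fast at serving time.
    x = E.ravel()
    h = relu(x @ W1)
    return sigmoid(h @ w2), h

ids = rng.integers(0, 1000, size=F)
E = shared_embedding(ids)
n_inter = F * (F - 1) // 2
Wt1 = rng.normal(size=(F * d + n_inter, hidden)) * 0.05
wt2 = rng.normal(size=hidden) * 0.05
Ws1 = rng.normal(size=(F * d, hidden)) * 0.05
ws2 = rng.normal(size=hidden) * 0.05
p_teacher, h_teacher = teacher_forward(E, Wt1, wt2)
p_student, h_student = student_forward(E, Ws1, ws2)
```

Because the interaction layer lives only in the teacher tower, the student's parameter count and forward-pass cost are dominated by the plain MLP, which is what makes the serving-time savings possible.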
The training pipeline follows three key steps: (1) Feature Transfer – the student reuses the teacher’s input‑representation layer; (2) Knowledge Distillation on the Fly – the teacher’s predictions guide the student during joint training, eliminating the need for a separate distillation stage; (3) Classifier Transfer – hidden‑layer activations from the teacher supervise the student’s hidden layers, narrowing the performance gap.
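The three steps above combine into a single joint training objective. The sketch below shows one plausible form of that loss; the specific weights (`alpha`, `beta`), the binary-cross-entropy soft-target term, and the L2 hint loss are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def bce(p, y, eps=1e-7):
    # Binary cross-entropy between a prediction p and a (possibly soft) target y.
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def joint_loss(y, p_teacher, p_student, h_teacher, h_student,
               alpha=0.5, beta=0.1):
    # (1) Both towers fit the true click label.
    hard = bce(p_teacher, y) + bce(p_student, y)
    # (2) Distillation on the fly: the teacher's *current* prediction is a
    #     soft target for the student, so no separate distillation stage is
    #     needed (in training, gradients would flow into the student only).
    distill = bce(p_student, p_teacher)
    # (3) Classifier transfer: pull the student's hidden activations toward
    #     the teacher's with an L2 hint loss (equal dims assumed here).
    hint = np.mean((h_student - h_teacher) ** 2)
    return hard + alpha * distill + beta * hint

loss = joint_loss(y=1.0, p_teacher=0.8, p_student=0.6,
                  h_teacher=np.ones(4), h_student=np.zeros(4))
```

Note that when the student's prediction and hidden activations match the teacher's, the distillation and hint terms reach their minimum, so the objective rewards exactly the gap-narrowing behavior described above.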
After joint training, the student model is fine‑tuned daily with a 30‑day offline window and then updated online using real‑time data, ensuring the model captures the latest user behavior patterns while maintaining low latency.
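The refresh cycle amounts to a simple daily schedule. The sketch below only captures the scheduling logic; `fine_tune` and `online_update` are hypothetical hooks standing in for the actual offline and streaming training jobs.

```python
from datetime import date, timedelta

def daily_refresh(today, fine_tune, online_update, window_days=30):
    # One serving-day cycle: offline fine-tune on a rolling 30-day window
    # ending today, then keep updating on the real-time stream.
    start = today - timedelta(days=window_days)
    fine_tune(start, today)      # daily offline pass over the window
    online_update()              # streaming updates after deployment

calls = []
daily_refresh(date(2024, 1, 31),
              fine_tune=lambda s, e: calls.append(("offline", s, e)),
              online_update=lambda: calls.append(("online",)))
```

The rolling window keeps the offline pass bounded in cost while the streaming stage absorbs the freshest behavior, which is the latency/freshness balance the article describes.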
Experimental results on iQIYI’s short‑video and image‑text feed show that the student model achieves comparable click‑through‑rate improvements with roughly one‑third of the parameters and five‑fold faster inference compared to the teacher, delivering higher ROI under identical resource constraints.
The paper also surveys related industry practices, such as Baidu’s CTR‑X and CTR‑3.0 models and Alibaba’s Rocket Launching method, highlighting differences in feature sharing and knowledge‑transfer strategies.
In conclusion, the dual‑DNN architecture with online knowledge distillation provides an effective solution for deploying high‑performance, low‑latency ranking models in production, and future work will explore wider and deeper teacher networks to further boost recommendation accuracy.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.