Multi-Task Learning Models for Recommendation Systems: A Survey of Industrial Applications

Recent industrial advances in multi-task learning for recommendation systems, including Alibaba's ESMM and DUPN, Meituan's deep ranking, Google's MMoE, YouTube's multi-objective ranking, and Zhihu's multi-goal model, demonstrate how shared embeddings and specialized loss functions improve CTR, CVR, and user engagement metrics.

DataFunTalk

Improving a recommender usually means improving several metrics at once, such as click‑through rate (CTR), conversion rate (CVR), video watch time, user dwell time, and other engagement signals. Multi‑task learning addresses this by training a single model on several related tasks, so the tasks share parameters instead of each requiring a separate model.

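1. Alibaba ESMM : The Entire Space Multi‑Task Model tackles sample selection bias and data sparsity in CVR prediction by training on two auxiliary tasks, CTR and CTCVR, over all impressions. Two sub‑networks share the embedding layer: the CTR tower outputs pCTR, the CVR tower outputs pCVR, and their product gives pCTCVR = pCTR × pCVR. The loss is the sum of two log‑loss terms, one for CTR and one for CTCVR; there is no separate CVR loss, because conversion labels are only observed on clicked impressions. Since both terms are defined over the entire impression space, the CVR tower is trained on all impressions, including non‑clicked ones, rather than on clicks alone.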
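The way ESMM composes its outputs and loss can be sketched in a few lines. This is a minimal illustration, not Alibaba's implementation: the tower outputs are passed in as plain probabilities, and the names `log_loss`, `esmm_loss`, and the sample batch are hypothetical.

```python
import math

def log_loss(label, prob, eps=1e-12):
    """Binary cross-entropy for a single example."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def esmm_loss(impressions):
    """Average CTR + CTCVR log-loss over ALL impressions.

    Each impression is (p_ctr, p_cvr, clicked, converted); p_ctr and p_cvr
    stand in for the outputs of the CTR and CVR towers.
    """
    total = 0.0
    for p_ctr, p_cvr, clicked, converted in impressions:
        p_ctcvr = p_ctr * p_cvr              # pCTCVR = pCTR * pCVR
        click_and_buy = clicked * converted  # 1 only if clicked AND converted
        total += log_loss(clicked, p_ctr) + log_loss(click_and_buy, p_ctcvr)
    return total / len(impressions)

# Hypothetical batch: the first impression was never clicked, yet it still
# contributes to both loss terms and therefore to the CVR tower's gradient.
batch = [
    (0.10, 0.20, 0, 0),
    (0.60, 0.30, 1, 0),
    (0.70, 0.80, 1, 1),
]
loss = esmm_loss(batch)
```

Note that pCVR never appears in a loss term by itself; it is supervised only through the product pCTR × pCVR.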

2. Alibaba DUPN : DUPN learns universal user representations from multiple e‑commerce tasks. Its architecture consists of a behavior‑sequence layer, embedding layer, LSTM layer, attention layer, and downstream multi‑task heads (CTR, L2R, user‑interest, purchase propensity). Techniques such as daily incremental model updates and model splitting (computing user representation once and reusing it) improve training efficiency and handle large item sets.

3. Meituan "Guess You Like" Deep Ranking : The system separates CTR and conversion objectives into distinct loss functions while sharing the lower layers of a DNN. The shared bottom learns common features, and the top layers specialize for each task, yielding better conversion while maintaining CTR.

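4. Google MMoE : The paper Modeling Task Relationships in Multi‑task Learning with Multi‑gate Mixture‑of‑Experts compares three architectures: (a) a shared bottom with separate task towers, (b) a one‑gate mixture of experts (OMoE), in which several experts replace the shared bottom and a single gating network, shared by all tasks, weights their outputs, and (c) the proposed multi‑gate mixture of experts (MMoE), which gives each task its own gate so that every task learns a distinct combination of the shared experts.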
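The difference between one shared gate and per-task gates is easiest to see in a forward pass. The sketch below is a toy, pure-Python version under simplifying assumptions: each expert is a single linear layer with ReLU, each gate is a single linear layer with softmax, and all sizes and weights are made up for illustration.

```python
import math
import random

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, x):
    """Multiply a (rows x len(x)) weight matrix by the vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def relu(vec):
    return [max(0.0, v) for v in vec]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def mmoe_forward(x, expert_weights, gate_weights):
    """Return one mixed expert representation per task.

    expert_weights: one matrix per expert, shared by all tasks.
    gate_weights: one matrix per task, each with one row per expert.
    """
    expert_outputs = [relu(linear(w, x)) for w in expert_weights]
    hidden = len(expert_outputs[0])
    task_inputs = []
    for gate in gate_weights:
        mix = softmax(linear(gate, x))  # this task's weights over the experts
        mixed = [sum(mix[e] * expert_outputs[e][h] for e in range(len(mix)))
                 for h in range(hidden)]
        task_inputs.append(mixed)       # would feed this task's tower
    return task_inputs

rng = random.Random(0)
n_features, n_hidden, n_experts, n_tasks = 4, 3, 3, 2
experts = [random_matrix(n_hidden, n_features, rng) for _ in range(n_experts)]
gates = [random_matrix(n_experts, n_features, rng) for _ in range(n_tasks)]
x = [0.5, -1.0, 2.0, 0.1]
task_inputs = mmoe_forward(x, experts, gates)
```

Passing the same gate matrix for every task, for example `mmoe_forward(x, experts, [gates[0], gates[0]])`, reduces this to the one-gate variant, where all tasks receive an identical mixture.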

5. Alibaba ESM2 : Extending ESMM, ESM2 models post‑click behavior by distinguishing deterministic actions (e.g., add‑to‑cart) from other actions. It predicts four tasks: Y1 (CTR), Y2 (click→DAction probability), Y3 (DAction→purchase probability), and Y4 (OtherAction→purchase probability). The overall loss is a weighted sum of three log‑loss components for pCTR, pCTAVR, and pCTCVR.
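A small sketch can make the relationship between the four predictions and the three supervised quantities concrete. It assumes the decomposition pCTAVR = Y1·Y2 and pCTCVR = Y1·(Y2·Y3 + (1 − Y2)·Y4), that is, a purchase is reached either through a deterministic action or through some other action. The function names and the equal default weights are hypothetical, not taken from the paper.

```python
import math

def log_loss(label, prob, eps=1e-12):
    """Binary cross-entropy for a single example."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def esm2_probabilities(y1, y2, y3, y4):
    """Compose the four sub-task outputs into the three supervised targets."""
    p_ctr = y1                                  # impression -> click
    p_ctavr = y1 * y2                           # impression -> click -> DAction
    p_ctcvr = y1 * (y2 * y3 + (1.0 - y2) * y4)  # purchase via either path
    return p_ctr, p_ctavr, p_ctcvr

def esm2_loss(probs, labels, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three log-loss terms (weights are assumptions)."""
    return sum(w * log_loss(t, p) for w, t, p in zip(weights, labels, probs))

# Hypothetical impression: clicked, added to cart, but not purchased.
probs = esm2_probabilities(y1=0.5, y2=0.4, y3=0.5, y4=0.1)
loss = esm2_loss(probs, labels=(1, 1, 0))
```

As in ESMM, all three labels are defined for every impression, so none of the four sub-networks is trained on a click-filtered sample.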

6. YouTube Multi‑Objective Ranking System : The model jointly optimizes engagement objectives (click, watch time) and satisfaction objectives (like, rating) using a Multi‑gate Mixture‑of‑Experts backbone and a shallow tower that predicts position bias. The shallow tower’s output is added to the logits of bias‑sensitive tasks during training but omitted at inference.
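The train-versus-serve asymmetry of the shallow tower can be sketched as follows. This is an illustration only: the main-tower logit and the per-position bias logits are hypothetical numbers, and a real system would learn the bias from position and related features rather than look it up in a dictionary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def click_probability(main_logit, position_bias_logit=None):
    """Combine the main tower's logit with an optional shallow-tower logit."""
    if position_bias_logit is None:
        # Serving: the position is unknown, so the bias term is left out.
        return sigmoid(main_logit)
    # Training: the bias logit absorbs the effect of the display position.
    return sigmoid(main_logit + position_bias_logit)

# Hypothetical bias logits a shallow tower might learn for positions 1 to 3.
position_bias = {1: 0.8, 2: 0.3, 3: -0.2}
main_logit = 0.4

train_probs = {pos: click_probability(main_logit, bias)
               for pos, bias in position_bias.items()}
serve_prob = click_probability(main_logit)
```

Because the bias is added only during training, the main tower is pushed to explain the part of the click signal that does not depend on where the item was shown.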

7. Zhihu Ranking Model : Zhihu’s recommendation page uses an 8‑task multi‑objective model (CTR, collection rate, like rate, comment rate, etc.) with shared embeddings and early DNN layers, and a simple linear weighted sum of task losses. After deployment, CTR remained stable while other metrics such as likes and collections increased significantly.
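A linear weighted sum of task losses is simple enough to state directly. The sketch below is generic rather than Zhihu's code; the task names, per-task loss values, and weights are hypothetical and show only four of the eight tasks.

```python
def weighted_multitask_loss(task_losses, task_weights):
    """Return sum_i w_i * L_i over the tasks in task_losses."""
    missing = set(task_losses) - set(task_weights)
    if missing:
        raise KeyError(f"no weight given for tasks: {sorted(missing)}")
    return sum(task_weights[name] * loss for name, loss in task_losses.items())

# Hypothetical per-task losses for one batch and hand-picked weights.
task_losses = {"ctr": 0.60, "collect": 0.20, "like": 0.30, "comment": 0.10}
task_weights = {"ctr": 1.0, "collect": 0.5, "like": 0.5, "comment": 0.25}

total_loss = weighted_multitask_loss(task_losses, task_weights)
```

The weights are tuning knobs: raising a task's weight makes the shared layers favor that objective, which is one way to lift secondary metrics while keeping CTR stable.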

8. Meitu "Multi‑task NFwFM" : This model shares the first few hidden layers across tasks and splits into separate fully‑connected heads for each objective. It adds a shallow tower for bias correction and achieves notable lifts in click‑through rate, follow‑conversion, and average view time on the Meitu community feed.

9. Summary : Multi‑task learning in recommendation is typically applied via (1) shared embedding and MLP layers with task‑specific heads, (2) loss functions that encode task relationships (e.g., ESMM, ESM2), or (3) specialized structures like MMoE that learn task‑specific expert weights. Effectiveness depends on the strength of task correlations.

References: https://arxiv.org/pdf/1804.07931.pdf, https://www.jianshu.com/p/35f00299c059, https://arxiv.org/pdf/1805.10727.pdf, https://tech.meituan.com/2018/03/29/recommend-dnn.html, https://zhuanlan.zhihu.com/p/70940522, https://arxiv.org/abs/1910.07099, and others.
