
Multi‑Task Learning Models for Recommendation Systems: An Industrial Survey

This article surveys recent industrial multi‑task learning approaches for recommendation, covering models such as Alibaba's ESMM and ESM2, DUPN, Meituan's deep ranking model, Google's MMoE, YouTube's multi‑objective ranking system, Zhihu's ranking model, and Meitu's NFwFM, and summarizes their architectures, loss functions, and practical gains.

DataFunSummit

When optimizing recommendation performance, practitioners often need to improve not only click‑through rate (CTR) but also conversion rate (CVR), video watch time, dwell time, browsing depth, follow rate, and like rate. Training a separate model for each metric is costly, so multi‑task learning (MTL) is employed to share representations and jointly optimize several objectives.

1. Alibaba ESMM (Entire Space Multi‑Task Model)

ESMM addresses sample selection bias and data sparsity in CVR estimation by introducing two auxiliary tasks that predict pCTR and pCTCVR over the entire impression space, sharing the embedding layer across tasks. The model consists of two sub‑networks whose outputs are multiplied to obtain pCTCVR (pCTCVR = pCTR × pCVR), and the loss sums the log‑losses of the pCTR and pCTCVR tasks; pCVR itself is never supervised directly, which is what removes the selection bias from CVR training.
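The entire‑space factorization can be sketched as follows. This is a minimal single‑example illustration with hypothetical helper names (`bce`, `esmm_loss`), not Alibaba's implementation:

```python
import math

def bce(p, y):
    # Binary cross-entropy for a single example, with clipping for stability.
    eps = 1e-7
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def esmm_loss(p_ctr, p_cvr, click, buy):
    """ESMM-style loss sketch: supervise pCTR on click labels and
    pCTCVR = pCTR * pCVR on purchase labels, both over the full
    impression space; pCVR receives no direct label."""
    p_ctcvr = p_ctr * p_cvr  # chain rule: impression -> click -> buy
    return bce(p_ctr, click) + bce(p_ctcvr, buy)
```

Because both supervised quantities are defined on impressions rather than clicks, the CVR tower is trained only implicitly, through the product.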

2. Alibaba DUPN (Deep User Perception Network)

DUPN learns universal user representations from multiple e‑commerce tasks. Its architecture includes a behavior‑sequence layer, shared embedding layer, LSTM layer, attention layer, and downstream multi‑task heads for CTR, learning‑to‑rank, user‑interest, and purchase propensity.
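The attention layer that pools behaviour‑sequence states into a single user vector can be sketched roughly as below. Dot‑product scoring is a simplification here; the paper's scorer is an MLP conditioned on query features, and `attention_pool` is a hypothetical name:

```python
import numpy as np

def attention_pool(states, query):
    """Attention pooling over LSTM states (DUPN-style, simplified):
    score each time step against a query vector, softmax the scores,
    and return the weighted sum as the user representation."""
    scores = states @ query                  # (T,) one score per time step
    weights = np.exp(scores - scores.max())  # stable softmax
    weights /= weights.sum()
    return weights @ states                  # (d,) pooled user vector
```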

3. Meituan “Guess You Like” Deep Ranking Model

The model splits CTR and conversion objectives, sharing early DNN layers while using separate towers for each loss. Two practical tricks are introduced: a Missing‑Value Layer that learns adaptive imputations, and a KL‑divergence bound that enforces consistency between related labels (e.g., p(click)·p(conversion)=p(order)).
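The consistency idea behind the KL term can be illustrated as follows, treating each prediction as a Bernoulli distribution and penalising disagreement between the composed prediction p(click)·p(conversion) and the direct p(order) head. `consistency_penalty` is a hypothetical name for this sketch, not Meituan's code:

```python
import math

def consistency_penalty(p_click, p_conv, p_order):
    """KL-style consistency sketch: KL(Bernoulli(p_order) ||
    Bernoulli(p_click * p_conv)), zero when the two agree."""
    eps = 1e-7
    q = min(max(p_click * p_conv, eps), 1 - eps)  # composed prediction
    p = min(max(p_order, eps), 1 - eps)           # direct order head
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))
```

Adding such a term to the joint loss nudges the two heads toward the identity p(click)·p(conversion) = p(order) without hard‑wiring it.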

4. Google MMoE (Multi‑gate Mixture‑of‑Experts)

The paper compares three architectures: (a) a shared bottom with separate task towers, (b) a one‑gate mixture‑of‑experts (OMoE), where all tasks share a single gate over the experts, and (c) MMoE proper, where each task has its own gate that weights the shared experts differently. The multi‑gate design lets each task learn its own combination of experts while still sharing parameters.
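The multi‑gate forward pass can be sketched in a few lines. This is a linear‑expert toy (real experts are MLPs), and `mmoe_forward` is a hypothetical name:

```python
import numpy as np

def mmoe_forward(x, experts, gates):
    """MMoE sketch: every task mixes the shared experts via its own
    softmax gate. `experts` are weight matrices (d_in x d_out);
    `gates` are per-task gate matrices (d_in x n_experts)."""
    expert_outs = np.stack([x @ W for W in experts])  # (n_experts, d_out)
    task_outs = []
    for G in gates:                                   # one gate per task
        logits = x @ G                                # (n_experts,)
        w = np.exp(logits - logits.max())
        w /= w.sum()                                  # softmax gate weights
        task_outs.append(w @ expert_outs)             # task-specific mixture
    return task_outs
```

With a single shared gate this collapses to OMoE; with per‑task gates, tasks that need different experts can diverge without duplicating the expert parameters.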

5. Alibaba ESM2 (Conversion Rate Prediction via Post‑Click Behaviour Modeling)

ESM2 extends ESMM by modeling deterministic actions (e.g., add‑to‑cart) and other actions before purchase. It defines four tasks (CTR, click‑to‑DAction, DAction‑to‑Buy, OAction‑to‑Buy) and combines three log‑losses weighted to form the final loss.
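The probability decomposition behind ESM2 can be written out explicitly. This is a sketch of the chain over the impression space, with a hypothetical function name:

```python
def esm2_pctcvr(p_ctr, p_d, p_buy_d, p_buy_o):
    """ESM2-style decomposition sketch: after a click the user takes a
    deterministic action (DAction) with probability p_d or some other
    action (OAction) with probability 1 - p_d; purchase is predicted
    conditionally on each branch, then multiplied back to impressions."""
    p_cvr = p_d * p_buy_d + (1 - p_d) * p_buy_o  # click -> buy
    return p_ctr * p_cvr                         # impression -> click -> buy
```

Supervising the composed impression‑space probabilities (rather than pCVR directly) keeps the entire‑space property of ESMM while exploiting the denser post‑click signals.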

6. YouTube Multi‑Objective Ranking System

The system jointly predicts engagement objectives (click, watch time) and satisfaction objectives (like, rating). It uses a Multi‑gate Mixture‑of‑Experts backbone plus a shallow tower that models position bias from position features; the tower's bias logit is added to the main logit before the final sigmoid during training and dropped at serving time.
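The train/serve asymmetry of the shallow tower can be sketched as below; `debiased_prob` is a hypothetical name for this illustration:

```python
import math

def debiased_prob(main_logit, position_logit, training):
    """Position-bias sketch: the shallow tower's bias logit is added to
    the main ranking logit before the sigmoid at training time, and
    treated as absent (zero) at serving time, so the served score
    reflects relevance rather than position."""
    z = main_logit + (position_logit if training else 0.0)
    return 1.0 / (1.0 + math.exp(-z))
```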

7. Zhihu Recommendation Ranking Model

Zhihu’s multi‑task model predicts eight objectives (CTR, collect, like, comment, etc.) using shared embeddings and DNN layers, with a simple weighted sum loss. Online results show stable CTR and significant lifts in likes and collections.
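The weighted‑sum objective is as simple as it sounds; a sketch, with the weights as hand‑tuned hyperparameters and a hypothetical function name:

```python
def weighted_multitask_loss(losses, weights):
    """Weighted-sum sketch over per-task losses (e.g. CTR, collect,
    like, comment heads): total = sum_t w_t * loss_t."""
    assert len(losses) == len(weights)
    return sum(w * l for w, l in zip(weights, losses))
```

In practice the weights encode business priorities, and retuning them is how such a model trades one objective off against another.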

8. Meitu “Beauty” Recommendation Multi‑Task Model

The NFwFM model shares early hidden layers across tasks and splits at the final fully‑connected layer. It improves click‑through rate, follow‑conversion, and average view time without increasing inference cost.

9. Summary

Multi‑task learning is effective when recommendation tasks are correlated; otherwise, interference may degrade performance. Common practices include sharing bottom embeddings and MLPs, designing joint loss functions that capture task relationships (e.g., ESMM, ESM2), and employing specialized structures like MMoE to learn task‑specific weights.

Typical deployment patterns are: (1) shared bottom with task‑specific heads and weighted sum loss; (2) relationship‑aware loss engineering; (3) MMoE‑style expert‑gate architectures.

Tags: CTR, Deep Learning, CVR, Multi‑Task Learning, Recommendation Systems, MMoE
Written by DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
