
Multi-Task Learning for E-commerce Search: Overview, Practices, and Model Design in the Zhuanzhuan Scenario

This article reviews the necessity, benefits, and practical implementations of multi-task learning in e‑commerce search, detailing model selection, architecture extensions such as ESMM and ESM², and future directions for handling user behavior sequences and multi‑objective optimization.

Zhuanzhuan Tech

1 Introduction

In the Zhuanzhuan search system, increasing modeling goals require models that can handle multiple tasks simultaneously, making multi‑task learning a mainstream direction for ranking models. User actions on the detail page—such as favorite, add‑to‑cart, and contact customer service—provide crucial signals for predicting the final purchase decision, necessitating a comprehensive modeling of the decision chain.

2 Overview of Multi‑Task Learning

2.1 Necessity of Multi‑Task Learning

Designing separate models for each behavior increases engineering complexity, leads to information isolation, and wastes computational resources. Independent training also raises the risk of over‑fitting, especially for tasks with scarce samples, and makes it difficult to balance conflicting objectives such as click‑through rate versus conversion rate.

Multi‑task learning can mitigate these issues by sharing knowledge across tasks, reducing over‑fitting, improving generalization, and increasing training efficiency.
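The sharing described above is often realized with hard parameter sharing: a shared bottom feeds several task-specific heads. The sketch below illustrates this with plain Python lists in place of real embedding and dense layers; the function name and toy weights are illustrative only.

```python
def shared_bottom_forward(x, shared_w, task_heads):
    """Hard parameter sharing: one shared linear transform produces a
    common representation, and each task head reads from it, so the
    tasks exchange knowledge through the shared parameters."""
    # Shared bottom: a single matrix-vector product used by every task.
    shared = [sum(wi * xi for wi, xi in zip(row, x)) for row in shared_w]
    # Task-specific heads: one linear readout per task.
    return [sum(hw * s for hw, s in zip(head, shared)) for head in task_heads]
```

In a real ranking model the shared part would be embedding tables plus dense layers, but the data flow is the same: gradients from every task update the shared parameters, which is where the regularization benefit comes from.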

2.2 Benefits of Multi‑Task Learning

Multi‑task learning relates to other "multi" concepts such as multi‑label and multi‑class. It enables knowledge sharing, reduces over‑fitting, improves generalization, and allows auxiliary tasks to provide additional supervision for the main task.

multi‑task: multiple learning objectives, each a classification or regression task, trained jointly while sharing some features and samples.

multi‑label: a single sample carries several labels at once over the same feature set.

multi‑class: single‑label classification over more than two mutually exclusive classes.

2.3 Practical Aspects of Multi‑Task Learning

Effective network structure design and loss‑balancing strategies are key research directions. Network design focuses on which parameters to share and where, with examples such as ESMM (explicit task relationship) and MMOE (implicit relationship). Loss‑balancing methods like Uncertainty Weighting (UWL) and GradNorm adjust task weights based on uncertainty or gradient norms.
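As a concrete example of loss balancing, uncertainty weighting scales each task loss by a learned precision term and adds a regularizer that discourages ignoring a task. The following is a minimal sketch of that idea; the function name and the toy loss values are illustrative, not the production implementation.

```python
import math

def uncertainty_weighted_loss(task_losses, log_vars):
    """Uncertainty weighting (Kendall et al.): each task loss L_i is
    scaled by exp(-s_i), where s_i is a learned log-variance, and s_i
    itself is added as a penalty so the model cannot silence a task
    by inflating its uncertainty."""
    return sum(math.exp(-s) * L + s for L, s in zip(task_losses, log_vars))

# With equal (zero) log-variances the losses simply add up; raising a
# task's log-variance downweights its contribution but pays a penalty.
baseline = uncertainty_weighted_loss([0.8, 0.3], [0.0, 0.0])
downweighted = uncertainty_weighted_loss([0.8, 0.3], [1.0, 0.0])
```

In training, the `log_vars` would be trainable parameters updated by the same optimizer as the network weights; GradNorm differs in that it adjusts the weights from gradient magnitudes rather than learned uncertainties.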

3 Multi‑Task Learning in the Zhuanzhuan Scenario

3.1 Model Selection

The early ranking CVR model jointly modeled "order" and "payment" using an ESMM‑style architecture to address sample selection bias and data sparsity. Sample selection bias arises because conversion only occurs after a click, causing a distribution mismatch between training (click data) and inference (all impressions). Data sparsity refers to the far fewer click samples compared to exposure samples used for CTR estimation.

By jointly learning CTR and CTCVR, the model indirectly optimizes CVR, alleviating both issues and improving performance.

The implementation uses two towers: one for "click‑to‑order" and another for "order‑to‑payment", sharing bottom‑layer embeddings while having task‑specific upper layers. The final CVR score is the product of the two tower outputs.
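The entire-space trick can be sketched numerically: the second tower is never supervised on its own biased sub-sample; only the first stage and the product of both stages receive labels. Function and argument names below are illustrative, assuming one logit per tower.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce(p, y, eps=1e-7):
    """Binary cross-entropy with probability clipping for stability."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def esmm_loss(order_logit, pay_logit, order_label, pay_label):
    """ESMM-style joint loss for the click -> order -> payment chain:
    supervise the first tower on the observed 'order' label, and the
    product of both tower outputs on the observed 'payment' label.
    The payment tower is thus trained only through the full chain,
    avoiding the sample selection bias of training on orders alone."""
    p_order = sigmoid(order_logit)
    p_chain = p_order * sigmoid(pay_logit)
    return bce(p_order, order_label) + bce(p_chain, pay_label)
```

At serving time the final CVR score is the same product `p_order * p_pay`, which is what the article describes as the product of the two tower outputs.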

Further experiments replacing the tower backbones (W&D, DeepFM, DCN), adjusting tower loss weights, and comparing joint versus separate training yielded limited gains; the dominant factors remained sample quality and feature engineering. Future work includes adding the "favorite" action as a modeling target, extending the "click → order → payment" decision path with an additional "favorite" node.

Adapting the ESMM framework, the model was extended to four towers to jointly model the "favorite", "order", and "payment" tasks, in the spirit of ESM². Intermediate stages that lack explicit labels are captured through shared implicit representations, and the network outputs a probability for each stage of the chain.
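Once each tower outputs a conditional probability for its stage, the end-to-end conversion estimate follows by chaining them. This is a deliberately simplified single-path sketch; the full ESM² formulation also marginalizes over alternative post-click paths rather than assuming every purchase passes through each stage.

```python
def chain_probability(stage_probs):
    """Probability of completing a whole decision chain, e.g.
    click -> favorite -> order -> payment, as the product of the
    per-stage conditional probabilities output by each tower."""
    p = 1.0
    for stage in stage_probs:
        p *= stage
    return p
```

Because each factor is a conditional probability supervised (directly or via products) on the entire impression space, the chained estimate stays consistent with the observed funnel rather than with a biased sub-sample.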

3.2 Future Plans

Future work will introduce attention mechanisms to model user behavior sequences, incorporate additional app‑level behavior data, and explore better fusion methods for multi‑intent search across different categories.
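The attention mechanism mentioned above typically means target attention in the DIN style: score each item in the user's behavior sequence against the candidate item and pool with softmax weights. The sketch below uses raw dot products on plain Python lists as a stand-in for learned attention layers; all names are illustrative.

```python
import math

def attention_pool(behavior_seq, target_item):
    """Target attention over a user's behavior sequence: score each
    historical item embedding against the candidate item, softmax the
    scores, and return the weighted sum as the interest vector."""
    # Relevance of each historical item to the candidate (dot product).
    scores = [sum(h * t for h, t in zip(item, target_item))
              for item in behavior_seq]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Weighted sum of the sequence: items similar to the candidate dominate.
    dim = len(target_item)
    return [sum(w * item[d] for w, item in zip(weights, behavior_seq))
            for d in range(dim)]
```

A production version would replace the dot product with a small MLP over the concatenated (history, candidate) pair, but the pooling structure is the same.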

References

[1] Multi‑objective optimization overview and ESMM vs. MMOE comparison: https://www.cnblogs.com/whu-zeng/p/14111888.html

[2] FunRec – Multi‑Task Learning Overview: https://datawhalechina.github.io/fun-rec/

[3] Multi‑objective sample weighting: GradNorm and DWA: https://zhuanlan.zhihu.com/p/542296680

[4] Entire Space Multi‑Task Model: An Effective Approach for Estimating Post‑Click Conversion Rate: https://arxiv.org/abs/1804.07931

[5] Entire Space Multi‑Task Modeling via Post‑Click Behavior Decomposition for Conversion Rate Prediction: https://arxiv.org/abs/1910.07099

[6] A Pareto‑Efficient Algorithm for Multiple Objective Optimization in E‑Commerce Recommendation: http://ofey.me/papers/Pareto.pdf

[7] Modeling Task Relationships in Multi‑task Learning with Multi‑gate Mixture‑of‑Experts: https://dl.acm.org/doi/pdf/10.1145/3219819.3220007

Tags: e-commerce, deep learning, multi-task learning, recommendation systems, conversion rate prediction, model architecture, ESMM
Written by

Zhuanzhuan Tech

A platform for Zhuanzhuan R&D and industry peers to learn and exchange technology, regularly sharing frontline experience and cutting‑edge topics. We welcome practical discussions and sharing; contact waterystone with any questions.
