
How Multi‑Task Multi‑Scene Modeling Powers ZhiZhuan’s Search: Algorithms, Industry Practices, and Lessons

This article analyzes the challenges of multi‑task and multi‑scene recommendation for large‑scale consumer‑facing (C‑end) services, reviews key academic and industry solutions such as Shared‑Bottom, MMoE, PLE, ESMM, LHUC, PEPNet, MTMS, and HiNet, and details ZhiZhuan’s end‑to‑end architecture, which achieved improvements of over 6% in click‑through rate and 2% in conversion.

Architect

Dec 14, 2023


1. Overview of Multi‑Task & Multi‑Scene Challenges

Large‑scale consumer‑facing applications often need to optimize several user‑experience metrics (e.g., CTR, CVR, collection rate) across many usage scenarios (feed, search, etc.). Training a separate model per scenario is costly and slows iteration, while a unified model can suffer from data‑distribution mismatches, leading to a "seesaw" effect in which dominant scenarios improve at the expense of others. Tasks also differ in sample sparsity: a CVR model is typically trained only on clicked impressions, which are far scarcer than the impressions available for CTR, yet it must score every impression at inference time, which opens a gap between the training and inference distributions.

1.1 Background

Multi‑task learning aims to jointly learn related objectives, whereas multi‑scene modeling seeks a shared representation that can adapt to the different user behaviors and item supply (called "material" in this article) found in each scenario.

1.2 Multi‑Task Solutions

The evolution starts with Shared‑Bottom, where all tasks share the bottom layers and each task has its own head; this helps correlated tasks but can hurt weakly related ones (negative transfer). MoE replaces the single shared bottom with a set of expert networks whose outputs are combined by a learned gate, mitigating negative transfer. MMoE extends MoE with one gate per task, so each task can weight the experts differently. PLE goes further by explicitly separating shared experts from task‑specific experts and stacking them in progressive extraction layers, a design aimed at reducing the seesaw effect between tasks.
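To make the gating idea concrete, here is a minimal sketch of an MMoE forward pass in plain Python. The class name `MMoE`, the layer sizes, the random initialization, and the single‑layer experts and towers are illustrative assumptions, not a configuration taken from any of the cited systems.

```python
import math
import random

def linear(x, weights):
    """Multiply input vector x by a weight matrix stored as a list of rows."""
    return [sum(w * a for w, a in zip(row, x)) for row in weights]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    peak = max(v)
    exps = [math.exp(a - peak) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class MMoE:
    """Toy MMoE: shared experts, plus one softmax gate and one tower per task."""

    def __init__(self, in_dim, expert_dim, n_experts, n_tasks, seed=0):
        rng = random.Random(seed)
        self.experts = [rand_matrix(expert_dim, in_dim, rng) for _ in range(n_experts)]
        self.gates = [rand_matrix(n_experts, in_dim, rng) for _ in range(n_tasks)]
        self.towers = [rand_matrix(1, expert_dim, rng) for _ in range(n_tasks)]

    def gate_weights(self, x, task):
        # Each task has its own gate, so the expert mixture differs per task.
        return softmax(linear(x, self.gates[task]))

    def forward(self, x):
        expert_out = [relu(linear(x, w)) for w in self.experts]
        preds = []
        for task in range(len(self.gates)):
            g = self.gate_weights(x, task)
            # Weighted sum of expert outputs, dimension by dimension.
            mixed = [sum(g[e] * expert_out[e][d] for e in range(len(expert_out)))
                     for d in range(len(expert_out[0]))]
            preds.append(sigmoid(linear(mixed, self.towers[task])[0]))
        return preds

model = MMoE(in_dim=4, expert_dim=3, n_experts=3, n_tasks=2)
x = [0.2, -0.1, 0.7, 0.4]
p_ctr, p_cvr = model.forward(x)
```

Because each task owns its gate, the two outputs can draw on different experts for the same input; sharing a single gate across tasks would give the one‑gate MoE variant instead.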

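Alibaba’s ESMM addresses sequentially dependent tasks (e.g., click → conversion). Instead of training CVR only on clicked samples, it models CTR and CTCVR over the entire impression space and obtains CVR through the relation pCTCVR = pCTR × pCVR, so that training and inference see the same sample distribution. The ESMM paper reports accuracy gains in e‑commerce settings.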
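The entire‑space idea can be sketched as a loss function: both supervised signals (click, and click‑and‑convert) are defined on every impression, and the CVR output is only trained indirectly through the product pCTR × pCVR. The function names `bce` and `esmm_loss` and the toy numbers below are assumptions for illustration.

```python
import math

def bce(label, prob, eps=1e-7):
    """Binary cross-entropy for one sample, with clipping for numerical safety."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def esmm_loss(p_ctr, p_cvr, clicks, conversions):
    """Average CTR loss + CTCVR loss over all impressions (clicked or not)."""
    total = 0.0
    for ctr, cvr, y_click, y_conv in zip(p_ctr, p_cvr, clicks, conversions):
        p_ctcvr = ctr * cvr            # pCTCVR = pCTR * pCVR
        y_ctcvr = y_click * y_conv     # converted only if also clicked
        total += bce(y_click, ctr) + bce(y_ctcvr, p_ctcvr)
    return total / len(p_ctr)

# Three impressions: clicked and converted, clicked only, not clicked.
clicks = [1, 1, 0]
conversions = [1, 0, 0]

good_loss = esmm_loss([0.9, 0.9, 0.1], [0.9, 0.1, 0.5], clicks, conversions)
bad_loss = esmm_loss([0.1, 0.1, 0.9], [0.1, 0.9, 0.5], clicks, conversions)
```

Note that no loss term is computed on pCVR alone, which is how the sketch avoids restricting CVR training to clicked samples.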

1.3 Multi‑Scene Solutions

LHUC (Learning Hidden Unit Contributions), originally proposed for speaker adaptation in speech recognition, is repurposed to rescale the hidden units of the dense layers per scene, which helps keep scene‑specific representations from collapsing together when feature engineering alone is insufficient.
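A minimal sketch of LHUC‑style scene adaptation follows. It assumes one learnable vector per scene and the `2 * sigmoid(r)` amplitude commonly used with LHUC, so that a zero‑initialized vector leaves the hidden layer unchanged. The class name `LHUCLayer` and the scene names are hypothetical.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

class LHUCLayer:
    """Per-scene multiplicative scaling of a hidden layer's activations."""

    def __init__(self, hidden_dim, scenes):
        # One parameter vector per scene; zeros give a scale of exactly 1.0.
        self.r = {scene: [0.0] * hidden_dim for scene in scenes}

    def scale(self, scene):
        # Amplitudes lie in (0, 2): units can be damped or amplified.
        return [2.0 * sigmoid(v) for v in self.r[scene]]

    def forward(self, hidden, scene):
        return [s * h for s, h in zip(self.scale(scene), hidden)]

layer = LHUCLayer(hidden_dim=3, scenes=["search", "feed"])
hidden = [0.4, -1.2, 0.7]

unchanged = layer.forward(hidden, "search")   # zero init -> identity

# Pretend training moved the "feed" parameters: boost unit 0, suppress unit 1.
layer.r["feed"] = [20.0, -20.0, 0.0]
adapted = layer.forward(hidden, "feed")
```

The shared weights that produced `hidden` are untouched; only the small per‑scene vectors differ between scenes.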

Dynamic‑weight gating further yields algorithms such as PEPNet (Kuaishou), M2M, AdaSparse, and STAR (Alibaba), all of which condition part of the network on the scene or task in order to filter, rescale, or recombine what the shared layers have learned.

In summary, most multi‑task‑multi‑scene models can be viewed as variations of gating‑based information selection and re‑composition.

2. Industry Solutions Overview

PEPNet (Parameter and Embedding Personalized Network) uses a gating unit, GateNU, to personalize both the embedding layer (EPNet) and the parameters of the task networks (PPNet), injecting scene information into the shared embeddings and personalized priors into the task‑specific layers.
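The embedding side of this design can be sketched as follows, assuming a two‑layer gate (ReLU, then a sigmoid scaled by a factor `gamma`) whose output rescales the shared embedding element‑wise. The class and function names, sizes, and random weights are illustrative assumptions; PPNet, which applies the same kind of gate to the task towers, is omitted.

```python
import math
import random

def linear(x, weights):
    return [sum(w * a for w, a in zip(row, x)) for row in weights]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

class GateNU:
    """Two-layer gate: ReLU hidden layer, then gamma * sigmoid in (0, gamma)."""

    def __init__(self, in_dim, hidden_dim, out_dim, gamma=2.0, seed=0):
        rng = random.Random(seed)
        self.gamma = gamma
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                   for _ in range(hidden_dim)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
                   for _ in range(out_dim)]

    def forward(self, gate_input):
        hidden = [max(0.0, a) for a in linear(gate_input, self.w1)]
        return [self.gamma * sigmoid(a) for a in linear(hidden, self.w2)]

def epnet(scene_features, shared_embedding, gate):
    """Rescale the shared embedding with a scene-conditioned gate."""
    # In training, the embedding fed into the gate would be detached
    # (stop-gradient); this forward-only sketch just notes that step.
    delta = gate.forward(scene_features + shared_embedding)
    return [d * e for d, e in zip(delta, shared_embedding)]

scene_features = [1.0, 0.0]            # hypothetical scene encoding
shared_embedding = [0.3, -0.8, 0.5]
gate = GateNU(in_dim=5, hidden_dim=4, out_dim=3)
gate_values = gate.forward(scene_features + shared_embedding)
personalized = epnet(scene_features, shared_embedding, gate)
```

With `gamma=2.0`, a gate output of 1.0 is the neutral point, so the gate can both amplify and suppress individual embedding dimensions.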

MTMS (Multi‑Task and Multi‑Scene) – Baidu adopts a multi‑tower design: independent embeddings per scene/task and a two‑stage training pipeline (representation learning → fine‑tuning). Unlike ESMM’s end‑to‑end training, MTMS first learns the separate embeddings, then concatenates them and trains only the top MLP.
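The second stage of such a pipeline can be sketched as below. The embedding tables stand in for stage‑one output and are never updated; a single logistic layer stands in for the top MLP to keep the example short. All names and the toy data are assumptions for illustration.

```python
import copy
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def concat_embeddings(tables, keys):
    """Look up one frozen embedding per table and concatenate them."""
    features = []
    for table, key in zip(tables, keys):
        features.extend(table[key])
    return features

def score(x, weights, bias):
    return sigmoid(sum(w * a for w, a in zip(weights, x)) + bias)

def train_top_layer(tables, samples, lr=0.5, epochs=200):
    """Stage 2: fit only the top layer with SGD; the tables are read-only."""
    dim = len(concat_embeddings(tables, samples[0][0]))
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        for keys, label in samples:
            x = concat_embeddings(tables, keys)
            grad = score(x, weights, bias) - label   # d(log loss) / d(logit)
            weights = [w - lr * grad * a for w, a in zip(weights, x)]
            bias -= lr * grad
    return weights, bias

# Stage 1 output (assumed to be trained separately beforehand).
user_table = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
item_table = {"i1": [0.5], "i2": [-0.5]}
tables = [user_table, item_table]
snapshot = copy.deepcopy(tables)

samples = [(("u1", "i1"), 1), (("u2", "i2"), 0),
           (("u1", "i2"), 1), (("u2", "i1"), 0)]
weights, bias = train_top_layer(tables, samples)
p_pos = score(concat_embeddings(tables, ("u1", "i1")), weights, bias)
p_neg = score(concat_embeddings(tables, ("u2", "i2")), weights, bias)
```

The trade‑off is visible even here: stage two is cheap and stable, but the embeddings cannot adapt to the final objective.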

HiNet (Hierarchical Information Extraction Network) – Meituan builds on MMoE with a hierarchical scene‑extraction module (shared experts, scene‑specific experts, scene‑sensitive attention) and a task‑extraction module that re‑uses MMoE’s gating to produce task‑specific embeddings.
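One way to picture the scene‑extraction step is as a concatenation of three vectors for the current scene: the shared‑expert output, the scene's own expert output, and an attention‑weighted pool of the other scenes' expert outputs. The sketch below takes the expert outputs as given and uses hypothetical attention scores; it is a simplified reading of the description above, not HiNet's exact formulation.

```python
import math

def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def scene_representation(scene, shared_out, scene_outs, attn_scores):
    """Concatenate shared, own-scene, and attention-pooled other-scene vectors."""
    others = [name for name in scene_outs if name != scene]
    weights = softmax([attn_scores[scene][name] for name in others])
    attended = [
        sum(w * scene_outs[name][d] for w, name in zip(weights, others))
        for d in range(len(shared_out))
    ]
    return shared_out + scene_outs[scene] + attended

shared_out = [1.0, 0.0]                       # output of the shared experts
scene_outs = {                                # output of each scene's experts
    "search": [0.5, 0.5],
    "feed": [1.0, 0.0],
    "group_buy": [0.0, 1.0],
}
# Hypothetical learned scores: how much "search" borrows from each other scene.
attn_scores = {"search": {"feed": 0.0, "group_buy": 0.0}}

search_vec = scene_representation("search", shared_out, scene_outs, attn_scores)
```

The resulting vector would then feed the task‑extraction module, which produces one representation per task.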

3. ZhiZhuan’s Multi‑Business Multi‑Scene Solution

3.1 Problem & Solution

ZhiZhuan expanded from mobile 3C products to a broad catalog (electronics, appliances, etc.), introducing multiple business lines and scenarios (search, recommendation, group‑buy, etc.). Directly applying MTMS‑style independent embeddings would leave small scenes with too little data to train well, while a single unified pretrained embedding would miss business‑specific material features.

The adopted architecture combines EPNet with feature‑level dynamic weighting. The model consists of:

Scene representation derived from the category set of the item.

SparseFeatures and DenseFeatures that encode user, query, and material (including business‑specific attributes).

DomainNet that processes all features, outputs weights applied to non‑scene features, and aggregates them into a global vector.

A prediction head that re‑uses DCN (Deep & Cross Network) for CTR (or other task) prediction.

The model is trained end‑to‑end (unlike MTMS’s two‑stage approach), with the representation module handling multi‑business, multi‑scene, material, user, and query signals, and the prediction module delivering task outputs.
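The article gives no formulas for DomainNet, so the following is only one plausible reading of the components listed above. It assumes one scalar weight per non‑scene feature field (squashed into (0, 2)), aggregation by concatenation, and a single DCN cross layer followed by a logistic output. Every name, shape, and parameter value here is hypothetical.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def domain_net(scene_emb, field_embs, gate_weights):
    """Weight each non-scene field using a gate that sees all features."""
    all_features = scene_emb + [v for emb in field_embs for v in emb]
    # Assumption: one scalar weight per field, in the range (0, 2).
    field_weights = [2.0 * sigmoid(dot(row, all_features)) for row in gate_weights]
    weighted = [[w * v for v in emb] for w, emb in zip(field_weights, field_embs)]
    # Assumption: aggregation is plain concatenation of the weighted fields.
    global_vec = [v for emb in weighted for v in emb]
    return field_weights, global_vec

def cross_layer(x0, xl, w, b):
    """One DCN cross layer: x_next = x0 * (xl . w) + b + xl."""
    scale = dot(xl, w)
    return [x0_i * scale + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]

def predict_ctr(scene_emb, field_embs, params):
    _, x0 = domain_net(scene_emb, field_embs, params["gate"])
    x1 = cross_layer(x0, x0, params["cross_w"], params["cross_b"])
    return sigmoid(dot(x1, params["out_w"]))

scene_emb = [1.0, 0.0]                   # e.g. derived from the item's category set
field_embs = [[0.2, 0.4], [0.1, -0.3]]   # e.g. one user field and one query field
params = {
    "gate": [[0.0] * 6, [0.0] * 6],      # zero gate -> every field weight is 1.0
    "cross_w": [0.1, 0.1, 0.1, 0.1],
    "cross_b": [0.0, 0.0, 0.0, 0.0],
    "out_w": [0.5, -0.5, 0.5, -0.5],
}
field_weights, global_vec = domain_net(scene_emb, field_embs, params["gate"])
p_click = predict_ctr(scene_emb, field_embs, params)
```

Because the gate and the prediction head sit in one computation graph, gradients from the CTR loss would reach the gate directly, which is what distinguishes end‑to‑end training from a two‑stage pipeline.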

Online results show a +6% lift in overall click‑through rate and a +2% increase in purchase conversion, with above‑average gains in low‑traffic categories.

3.2 Future Plans

While the solution proves effective for CTR, extending it to CVR and other recommendation tasks is planned. A current limitation is cold‑start handling for new scenes or material attributes, which may hinder full‑site rollout; future work will focus on alleviating this bottleneck.

References

[1] MMoE: Modeling Task Relationships in Multi‑task Learning with Multi‑gate Mixture‑of‑Experts.

[2] PLE: Progressive Layered Extraction (PLE): A Novel Multi‑task Learning Model for Personalized Recommendations.

[3] MoE: Adaptive Mixtures of Local Experts.

[4] ESMM: Entire Space Multi‑Task Model: An Effective Approach for Estimating Post‑Click Conversion Rate.

[5] LHUC: Learning Hidden Unit Contribution for Unsupervised Speaker Adaptation of Neural Network Acoustic Models.

[6] PEPNet: Parameter and Embedding Personalized Network for Infusing with Personalized Prior Information.

[7] M2M: A Multi‑Scenario Multi‑Task Meta‑Learning Approach for Advertiser Modeling.

[8] AdaSparse: Learning Adaptively Sparse Structures for Multi‑Domain Click‑Through Rate Prediction.

[9] STAR: One Model to Serve All: Star Topology Adaptive Recommender for Multi‑Domain CTR Prediction.

[10] MTMS: Multi‑Task and Multi‑Scene Unified Ranking Model for Online Advertising.

[11] HiNet: Novel Multi‑Scenario & Multi‑Task Learning with Hierarchical Information Extraction.

[12] DCN: Deep & Cross Network for Ad Click Predictions.

