Artificial Intelligence

How Tag-Based Explicit Recall Boosts Recommendation Performance with Multi-Task Learning

This article explains how a two‑stage recommendation pipeline uses explicit tag‑based recall, inverted indexes, and a multi‑task learning model to improve click‑through and dwell time by dynamically balancing loss weights across tasks.

Cyber Elephant Tech Team

Introduction

Industrial recommendation systems typically adopt a two‑stage architecture: recall (matching) followed by ranking. The recall stage efficiently retrieves a Top‑K set of relevant items from a massive candidate pool, while the ranking stage personalizes the order, often using coarse and fine ranking stages. In feed‑based recommendation, millions of articles are filtered down to thousands or tens of thousands through recall before ranking.

The recall stage usually combines multiple strategies, including explicit tag‑based recall and implicit item‑embedding recall. Explicit recall links users and items via tags, offering high precision and real‑time performance, especially for cold‑start items. Implicit recall leverages dense vectors learned by deep networks to find similar items, providing strong generalization.

This article shares practical experience with explicit recall; implicit recall will be covered in future work.

Tag System and Inverted Index

2.1 Tag System

At Yidian Zixun, a comprehensive tag system is built to finely label users and content, supported by an inverted index for fast tag‑to‑content lookup.

The tag definition must balance accuracy, coverage, and distinctiveness, which is crucial for fine‑grained operations.

A complete tag system includes both content tags (size categories, keywords, entities, etc.) and user tags (basic attributes, activity levels, interest tags).
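As a concrete illustration, the two halves of such a tag system might be represented as simple records. The field names below are hypothetical, not the production schema:

```python
from dataclasses import dataclass, field

@dataclass
class ContentTags:
    """Tags attached to an article at ingestion time."""
    article_id: str
    category: str                                   # coarse category, e.g. "sports"
    keywords: dict = field(default_factory=dict)    # keyword -> relevance score
    entities: list = field(default_factory=list)    # named entities in the text

@dataclass
class UserTags:
    """Tags accumulated on a user from attributes and behavior."""
    user_id: str
    attributes: dict = field(default_factory=dict)  # basic attributes (age band, etc.)
    activity_level: str = "low"
    interests: dict = field(default_factory=dict)   # interest tag -> weight

article = ContentTags("a1", "sports", {"nba": 0.9, "playoffs": 0.6}, ["Lakers"])
user = UserTags("u1", {"age": "25-34"}, "high", {"nba": 0.7})
```

The shared vocabulary between `ContentTags.keywords` and `UserTags.interests` is what lets tags act as the bridge between users and content later in the pipeline.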

Figure 1-1 Recommendation System Architecture

2.2 User Interest Modeling

User tag data originates from interaction signals such as clicks, views, comments, favorites, and likes. Aggregating these behaviors over article tags constructs a user interest profile.

Interest modeling aggregates historical behavior into long‑term, short‑term, and dynamic features, with relevance decreasing over time.

Interests are expressed as (tag, weight) pairs, where weight reflects the user's affinity for the tag.
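A minimal sketch of this aggregation, assuming hypothetical per-action weights and an exponential half-life decay (the article does not specify the production decay function):

```python
import math
from collections import defaultdict

# Hypothetical action weights: stronger signals contribute more to interest.
ACTION_WEIGHTS = {"click": 1.0, "favorite": 2.0, "comment": 1.5, "like": 1.2}

def build_interest_profile(events, now, half_life_days=7.0):
    """Aggregate (timestamp, action, tags) events into a {tag: weight} profile,
    decaying older behavior exponentially (half-life measured in days)."""
    profile = defaultdict(float)
    for ts, action, tags in events:
        age_days = (now - ts) / 86400.0
        decay = 0.5 ** (age_days / half_life_days)
        for tag in tags:
            profile[tag] += ACTION_WEIGHTS.get(action, 0.5) * decay
    # Normalize so weights sum to 1 and are comparable across users.
    total = sum(profile.values()) or 1.0
    return {tag: w / total for tag, w in profile.items()}

now = 10 * 86400
events = [
    (now, "click", ["nba"]),
    (now - 7 * 86400, "click", ["finance"]),  # exactly one half-life old
]
profile = build_interest_profile(events, now)
# The week-old click contributes half the weight of the fresh one.
```

Splitting the same machinery over different time windows yields the long-term, short-term, and dynamic features described above.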

Figure 2-2 Example User Interest Profile

2.3 Inverted Index

To quickly retrieve the Top‑K articles related to a tag from a corpus of tens of millions, an inverted index is built using KV stores such as HBase or Redis, mapping tags to content.

New articles are indexed with their tags during ingestion, enabling real‑time exposure through explicit recall.
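A toy in-memory stand-in for the production KV-backed index (HBase/Redis) might look like the following; the class and method names are illustrative, and each posting list is capped at Top-K by tag score:

```python
import heapq
from collections import defaultdict

class InvertedIndex:
    """In-memory sketch of the tag -> article index (a KV store in production)."""

    def __init__(self, top_k=1000):
        self.top_k = top_k
        self.index = defaultdict(list)  # tag -> min-heap of (score, article_id)

    def add_article(self, article_id, tag_scores):
        """Index a new article under each of its tags at ingestion time."""
        for tag, score in tag_scores.items():
            postings = self.index[tag]
            heapq.heappush(postings, (score, article_id))
            if len(postings) > self.top_k:
                heapq.heappop(postings)  # evict the lowest-scored article

    def lookup(self, tag, k=10):
        """Return up to k article ids for a tag, highest tag score first."""
        return [aid for _, aid in heapq.nlargest(k, self.index.get(tag, []))]

idx = InvertedIndex()
idx.add_article("a1", {"nba": 0.9, "sports": 0.6})
idx.add_article("a2", {"nba": 0.7})
```

Because `add_article` runs at ingestion time, a fresh article becomes retrievable through its tags immediately, which is what gives explicit recall its cold-start advantage.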

Figure 2-3 Article Ingestion and Indexing

2.4 Tag‑Based Recall

By constructing a middle‑layer index for articles and matching it with user profiles, tags act as a bridge between users and content. The process first obtains the user's interest tags and their strengths, then queries related content.

Typical pipelines consist of two stages: User → Interest‑Tag and Interest‑Tag → Item, each optimized separately.
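The two-stage lookup can be sketched as follows, assuming a user profile of (tag, weight) pairs and an index mapping tags to (article, relevance) postings; the scoring rule shown (tag weight × article relevance) is a simplification of what a production system would use:

```python
def tag_based_recall(user_profile, inverted_index, n_tags=3, k_per_tag=5):
    """Two-stage recall: pick the user's strongest interest tags (User -> Tag),
    then pull top articles for each tag from the index (Tag -> Item)."""
    # Stage 1: User -> Interest-Tag.
    top_tags = sorted(user_profile.items(), key=lambda kv: -kv[1])[:n_tags]
    # Stage 2: Interest-Tag -> Item, scoring candidates along the way.
    candidates = {}
    for tag, weight in top_tags:
        for article_id, relevance in inverted_index.get(tag, [])[:k_per_tag]:
            score = weight * relevance
            candidates[article_id] = max(candidates.get(article_id, 0.0), score)
    return sorted(candidates, key=candidates.get, reverse=True)

user_profile = {"nba": 0.6, "finance": 0.3, "movies": 0.1}
inverted_index = {
    "nba": [("a1", 0.9), ("a2", 0.7)],
    "finance": [("a3", 0.8), ("a1", 0.4)],
}
ranked = tag_based_recall(user_profile, inverted_index)
# "a1" ranks first: it matches the user's strongest tag with high relevance.
```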

Figure 2-4 Tag Connecting Users and Articles

Interest Point Selection

Explicit recall for content involves tag construction, profile accumulation, candidate ranking, and content retrieval. When a user enters the feed, the system predicts which interest points to surface based on the user's interest expression.

More engaged tags lead to higher click rates, per‑user clicks, and longer dwell times.

3.1 Multi‑Objective Modeling

Goal: improve per‑user clicks and dwell time.

Model: multi‑task learning (MTL) with a shared‑bottom architecture; each task uses a Wide&Deep tower.

Samples: user‑article‑tag triples; click prediction is a binary classification task, while dwell time is a regression task.

Features: user attributes (age, gender, long‑term and short‑term interests), tag IDs, match strength between article tags and user profile, and context features (batch index, position).

Loss: cross‑entropy for CTR, MSE for dwell time; combined as a multi‑task loss.

Serving: at inference, tags from the user profile are scored, top‑N tags are selected, and a combined score (CTR × dwell) ranks candidates.
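To make the architecture concrete, here is a minimal forward-pass sketch of a shared bottom with two task towers, in NumPy. This is a simplification: the real model uses Wide&Deep towers per task, learned embeddings, and training, all omitted here, and the weights below are random:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Shared bottom maps the concatenated features to a common representation;
# each task then has its own (here single-layer) tower on top of it.
W_shared = rng.normal(size=(8, 16)) * 0.1
W_ctr = rng.normal(size=(16, 1)) * 0.1
W_dwell = rng.normal(size=(16, 1)) * 0.1

def forward(x):
    h = relu(x @ W_shared)            # shared representation
    p_click = sigmoid(h @ W_ctr)      # binary click probability (CTR head)
    dwell = relu(h @ W_dwell)         # non-negative dwell-time estimate
    return p_click.ravel(), dwell.ravel()

x = rng.normal(size=(4, 8))           # 4 (user, article, tag) feature vectors
p_click, dwell = forward(x)
# Serving: rank tags/candidates by the combined score CTR x dwell.
combined = p_click * dwell
```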

Dynamic weighting of task losses is essential because tasks have different scales and learning speeds.

3.2 Balancing Multi‑Task Loss

Recent MTL research (ESMM, MMOE, DUPN, PLE) proposes various architectures and loss‑balancing strategies. A simple sum of losses can be dominated by one task, and static weighting fails to adapt to training dynamics.

We adopt uncertainty‑based weighting [9]: each task learns a noise parameter σ², and its loss is divided by σ², with a regularization term log(σ²). A larger σ indicates higher uncertainty and reduces that task's weight.

Figure 3-2 Multi‑Task Loss with Uncertainty
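Following the formulation above (each loss divided by σ² plus a log(σ²) regularizer), the combined loss can be sketched in a few lines. Parameterizing σ² through its log keeps it positive during training; the only trainable additions are the per-task log(σ²) scalars:

```python
import math

def uncertainty_weighted_loss(task_losses, log_sigma2):
    """Combine per-task losses with learned homoscedastic uncertainty [9]:
    each loss is scaled by 1/sigma^2, and a log(sigma^2) term keeps
    sigma from growing without bound (which would zero out the task)."""
    total = 0.0
    for loss, ls2 in zip(task_losses, log_sigma2):
        sigma2 = math.exp(ls2)  # sigma^2 parameterized via its log for stability
        total += loss / sigma2 + ls2
    return total

# A high-uncertainty task (larger sigma^2) is down-weighted automatically:
# here the first task's loss of 2.0 contributes only 2.0 / 4 = 0.5.
balanced = uncertainty_weighted_loss([2.0, 0.5], [math.log(4.0), 0.0])
```

In training, the `log_sigma2` values would be trainable parameters updated by the same optimizer as the network weights, so the balance between CTR and dwell-time losses adapts as training progresses.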

This approach adds only a few trainable parameters and requires no extra hyper‑parameters. Experiments show that uncertainty‑weighted MTL improves both CTR and dwell time, yielding measurable gains in per‑user click and dwell metrics.

Outlook

Through multi‑objective joint learning and balanced loss, we have achieved significant improvements in click‑through and dwell time. Future work includes exploring new model architectures, sample construction methods, and additional business metrics such as consumption depth, surprise, and satisfaction.

References

[1] Ma X, Zhao L, Huang G, et al. Entire Space Multi‑Task Model: An Effective Approach for Estimating Post‑Click Conversion Rate. SIGIR 2018.

[2] Ma J, Zhao Z, Yi X, et al. Modeling Task Relationships in Multi‑task Learning with Multi‑gate Mixture‑of‑Experts. KDD 2018.

[3] Ni Y, Ou D, Liu S, et al. Perceive Your Users in Depth: Learning Universal User Representations from Multiple E‑commerce Tasks. KDD 2018.

[4] Tang H, Liu J, Zhao M, et al. Progressive Layered Extraction (PLE): A Novel Multi‑Task Learning Model for Personalized Recommendations. RecSys 2020.

[5] Chen Z, Badrinarayanan V, Lee CY, et al. GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks. ICML 2018.

[6] Liu S, Johns E, Davison AJ. End‑to‑End Multi‑Task Learning with Attention. CVPR 2019.

[7] Guo M, Haque A, Huang DA, et al. Dynamic Task Prioritization for Multitask Learning. ECCV 2018.

[8] Sener O, Koltun V. Multi‑Task Learning as Multi‑Objective Optimization. NeurIPS 2018.

[9] Kendall A, Gal Y, Cipolla R. Multi‑Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics. CVPR 2018.

Tags: artificial intelligence, recommendation system, multi-task learning, user interest modeling, tag indexing, explicit recall
Written by the Cyber Elephant Tech Team, the official tech account of Cyber Elephant, a platform for the group's technology innovation, sharing, and communication.
