How Model-Agnostic Interest Learning (MAIL) Solves Cold‑Start in Recommender Systems

This paper introduces MAIL, a model‑agnostic dual‑tower framework for the cold‑start problem. A zero‑shot learning tower generates virtual behaviors for new users, and an embedding‑based ranking tower consumes them. In large‑scale live‑stream recommendation at NetEase Cloud Music, MAIL achieves a 13%–15% CTR lift.

NetEase Cloud Music Tech Team

Abstract

Recommender systems rely on abundant user‑item interaction data, but new users suffer from severe data sparsity (cold‑start). The authors propose MAIL, a model‑agnostic dual‑tower architecture consisting of a zero‑shot tower that reconstructs virtual behaviors for new users via cross‑modal auto‑encoders, and a ranking tower that can be any embedding‑based deep model. Deployed in NetEase Cloud Music live‑stream recommendation, MAIL improves click‑through rate (CTR) by 13%–15% and shows consistent gains on public benchmarks.

Background

Cold‑start remains a fundamental challenge because existing recommenders depend heavily on historical interaction logs. Prior solutions exploit user attributes, cross‑domain signals, or meta‑learning, yet they still cannot fully compensate for missing behavior data. Zero‑shot learning (ZSL) in computer vision addresses a similar problem: recognizing unseen classes without any training samples. This parallel motivates the authors to treat new‑user recommendation as a ZSL task.

Method

Problem Definition

Each sample is described by four feature groups: user attributes (e.g., age, gender), behavior sequences, context, and the target item. For new users, behavior features are absent, so the goal is to infer a latent behavior vector from attributes.
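
One way to write this down is given below. The symbols are assumptions introduced here for illustration and are not taken from the paper: x_a, x_b, x_c, x_i denote the attribute, behavior, context, and target‑item features, f is the ranking model, and g is the mapping from attributes to a virtual behavior vector.

```latex
% Old users: all four feature groups are observed.
\hat{y} = f(x_a,\, x_b,\, x_c,\, x_i)

% New users: x_b is missing, so a virtual behavior vector is inferred from attributes.
\tilde{x}_b = g(x_a), \qquad
\hat{y} = f(x_a,\, \tilde{x}_b,\, x_c,\, x_i)
```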

Zero‑Shot Tower

The tower uses two auto‑encoders (one for attributes, one for behaviors) that share a common hidden space. Cross‑modal reconstruction aligns attribute and behavior embeddings, while a Maximum Mean Discrepancy (MMD) loss forces the hidden distributions of old‑user and new‑user representations to match. The overall loss combines attribute reconstruction, behavior reconstruction, and MMD terms.
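
The loss described above can be sketched in NumPy as a forward pass only, with no training loop. Several details are assumptions made for illustration rather than the authors' implementation: single linear layers with tanh, mean‑squared reconstruction error, an RBF kernel for MMD, and the choice to compare behavior codes of old users with attribute codes of new users. All function and key names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_mmd2(x, y, sigma=1.0):
    """Biased estimate of squared MMD between two samples (RBF kernel)."""
    def kernel(a, b):
        sq_dist = (a**2).sum(1)[:, None] + (b**2).sum(1)[None, :] - 2.0 * (a @ b.T)
        return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))
    return float(kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean())

def init_weights(attr_dim, behav_dim, hidden_dim):
    """One linear layer per encoder/decoder (an assumption, kept small for clarity)."""
    def w(n_in, n_out):
        return rng.normal(scale=0.1, size=(n_in, n_out))
    return {
        "enc_attr": w(attr_dim, hidden_dim), "dec_attr": w(hidden_dim, attr_dim),
        "enc_behav": w(behav_dim, hidden_dim), "dec_behav": w(hidden_dim, behav_dim),
    }

def mse(pred, target):
    return float(((pred - target) ** 2).mean())

def zero_shot_loss(w, attr_old, behav_old, attr_new, mmd_weight=1.0):
    # Encode both modalities into the shared hidden space.
    z_attr = np.tanh(attr_old @ w["enc_attr"])
    z_behav = np.tanh(behav_old @ w["enc_behav"])
    z_new = np.tanh(attr_new @ w["enc_attr"])  # new users only have attributes
    terms = {
        "attr_rec": mse(z_attr @ w["dec_attr"], attr_old),
        "behav_rec": mse(z_behav @ w["dec_behav"], behav_old),
        # Cross-modal: decode behaviors from the attribute code.
        "cross_rec": mse(z_attr @ w["dec_behav"], behav_old),
        # Assumed pairing: old-user behavior codes vs. new-user attribute codes.
        "mmd": rbf_mmd2(z_behav, z_new),
    }
    terms["total"] = (terms["attr_rec"] + terms["behav_rec"]
                      + terms["cross_rec"] + mmd_weight * terms["mmd"])
    return terms

def virtual_behavior(w, attr_new):
    """Attributes -> shared code -> behavior decoder."""
    return np.tanh(attr_new @ w["enc_attr"]) @ w["dec_behav"]

# Toy data: 32 old users, 16 new users, 6 attribute dims, 10 behavior dims.
attr_old = rng.normal(size=(32, 6))
behav_old = rng.normal(size=(32, 10))
attr_new = rng.normal(size=(16, 6))
w = init_weights(attr_dim=6, behav_dim=10, hidden_dim=4)
losses = zero_shot_loss(w, attr_old, behav_old, attr_new)
virtual = virtual_behavior(w, attr_new)
```

The `virtual` array has one row per new user and stands in for the missing behavior features when it is passed to the ranking tower.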

Ranking Tower

The ranking tower is model‑agnostic; any embedding‑based network (e.g., DeepFM, DMR) can be plugged in. It receives the virtual behavior vector from the zero‑shot tower, concatenates it with other features, and predicts CTR/CVR using a multilayer perceptron with cross‑entropy loss.
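
A minimal sketch of that data flow follows, assuming a one‑hidden‑layer MLP with ReLU and a sigmoid output. The layer sizes, parameter names, and random inputs are hypothetical; a production ranking tower such as DeepFM or DMR would replace the MLP body.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ranking_tower(behavior_vec, other_feats, params):
    """Concatenate inputs, apply one hidden ReLU layer, return click probabilities."""
    x = np.concatenate([behavior_vec, other_feats], axis=1)
    h = np.maximum(x @ params["w1"] + params["b1"], 0.0)
    logit = h @ params["w2"] + params["b2"]
    return sigmoid(logit).ravel()

def binary_cross_entropy(p, y, eps=1e-7):
    """Mean cross-entropy between predicted probabilities p and 0/1 labels y."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)).mean())

batch, behav_dim, other_dim, hidden = 8, 10, 5, 16
params = {
    "w1": rng.normal(scale=0.1, size=(behav_dim + other_dim, hidden)),
    "b1": np.zeros(hidden),
    "w2": rng.normal(scale=0.1, size=(hidden, 1)),
    "b2": np.zeros(1),
}

# Stand-in for the zero-shot tower output (random here, for illustration only).
virtual_behavior_vec = rng.normal(size=(batch, behav_dim))
other_feats = rng.normal(size=(batch, other_dim))   # attributes, context, target item
clicks = rng.integers(0, 2, size=batch).astype(float)

p_click = ranking_tower(virtual_behavior_vec, other_feats, params)
loss = binary_cross_entropy(p_click, clicks)
```

Because the tower only sees a concatenated feature vector, old users can pass real behavior embeddings through the same interface while new users pass virtual ones.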

Training Tricks

Two separate optimizers: one updates the ranking tower and shared embeddings, the other updates only the zero‑shot tower without touching the shared embedding matrix.

Residual‑style auto‑encoders improve reconstruction quality for sparse interaction data.
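
The two‑optimizer trick can be illustrated with plain SGD over named parameter groups. This is a sketch under assumptions: the parameter names, shapes, learning rates, and placeholder gradients are hypothetical, and a real setup would typically use an autodiff framework's optimizers with separate parameter lists.

```python
import numpy as np

def sgd_step(params, grads, names, lr):
    """Apply plain SGD to the named parameters only; all others stay untouched."""
    for name in names:
        params[name] = params[name] - lr * grads[name]

rng = np.random.default_rng(0)
params = {
    "shared_embedding": rng.normal(size=(100, 8)),
    "ranking_mlp": rng.normal(size=(8, 1)),
    "zero_shot_encoder": rng.normal(size=(8, 4)),
    "zero_shot_decoder": rng.normal(size=(4, 8)),
}

# Optimizer 1 owns the ranking tower and the shared embeddings.
RANKING_GROUP = ["shared_embedding", "ranking_mlp"]
# Optimizer 2 owns only the zero-shot tower; the embedding is deliberately excluded.
ZERO_SHOT_GROUP = ["zero_shot_encoder", "zero_shot_decoder"]

# Placeholder gradients; in practice these come from the CTR loss and the
# zero-shot loss respectively.
ranking_grads = {name: np.ones_like(value) for name, value in params.items()}
zero_shot_grads = {name: np.ones_like(value) for name, value in params.items()}

initial = {name: value.copy() for name, value in params.items()}
sgd_step(params, ranking_grads, RANKING_GROUP, lr=0.1)
after_ranking = {name: value.copy() for name, value in params.items()}
sgd_step(params, zero_shot_grads, ZERO_SHOT_GROUP, lr=0.01)
```

Even though the zero‑shot loss produces a gradient for the embedding matrix, the second step never applies it, so the embeddings are shaped only by the ranking objective.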

Experiments

Datasets

Public: Alibaba advertising logs (8 days, ~1.14 M users, ~0.84 M items). New users are simulated by masking the behavior features of a randomly chosen 40% of users.

Industrial: NetEase Cloud Music live‑stream logs (8 days, ~613 k users, ~52 k items). New users occur naturally in this set, so no masking is needed.
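
The simulation of new users on the public set could look like the sketch below. The article only states that behavior is masked for a random 40% of users; selecting at the user level and zeroing the feature rows are assumptions, and the function name and toy data are hypothetical.

```python
import numpy as np

def mask_new_users(user_ids, behaviors, new_user_ratio=0.4, seed=0):
    """Pick a random subset of users and zero out their behavior features."""
    rng = np.random.default_rng(seed)
    unique_users = np.unique(user_ids)
    n_new = int(round(new_user_ratio * len(unique_users)))
    new_users = rng.choice(unique_users, size=n_new, replace=False)
    is_new = np.isin(user_ids, new_users)   # row-level flag
    masked = behaviors.copy()               # leave the input array untouched
    masked[is_new] = 0.0
    return masked, is_new, new_users

# Toy log: 10 users with 3 rows each and 4 behavior features per row.
user_ids = np.repeat(np.arange(10), 3)
behaviors = np.ones((30, 4))
masked, is_new, new_users = mask_new_users(user_ids, behaviors)
```

Sampling users rather than rows keeps every log row of a simulated new user consistently behavior‑free.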

Baselines

EmbLR (embedding‑based logistic regression)

LLAE (linear low‑rank auto‑encoder)

BaseDNN (standard deep ranking tower)

MetaEmb (meta‑learning initialization)

DMR (deep match‑to‑rank model)

Results

On public data, MAIL‑Base improves new‑user AUC from 0.5862 (BaseDNN) to 0.5958 (≈1.6% relative gain). On the industrial dataset, new‑user AUC rises from 0.6849 to 0.6895 (≈0.7% relative gain). Ablation studies show that removing either cross‑modal reconstruction or MMD degrades performance, indicating that both components contribute to the result.
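
The relative gains follow directly from the reported AUC values. The helper name below is hypothetical; the four numbers are the ones quoted above.

```python
def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`."""
    return (improved - baseline) / baseline

public_gain = relative_gain(0.5862, 0.5958)
industrial_gain = relative_gain(0.6849, 0.6895)

print(f"public: {public_gain:.2%}, industrial: {industrial_gain:.2%}")
# prints "public: 1.64%, industrial: 0.67%"
```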

Online A/B Test

Deploying MAIL‑DMR against the baseline DMR for 3 M users over two weeks yields a 13%–15% CTR increase and a 3%–4% CTCVR lift, demonstrating real‑world impact.

Conclusion

MAIL provides a flexible, model‑agnostic solution to the cold‑start problem by treating new‑user recommendation as a zero‑shot learning task. The dual‑tower design enables seamless integration with existing ranking models, delivering consistent offline and online performance gains in large‑scale production environments.
