Cold-Transformer: Embedding Adaptation for User Cold‑Start Recommendation
Cold‑Transformer tackles the user cold‑start problem with an Embedding Adaptation layer that refines sparse user embeddings using context‑aware fused behavior sequences and a Label Encoding scheme. It preserves a dual‑tower design and achieves state‑of‑the‑art performance on public and industrial datasets.
Abstract
Cold‑start user recommendation is a classic problem in recommender systems. Existing deep models suffer significant performance drops for cold‑start users due to (1) distributional bias between cold‑start and existing users, and (2) the difficulty of representing cold‑start users with few interactions. This paper proposes Cold‑Transformer, which introduces an Embedding Adaptation (EA) layer to warm up cold‑start user embeddings, and a Label Encoding (LE) scheme to jointly model mixed positive‑negative feedback sequences. The model retains a dual‑tower architecture for large‑scale industrial deployment. Experiments on public and industrial datasets show that Cold‑Transformer outperforms state‑of‑the‑art methods.
Background
Most recommender models rely on abundant user interactions, yet a small fraction of users (new or long‑tail) contribute very few interactions, leading to the well‑known cold‑start problem. Traditional deep recommenders are not explicitly optimized for cold‑start users, causing a performance gap.
Method
User Cold‑Start Modeling
We model the problem as a binary recommendation task (e.g., CTR prediction). Users are split into existing users (seen during training) and cold‑start users (unseen); a cold‑start user may have accumulated only a limited number of behaviors before being incorporated into training.
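The user split described above can be sketched as follows; the record fields (`user_id`, `label`) are illustrative assumptions, not names from the paper.

```python
# Sketch of the existing vs. cold-start user split for binary CTR modeling.
# Field names (user_id, item_id, label) are illustrative assumptions.

def split_users(interactions, train_user_ids):
    """Partition interaction records into existing and cold-start users."""
    existing, cold_start = [], []
    for rec in interactions:
        if rec["user_id"] in train_user_ids:
            existing.append(rec)
        else:
            cold_start.append(rec)
    return existing, cold_start

interactions = [
    {"user_id": "u1", "item_id": "i9", "label": 1},  # click (positive feedback)
    {"user_id": "u2", "item_id": "i3", "label": 0},  # exposure only (negative)
    {"user_id": "u7", "item_id": "i9", "label": 0},  # user unseen in training -> cold-start
]
existing, cold = split_users(interactions, train_user_ids={"u1", "u2"})
```

Both groups are labeled with the same binary target, so one model can be trained and evaluated on each group separately.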
Embedding Adaptation
We design an EA layer that adjusts user embeddings based on context‑aware fused behavior sequences. A Transformer aggregates the sequence, and the resulting context vector adapts the original user embedding, mitigating the feature‑distribution bias between cold‑start and existing users.
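A minimal numpy sketch of the idea: the paper's multi‑layer Transformer is replaced here with single‑head attention pooling, and the adaptation is a simple residual update; the weight matrices are random placeholders, so this illustrates the data flow only, not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def embedding_adaptation(user_emb, behavior_embs, W_q, W_k, W_v, W_out):
    """Attention-pool the behavior sequence into a context vector,
    then apply a residual update to the user embedding (sketch only)."""
    q = user_emb @ W_q                        # (d,) query derived from the user
    k = behavior_embs @ W_k                   # (T, d) keys from behaviors
    v = behavior_embs @ W_v                   # (T, d) values from behaviors
    attn = softmax(k @ q / np.sqrt(q.size))   # (T,) attention weights
    context = attn @ v                        # (d,) context vector
    return user_emb + context @ W_out         # residual adaptation of the embedding

rng = np.random.default_rng(0)
d, T = 8, 5                                   # embedding dim, sequence length
user_emb = rng.normal(size=d)
behaviors = rng.normal(size=(T, d))
Ws = [rng.normal(scale=0.1, size=(d, d)) for _ in range(4)]
adapted = embedding_adaptation(user_emb, behaviors, *Ws)
```

The residual form keeps the adapted embedding close to the original when the behavior context carries little signal, which is the desired behavior for users with very short histories.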
Leveraging Fused Behaviors
Cold‑start users have scarce positive feedback but relatively abundant negative feedback (exposures). We encode both types into a single ordered sequence and apply Label Encoding to reduce heterogeneity. A residual network learns a correction vector for each interaction, which is added to the original item embedding.
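The Label Encoding step can be sketched as below: each interaction's item embedding is concatenated with a learned label embedding (click vs. exposure), a small residual network produces a correction vector, and the correction is added back to the item embedding. The layer sizes and random weights are placeholders, not values from the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def label_encode(item_emb, label, label_embs, W1, W2):
    """Fuse an item embedding with its feedback label: a small residual
    network maps [item ; label] to a correction added to item_emb."""
    x = np.concatenate([item_emb, label_embs[label]])  # label 1 = click, 0 = exposure
    correction = relu(x @ W1) @ W2                     # learned correction vector
    return item_emb + correction                       # residual fusion

rng = np.random.default_rng(1)
d, dl, h = 8, 4, 16                                    # item dim, label dim, hidden dim
item_emb = rng.normal(size=d)
label_embs = {0: rng.normal(size=dl), 1: rng.normal(size=dl)}
W1 = rng.normal(scale=0.1, size=(d + dl, h))
W2 = rng.normal(scale=0.1, size=(h, d))
fused_pos = label_encode(item_emb, 1, label_embs, W1, W2)
fused_neg = label_encode(item_emb, 0, label_embs, W1, W2)
```

Because the same item receives different corrections under positive and negative feedback, the fused sequence stays homogeneous in shape while still distinguishing the two feedback types.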
Learnable Global Cold Embedding
Instead of random ID initialization, we learn a global cold embedding shared across all cold‑start users, which aligns ID distributions between cold and existing users.
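A sketch of the lookup logic implied above: warm users get their learned ID embedding, while every cold‑start user shares one learnable vector. The function and variable names are illustrative assumptions.

```python
import numpy as np

def lookup_user_embedding(user_id, id_table, cold_embedding):
    """Return the learned per-user ID embedding for existing users, and the
    single shared (learnable) cold embedding for everyone else."""
    if user_id in id_table:
        return id_table[user_id]
    return cold_embedding  # one vector shared by all cold-start users

rng = np.random.default_rng(2)
d = 8
id_table = {"u1": rng.normal(size=d)}      # embeddings learned during training
cold_embedding = rng.normal(size=d)        # trained jointly, shared across cold users
warm = lookup_user_embedding("u1", id_table, cold_embedding)
cold = lookup_user_embedding("u99", id_table, cold_embedding)
```

Since the shared vector is trained rather than randomly initialized per ID, cold‑start users enter the model from a consistent, well‑positioned point in embedding space.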
Experiments
Datasets
We evaluate on MovieLens‑1M, Taobao Display AD, and a large industrial exposure/click dataset (5 M records). Each dataset is split by timestamp to simulate real‑world recommendation.
Setup
Metrics: AUC (primary) and RelaImpr. Implementation: Adam optimizer (learning rate 0.001), embedding size 32, a two‑layer MLP with 64 hidden units, and a 2‑layer, 2‑head Transformer for the EA layer; dropout 0.5, maximum sequence length 50.
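For reference, the reported hyperparameters gathered into a single config dict; the key names are our own, and the MLP layout (64 units per layer) is our reading of "two‑layer MLP (64 units)".

```python
# Hyperparameters from the Setup section, collected into one place.
# Key names are illustrative; mlp_hidden_units assumes 64 units per layer.
config = {
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "embedding_size": 32,
    "mlp_hidden_units": [64, 64],   # two-layer MLP
    "transformer_layers": 2,        # EA Transformer depth
    "transformer_heads": 2,
    "dropout": 0.5,
    "max_seq_len": 50,
}
```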
Results
Cold‑Transformer achieves the best cold‑start AUC against all baselines (DIN, DIEN, EdgeRec, DropoutNet, MWUF, MAML). It also improves performance for existing users, especially those with few interactions.
Ablation
We compare EA with other sequence‑modeling variants on MovieLens‑1M. Using Label‑Encoded fused behaviors consistently boosts performance; EA further enhances cold‑start results.
Conclusion
Cold‑Transformer effectively mitigates user cold‑start by adapting embeddings with fused behavior context. Extensive experiments validate its superiority. Future work will extend the approach to zero‑interaction cold‑start users.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.