Cold-Transformer: Embedding Adaptation for User Cold‑Start Recommendation
Cold‑Transformer tackles the user cold‑start problem with an Embedding Adaptation layer that refines sparse user embeddings using context‑aware fused behavior sequences and a Label Encoding scheme. It preserves a dual‑tower design and achieves state‑of‑the‑art performance on public and industrial datasets.
Abstract
Cold‑start user recommendation is a classic problem in recommender systems. Existing deep models suffer significant performance drops for cold‑start users due to (1) distributional bias between cold‑start and existing users, and (2) the difficulty of representing cold‑start users with few interactions. This paper proposes Cold‑Transformer, which introduces an Embedding Adaptation (EA) layer to warm up cold‑start user embeddings, and a Label Encoding (LE) scheme to jointly model mixed positive‑negative feedback sequences. The model retains a dual‑tower architecture for large‑scale industrial deployment. Experiments on public and industrial datasets show that Cold‑Transformer outperforms state‑of‑the‑art methods.
Background
Most recommender models rely on abundant user interactions, yet a small fraction of users (new or long‑tail) contribute very few interactions, leading to the well‑known cold‑start problem. Traditional deep recommenders are not explicitly optimized for cold‑start users, causing a performance gap.
Method
User Cold‑Start Modeling
We model the problem as a binary recommendation task (e.g., CTR prediction). Users are split into existing users (seen during training) and cold‑start users (unseen); a cold‑start user may have accumulated only a limited number of behaviors before being incorporated into training.
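The user split described above can be sketched as follows; the record fields (`user_id`, `label`) are illustrative assumptions, not names from the paper.

```python
# Sketch of the existing vs. cold-start user split for binary CTR modeling.
# Field names (user_id, item_id, label) are illustrative assumptions.

def split_users(interactions, train_user_ids):
    """Partition interaction records into existing and cold-start users."""
    existing, cold_start = [], []
    for rec in interactions:
        if rec["user_id"] in train_user_ids:
            existing.append(rec)
        else:
            cold_start.append(rec)
    return existing, cold_start

interactions = [
    {"user_id": "u1", "item_id": "i9", "label": 1},  # click (positive feedback)
    {"user_id": "u2", "item_id": "i3", "label": 0},  # exposure only (negative)
    {"user_id": "u7", "item_id": "i9", "label": 0},  # user unseen in training -> cold-start
]
existing, cold = split_users(interactions, train_user_ids={"u1", "u2"})
```

Both groups are labeled with the same binary target, so one model can be trained and evaluated on each group separately.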
Embedding Adaptation
We design an EA layer that adjusts user embeddings based on context‑aware fused behavior sequences. A Transformer aggregates the sequence, and the resulting context vector adapts the original user embedding, mitigating the feature‑distribution bias between cold‑start and existing users.
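A minimal numpy sketch of the idea: the paper's multi‑layer Transformer is replaced here with single‑head attention pooling, and the adaptation is a simple residual update; the weight matrices are random placeholders, so this illustrates the data flow only, not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def embedding_adaptation(user_emb, behavior_embs, W_q, W_k, W_v, W_out):
    """Attention-pool the behavior sequence into a context vector,
    then apply a residual update to the user embedding (sketch only)."""
    q = user_emb @ W_q                        # (d,) query derived from the user
    k = behavior_embs @ W_k                   # (T, d) keys from behaviors
    v = behavior_embs @ W_v                   # (T, d) values from behaviors
    attn = softmax(k @ q / np.sqrt(q.size))   # (T,) attention weights
    context = attn @ v                        # (d,) context vector
    return user_emb + context @ W_out         # residual adaptation of the embedding

rng = np.random.default_rng(0)
d, T = 8, 5                                   # embedding dim, sequence length
user_emb = rng.normal(size=d)
behaviors = rng.normal(size=(T, d))
Ws = [rng.normal(scale=0.1, size=(d, d)) for _ in range(4)]
adapted = embedding_adaptation(user_emb, behaviors, *Ws)
```

The residual form keeps the adapted embedding close to the original when the behavior context carries little signal, which is the desired behavior for users with very short histories.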
Leveraging Fused Behaviors
Cold‑start users have scarce positive feedback but relatively abundant negative feedback (exposures). We encode both types into a single ordered sequence and apply Label Encoding to reduce heterogeneity. A residual network learns a correction vector for each interaction, which is added to the original item embedding.
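The Label Encoding step can be sketched as below: each interaction's item embedding is concatenated with a learned label embedding (click vs. exposure), a small residual network produces a correction vector, and the correction is added back to the item embedding. The layer sizes and random weights are placeholders, not values from the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def label_encode(item_emb, label, label_embs, W1, W2):
    """Fuse an item embedding with its feedback label: a small residual
    network maps [item ; label] to a correction added to item_emb."""
    x = np.concatenate([item_emb, label_embs[label]])  # label 1 = click, 0 = exposure
    correction = relu(x @ W1) @ W2                     # learned correction vector
    return item_emb + correction                       # residual fusion

rng = np.random.default_rng(1)
d, dl, h = 8, 4, 16                                    # item dim, label dim, hidden dim
item_emb = rng.normal(size=d)
label_embs = {0: rng.normal(size=dl), 1: rng.normal(size=dl)}
W1 = rng.normal(scale=0.1, size=(d + dl, h))
W2 = rng.normal(scale=0.1, size=(h, d))
fused_pos = label_encode(item_emb, 1, label_embs, W1, W2)
fused_neg = label_encode(item_emb, 0, label_embs, W1, W2)
```

Because the same item receives different corrections under positive and negative feedback, the fused sequence stays homogeneous in shape while still distinguishing the two feedback types.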
Learnable Global Cold Embedding
Instead of random ID initialization, we learn a global cold embedding shared across all cold‑start users, which aligns ID distributions between cold and existing users.
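A sketch of the lookup logic implied above: warm users get their learned ID embedding, while every cold‑start user shares one learnable vector. The function and variable names are illustrative assumptions.

```python
import numpy as np

def lookup_user_embedding(user_id, id_table, cold_embedding):
    """Return the learned per-user ID embedding for existing users, and the
    single shared (learnable) cold embedding for everyone else."""
    if user_id in id_table:
        return id_table[user_id]
    return cold_embedding  # one vector shared by all cold-start users

rng = np.random.default_rng(2)
d = 8
id_table = {"u1": rng.normal(size=d)}      # embeddings learned during training
cold_embedding = rng.normal(size=d)        # trained jointly, shared across cold users
warm = lookup_user_embedding("u1", id_table, cold_embedding)
cold = lookup_user_embedding("u99", id_table, cold_embedding)
```

Since the shared vector is trained rather than randomly initialized per ID, cold‑start users enter the model from a consistent, well‑positioned point in embedding space.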
Experiments
Datasets
We evaluate on MovieLens‑1M, Taobao Display AD, and a large industrial exposure/click dataset (5 M records). Each dataset is split by timestamp to simulate real‑world recommendation.
Setup
Metrics: AUC (primary) and RelaImpr. Implementation: Adam optimizer (learning rate 0.001), embedding size 32, a two‑layer MLP with 64 hidden units, and a 2‑layer, 2‑head Transformer for the EA layer; dropout 0.5, maximum sequence length 50.
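For reference, the reported hyperparameters gathered into a single config dict; the key names are our own, and the MLP layout (64 units per layer) is our reading of "two‑layer MLP (64 units)".

```python
# Hyperparameters from the Setup section, collected into one place.
# Key names are illustrative; mlp_hidden_units assumes 64 units per layer.
config = {
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "embedding_size": 32,
    "mlp_hidden_units": [64, 64],   # two-layer MLP
    "transformer_layers": 2,        # EA Transformer depth
    "transformer_heads": 2,
    "dropout": 0.5,
    "max_seq_len": 50,
}
```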
Results
Cold‑Transformer achieves the best cold‑start AUC against all baselines (DIN, DIEN, EdgeRec, DropoutNet, MWUF, MAML). It also improves performance for existing users, especially those with few interactions.
Ablation
We compare EA with other sequence‑modeling variants on MovieLens‑1M. Using Label‑Encoded fused behaviors consistently boosts performance; EA further enhances cold‑start results.
Conclusion
Cold‑Transformer effectively mitigates user cold‑start by adapting embeddings with fused behavior context. Extensive experiments validate its superiority. Future work will extend the approach to zero‑interaction cold‑start users.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.