
Multimodal Cold‑Start Techniques for Music Recommendation at NetEase Cloud Music

This article presents NetEase Cloud Music's multimodal cold‑start solution, detailing the problem background, feature selection using CLIP, two modeling approaches (I2I2U indirect and U2I DSSM direct), contrastive learning enhancements, interest‑boundary modeling, and evaluation results showing significant gains in user engagement.

DataFunSummit

Cold‑start modeling is crucial for recommendation systems on content platforms, especially for music where new items are scarce but have long lifecycles. NetEase Cloud Music faces the challenge of recommending cold, long‑tail songs without user interaction data.

The talk is organized into four parts: problem background, technical scheme (feature selection and model building), summary, and Q&A.

Problem background: why cold‑start matters for user experience and for the platform's content richness.

Technical scheme: multimodal feature extraction using CLIP (audio encoded by a Transformer, text by BERT) and two modeling approaches.

Summary: evaluation results and future directions.

Q&A session.

Technical scheme

The core problem is to find potential target users for cold‑start items. Two modeling pipelines are proposed:

I2I modeling: self‑supervised contrastive learning to enhance the cold‑start similarity algorithm.

U2I modeling: a multimodal DSSM with user interest‑boundary modeling.

Multimodal features (audio, text, tags) are encoded with a CLIP‑based framework, producing a unified song representation. A two‑stage training strategy is used: pre‑training the multimodal encoder, then feeding the features into downstream recall or ranking models.
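The pre‑training stage can be illustrated with the symmetric contrastive objective used in CLIP‑style alignment. The sketch below is a minimal numpy illustration, not NetEase's code: the real encoders are a Transformer for audio and BERT for text, while here the embeddings are taken as given and only the alignment loss is shown.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def clip_alignment_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss pulling paired audio/text embeddings
    together, CLIP-style. Row i of each matrix is the same song."""
    a = l2_normalize(audio_emb)
    t = l2_normalize(text_emb)
    logits = a @ t.T / temperature          # pairwise cosine similarities
    n = logits.shape[0]
    labels = np.arange(n)                   # diagonal entries are positives

    def xent(lg):
        # numerically stable cross-entropy toward the diagonal
        lg = lg - lg.max(axis=1, keepdims=True)
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average the audio->text and text->audio directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

Correctly paired batches should score a much lower loss than mismatched ones, which is what drives the two modalities into a shared song representation.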

For indirect modeling (I2I2U), the cold song is linked to similar existing songs (I2I) and then to users who have interacted with those similar songs (I2U), enabling distribution without direct user interaction.
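As a toy illustration of that I2I2U chain (all names and data structures here are hypothetical, not the production pipeline): find the cold song's nearest warm songs by embedding similarity, then take the union of users who interacted with them.

```python
import numpy as np

def i2i2u_candidates(cold_vec, song_vecs, song_ids, user_history, k=2):
    """I2I2U sketch: cold song -> k most similar warm songs (I2I)
    -> users who interacted with those songs (I2U).
    user_history maps song_id -> set of user ids."""
    sims = song_vecs @ cold_vec / (
        np.linalg.norm(song_vecs, axis=1) * np.linalg.norm(cold_vec))
    top = np.argsort(-sims)[:k]                 # indices of nearest songs
    neighbors = [song_ids[i] for i in top]
    users = set().union(*(user_history[s] for s in neighbors))
    return neighbors, users
```

The cold item thus inherits an audience from its content‑similar neighbors without any interaction data of its own.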

Supervised learning aligns song vectors with collaborative‑filtering similarity using a BPR loss. Contrastive learning (InfoNCE) is added to mitigate popularity bias, with random augmentations and a correlation‑based grouping mechanism generating positive and negative pairs.
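A minimal sketch of the BPR objective described above, assuming triplets have already been mined (positive = a CF‑similar song, negative = a dissimilar one); the function name and numpy form are illustrative:

```python
import numpy as np

def bpr_loss(anchor, positive, negative):
    """Bayesian Personalized Ranking: push the anchor song's similarity
    to a CF-similar song above its similarity to a dissimilar song.
    Inputs are batches of embedding vectors, one triplet per row."""
    pos_score = np.sum(anchor * positive, axis=1)
    neg_score = np.sum(anchor * negative, axis=1)
    x = pos_score - neg_score
    # -log sigmoid(x), written stably as log(1 + exp(-x))
    return np.mean(np.log1p(np.exp(-x)))
```

Minimizing this loss orders pairs rather than regressing absolute scores, which is why it suits aligning content embeddings with relative CF similarity.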

Online deployment builds a vector index for all songs; a new cold song is encoded, its nearest neighbors are retrieved, and users who interacted with those neighbors receive the recommendation.
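The serving flow can be approximated with a brute‑force index. This stand‑in is only illustrative: a production deployment would use an approximate‑nearest‑neighbor library rather than exhaustive search, and the class below is a hypothetical sketch.

```python
import numpy as np

class SongIndex:
    """Brute-force stand-in for the online vector index: stores
    L2-normalized song vectors and retrieves nearest neighbors by
    cosine similarity."""
    def __init__(self):
        self.ids, self.vecs = [], []

    def add(self, song_id, vec):
        v = np.asarray(vec, dtype=float)
        self.ids.append(song_id)
        self.vecs.append(v / np.linalg.norm(v))

    def search(self, query, k=5):
        q = np.asarray(query, dtype=float)
        q = q / np.linalg.norm(q)
        sims = np.stack(self.vecs) @ q
        order = np.argsort(-sims)[:k]
        return [(self.ids[i], float(sims[i])) for i in order]
```

A new cold song is encoded once, `search` returns its nearest warm songs, and the recommendation fans out to those songs' listeners.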

The U2I DSSM model consists of an ItemTower (using the multimodal encoder) and a UserTower, plus an additional interest‑boundary tower that separates positive and negative samples. During inference, the item score and the user‑boundary score are compared to decide whether to recommend a cold item to a user.
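The inference rule reduces to a per‑user threshold comparison. In this sketch the tower outputs are plain vectors and `boundary_vec` stands in for the interest‑boundary tower's output; the real towers are learned networks, so the function below only shows the decision logic:

```python
import numpy as np

def should_recommend(user_vec, item_vec, boundary_vec):
    """Interest-boundary decision: recommend a cold item only if the
    user-item score clears the user's own boundary score, rather than
    a single global threshold."""
    item_score = float(user_vec @ item_vec)
    boundary_score = float(user_vec @ boundary_vec)
    return item_score > boundary_score
```

Because the boundary is computed per user, the model can be permissive for users with broad tastes and strict for users with narrow ones.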

Evaluation shows substantial improvements: offline metrics and online gains such as +38% more potential target users, +1.95% higher collection rate, and +1.42% higher completion rate for cold items.

Future work includes multimodal fusion of content and behavior features and end‑to‑end optimization of recall and ranking.

Q&A

Q1: Core metrics for music cold‑start are collection rate and completion rate.

Q2: Multimodal features are obtained via pre‑training with CLIP, not end‑to‑end training.

Q3: Only one encoder is used; contrastive learning corrects the bias toward popular items.

Q4: The interest‑boundary tower models a user‑specific threshold separating positive from negative samples.

Q5: The user tower and the interest‑boundary tower share the same inputs but have separate parameters and loss calculations.

Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
