How Xiaohongshu Leverages Large Models to Revolutionize Content Recommendation
This article details Xiaohongshu's multi-stage recommendation pipeline, which combines massive multi-modal pre-training, long-sequence modeling, real-time context features, reinforcement learning, and online deep learning to precisely surface valuable content, address cold-start challenges, and break information bubbles for its hundreds of millions of users.
Xiaohongshu Recommendation System: Precise Selection from Massive Data
At the 2025 Global Machine Learning Technology Conference, Xiaohongshu's recommendation algorithm lead Yan Ling presented how large models are applied in the platform's recommendation system, offering insights for the industry.
With billions of notes, Xiaohongshu needs a powerful recommendation system to quickly surface valuable information. The system performs multi‑round filtering and ranking to ensure users see the most relevant content.
Multi‑Stage Filtering Process
From billions of notes, a multi‑path recall retrieves tens of thousands of candidates based on tags, user behavior, and other dimensions.
A simple model performs coarse ranking, reducing candidates to about 5,000.
A deep model conducts fine ranking, narrowing the set to roughly 500.
Finally, diversity re‑ranking orders the remaining 80+ notes according to personalized needs, balancing relevance and variety.
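The four-stage funnel above can be sketched as a chain of score-and-truncate passes. This is a minimal illustration only: the scoring lambdas are placeholders, not Xiaohongshu's actual recall paths or ranking models.

```python
def recall(notes, k=10_000):
    # Multi-path recall: merge candidates from several cheap retrievers
    # (here two trivial "paths" stand in for tag- and behavior-based recall).
    paths = [sorted(notes)[: k // 2], sorted(notes, reverse=True)[: k // 2]]
    return list({n for path in paths for n in path})[:k]

def rank(candidates, score_fn, k):
    # Generic ranking stage: score, sort descending, truncate.
    return sorted(candidates, key=score_fn, reverse=True)[:k]

notes = list(range(1_000_000))                 # stand-in for billions of notes
pool = recall(notes)                           # ~10,000 candidates
pool = rank(pool, lambda n: n % 97, 5_000)     # coarse ranking (simple model)
pool = rank(pool, lambda n: n % 991, 500)      # fine ranking (deep model)
feed = rank(pool, lambda n: -n, 80)            # diversity re-rank, ~80 notes
print(len(pool), len(feed))
```

Each stage trades accuracy for throughput: cheap models prune aggressively so that the expensive fine-ranking model only ever scores a few thousand candidates.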
The system also embeds an "algorithmic value" principle that keeps traffic distribution equitable: over 50% of impressions go to creators with fewer than 1,000 followers.
Additionally, a Customer Engagement Score (CES) mechanism analyzes how users engage with and explore interests, further refining recommendations.
Understanding Every Piece of Content: Multi‑Modal Representation
Xiaohongshu employs multi‑modal content understanding, pre‑training on over a billion images and videos using architectures such as BERT, RoBERTa, ResNet, Swin‑T, and ViT.
Intra-modal Fusion: features within each modality are fused first.
Inter-modal Fusion: later stages integrate the modalities into a single comprehensive representation.
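The two-stage fusion can be sketched as projection into a shared space followed by cross-modal mixing. All dimensions and weight matrices below are hypothetical stand-ins, not the production encoders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Per-modality pooled features (illustrative dimensions).
text_feats = rng.normal(size=(4, 768))    # e.g. a BERT-style text encoder
image_feats = rng.normal(size=(4, 1024))  # e.g. a ViT/Swin-T vision encoder

def project(x, dim=256):
    # Intra-modal fusion: map each modality into a shared space.
    w = rng.normal(size=(x.shape[1], dim)) / np.sqrt(x.shape[1])
    return np.tanh(x @ w)

# Inter-modal fusion: concatenate the aligned features and mix them.
fused = np.concatenate([project(text_feats), project(image_feats)], axis=1)
w_mix = rng.normal(size=(fused.shape[1], 256)) / np.sqrt(fused.shape[1])
note_embedding = np.tanh(fused @ w_mix)   # one embedding per note
print(note_embedding.shape)  # (4, 256)
```

In a real model the projections and mixer are learned end to end; the point here is only the two-step shape of the computation.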
Cold‑Start Challenge for New Notes
Content extraction and initialization: multi‑modal models generate tags, topics, and embeddings for new notes.
Seed audience selection: topic and tag based filtering quickly identifies a small group of potential viewers.
Look‑alike expansion: behavior similarity expands the reach to users with comparable interests.
Model handover and online learning: recall, coarse-ranking, and fine-ranking models iteratively rank the note while a Bayesian optimizer adjusts its weighting based on real-time feedback.
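The feedback-driven weighting in the last step can be sketched with a Thompson-sampling bandit, a common Bayesian approach to exactly this explore-exploit problem. The class and reward scheme are hypothetical, not Xiaohongshu's implementation:

```python
import random

class BetaBandit:
    """Thompson-sampling stand-in for the Bayesian weighting step: a new
    note's exposure weight is drawn from a Beta posterior that is updated
    with real-time positive/negative feedback."""

    def __init__(self):
        self.alpha, self.beta = 1.0, 1.0   # uniform prior over engagement rate

    def update(self, engaged: bool):
        # Each impression's outcome updates the posterior.
        if engaged:
            self.alpha += 1
        else:
            self.beta += 1

    def exposure_weight(self) -> float:
        # Sample from the posterior; uncertain notes still get explored.
        return random.betavariate(self.alpha, self.beta)

note = BetaBandit()
for fb in [True, True, False, True]:       # simulated early feedback
    note.update(fb)
print(note.alpha, note.beta)  # 4.0 2.0
```

Because the weight is sampled rather than taken as a point estimate, a note with little feedback is neither buried nor over-promoted, which is the core cold-start requirement.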
Interaction Goal Modeling & Optimization: Multi‑Objective Estimation + Reinforcement Learning
Long‑sequence modeling captures users' historical interests. Real‑time multi‑behavior sequence modeling updates user state within seconds, reinforcing or weakening exposure based on likes and dislikes.
Real‑time context features (e.g., interactions in the past five minutes) further refine the instant interest signal.
A multi‑objective CGC model shares expert networks via gated units, balancing objectives such as click‑through rate (CTR) and interaction duration.
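The CGC idea (shared experts plus task-specific experts, combined per task by a softmax gate) can be illustrated in a few lines. Dimensions, expert counts, and random weights below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
D, E = 32, 16   # input and expert-output dimensions (illustrative)

def make_expert():
    w = rng.normal(size=(D, E)) / np.sqrt(D)
    return lambda x: np.maximum(x @ w, 0.0)   # small ReLU expert

shared_experts = [make_expert(), make_expert()]
task_experts = {"ctr": [make_expert()], "duration": [make_expert()]}

def cgc_tower(x, task):
    # Customized Gate Control: each task gates over the shared experts
    # plus its own task-specific experts.
    experts = shared_experts + task_experts[task]
    outs = np.stack([e(x) for e in experts])      # (n_experts, E)
    gate_w = rng.normal(size=(D, len(experts)))   # learned in a real model
    logits = x @ gate_w
    gate = np.exp(logits - logits.max())
    gate /= gate.sum()                            # softmax gate
    return gate @ outs                            # (E,) fused representation

x = rng.normal(size=D)                            # user-item features
ctr_repr = cgc_tower(x, "ctr")
dur_repr = cgc_tower(x, "duration")
print(ctr_repr.shape, dur_repr.shape)  # (16,) (16,)
```

Sharing some experts lets correlated objectives (CTR, duration) transfer signal, while the per-task experts and gates keep conflicting objectives from fighting over one set of weights.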
The system supports Online Deep Learning (ODL), handling billions of embedding parameters and updating them within minutes.
Reinforcement Learning (RL) agents (DQN, DDPG, PPO) receive real‑time state features and adjust exposure: rewarding content that triggers the desired interaction and penalizing ineffective exposure.
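The reward/penalty loop can be illustrated with tabular Q-learning. This is a deliberate simplification: production agents like DQN, DDPG, or PPO replace the table with neural networks, and the states, actions, and rewards here are invented for the sketch:

```python
import random

ACTIONS = ["boost", "hold", "suppress"]   # exposure decisions
q = {}                                     # (state, action) -> value
alpha, gamma = 0.1, 0.9                    # learning rate, discount

def choose(state, eps=0.1):
    # Epsilon-greedy policy over the exposure actions.
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))

def update(state, action, reward, next_state):
    # Standard Q-learning update: reward desired interactions,
    # give zero (or negative) reward to ineffective exposure.
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

update("interested_user", "boost", reward=1.0, next_state="engaged")
print(round(q[("interested_user", "boost")], 2))  # 0.1
```

The key design point survives the simplification: the agent is trained on the interaction it actually wants to cause, not on a proxy score computed offline.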
Graph models capture the "content‑user‑author" relationships, enabling clustering into "circles" that guide social interaction‑driven recommendation.
Diversity is ensured through recall vector perturbation, genetic algorithms, and a User‑to‑User (U2U) mechanism for interest exploration.
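Recall vector perturbation, the first of these mechanisms, can be sketched directly: jitter the user's query vector before nearest-neighbor retrieval so that successive recall passes explore slightly different neighborhoods. The toy index and noise scale below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy note index; production uses ANN search over learned embeddings.
note_vecs = rng.normal(size=(1000, 64))
note_vecs /= np.linalg.norm(note_vecs, axis=1, keepdims=True)
user_vec = rng.normal(size=64)
user_vec /= np.linalg.norm(user_vec)

def recall_topk(query, k=10):
    # Dot-product retrieval over the note index.
    return set(np.argsort(note_vecs @ query)[::-1][:k])

# Recall-vector perturbation: jitter the user vector so each recall
# pass lands in a slightly different neighborhood of interest space.
baseline = recall_topk(user_vec)
perturbed = recall_topk(user_vec + 0.3 * rng.normal(size=64))
print(len(baseline), len(perturbed))
```

Because the perturbation is small, most of the retrieved notes stay on-interest while a few fresh candidates enter the pool, which is exactly the exploration behavior the re-ranker needs.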
Large Model × Community Recommendation: Breaking the Information Bubble
Traditional recommenders rely solely on behavior data, limiting content understanding and reasoning. Large Language Models (LLMs) provide stronger text comprehension and open‑world generalization, enhancing reasoning about user intent.
Multimodal Representation Capability
The SigLIP model provides a powerful vision encoder; a fusion module performs intra-modal and inter-modal fusion to create rich content embeddings.
Full‑Link Application: Optimizing Recall & Ranking
The large‑model technology is applied throughout the pipeline, from recall to final ranking, achieving end‑to‑end improvement.
User Interest Inference: From Known to Unknown
The in‑house tomato‑7B model, combined with SimCSE similarity scoring, maps user behavior to a label space, identifying existing interests and inferring latent ones.
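The mapping step can be sketched as cosine similarity between a behavior embedding and label embeddings. The vectors below are random stand-ins; in production they would come from the tomato-7B / SimCSE-style encoders:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical label space and embeddings (random stand-ins).
labels = ["reading", "comics", "makeup", "fitness"]
label_vecs = rng.normal(size=(len(labels), 128))

# A behavior embedding that happens to sit close to "comics".
behavior_vec = label_vecs[1] + 0.1 * rng.normal(size=128)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = {lab: cosine(behavior_vec, v) for lab, v in zip(labels, label_vecs)}
best = max(scores, key=scores.get)
print(best)  # comics
```

Scoring against a fixed label space is what lets the system name both existing interests (high-similarity labels backed by behavior) and latent ones (labels the LLM infers despite sparse behavior).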
Multi‑Dimensional User Analysis
Basic Information: age, gender, city, date, etc.
Existing Interests: long-term, medium-term, and short-term interests across topics and categories.
For example, a 25‑year‑old user interested in reading may be inferred to like new book recommendations; recent interest in comics plus a long‑term preference for funny videos may lead to a suggestion of funny comics.
Prompt Design: Guiding Model Reasoning
Custom prompts act as task instructions, telling the model how to use basic user info and existing interests to generate potential interest points.
"You are a data analyst proficient in interest aggregation. I will provide a user's basic information and existing interests; you need to return potential interests that meet the above requirements."
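Assembling such a prompt is mechanical; a minimal sketch follows. The function name, profile fields, and exact wording are illustrative, paraphrasing the instruction quoted above:

```python
def build_prompt(profile: dict, interests: dict) -> str:
    # Compose the interest-inference instruction with the user's
    # basic information and existing interests (hypothetical format).
    return (
        "You are a data analyst proficient in interest aggregation.\n"
        f"Basic information: {profile}\n"
        f"Existing interests: {interests}\n"
        "Return potential interests, each with a confidence score."
    )

prompt = build_prompt(
    {"age": 25, "gender": "F", "city": "Shanghai"},
    {"long_term": ["funny videos"], "short_term": ["comics"]},
)
print(prompt)
```

Keeping the instruction, profile, and interests in clearly labeled sections makes the model's task explicit and the output easier to parse downstream.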
Inference Process & Confidence
New Interest Points: e.g., funny comics, makeup tips.
Reasoning Process: combines the user's age, curiosity, and existing interests to infer new topics.
Confidence Level: the system outputs a confidence score for each inferred interest to aid precise filtering.
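The precise-filtering step reduces to thresholding on that score. The records and threshold below are illustrative, not production values:

```python
# Inferred interests as the model might return them (illustrative data).
inferred = [
    {"interest": "funny comics", "confidence": 0.82},
    {"interest": "makeup tips", "confidence": 0.67},
    {"interest": "opera", "confidence": 0.21},
]

THRESHOLD = 0.5  # hypothetical cutoff; tuned against online metrics in practice
accepted = [r["interest"] for r in inferred if r["confidence"] >= THRESHOLD]
print(accepted)  # ['funny comics', 'makeup tips']
```

Low-confidence inferences need not be discarded outright; they can instead be routed to small exploration traffic, which is how inferred interests get validated without degrading the main feed.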