How Xiaohongshu Leverages Large Models to Revolutionize Content Recommendation
This article details Xiaohongshu's multi-stage recommendation pipeline, which combines massive multi-modal pre-training, long-sequence modeling, real-time context features, reinforcement learning, and online deep learning to precisely surface valuable content, address cold-start challenges, and break information bubbles for its hundreds of millions of users.
Xiaohongshu Recommendation System: Precise Selection from Massive Data
At the 2025 Global Machine Learning Technology Conference, Xiaohongshu's recommendation algorithm lead Yan Ling presented how large models are applied in the platform's recommendation system, offering insights for the industry.
With billions of notes, Xiaohongshu needs a powerful recommendation system to quickly surface valuable information. The system performs multi‑round filtering and ranking to ensure users see the most relevant content.
Multi‑Stage Filtering Process
From billions of notes, a multi‑path recall retrieves tens of thousands of candidates based on tags, user behavior, and other dimensions.
A simple model performs coarse ranking, reducing candidates to about 5,000.
A deep model conducts fine ranking, narrowing the set to roughly 500.
Finally, diversity re‑ranking orders the remaining 80+ notes according to personalized needs, balancing relevance and variety.
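The four-stage funnel above can be sketched as a chain of score-and-truncate passes. This is a minimal illustration only: the scoring lambdas are placeholders, not Xiaohongshu's actual recall paths or ranking models.

```python
def recall(notes, k=10_000):
    # Multi-path recall: merge candidates from several cheap retrievers
    # (here two trivial "paths" stand in for tag- and behavior-based recall).
    paths = [sorted(notes)[: k // 2], sorted(notes, reverse=True)[: k // 2]]
    return list({n for path in paths for n in path})[:k]

def rank(candidates, score_fn, k):
    # Generic ranking stage: score, sort descending, truncate.
    return sorted(candidates, key=score_fn, reverse=True)[:k]

notes = list(range(1_000_000))                 # stand-in for billions of notes
pool = recall(notes)                           # ~10,000 candidates
pool = rank(pool, lambda n: n % 97, 5_000)     # coarse ranking (simple model)
pool = rank(pool, lambda n: n % 991, 500)      # fine ranking (deep model)
feed = rank(pool, lambda n: -n, 80)            # diversity re-rank, ~80 notes
print(len(pool), len(feed))
```

Each stage trades accuracy for throughput: cheap models prune aggressively so that the expensive fine-ranking model only ever scores a few thousand candidates.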
The system also embeds an "algorithmic value" principle that keeps traffic distribution equitable: over 50% of impressions go to creators with fewer than 1,000 followers.
Additionally, a Customer Engagement Score (CES) mechanism analyzes how users engage with and explore interests, further refining recommendations.
Understanding Every Piece of Content: Multi‑Modal Representation
Xiaohongshu employs multi‑modal content understanding, pre‑training on over a billion images and videos using architectures such as BERT, RoBERTa, ResNet, Swin‑T, and ViT.
Intra-modal Fusion: features within each modality are fused first.
Inter-modal Fusion: later stages integrate the modalities into a single comprehensive representation.
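The two-stage fusion can be sketched as projection into a shared space followed by cross-modal mixing. All dimensions and weight matrices below are hypothetical stand-ins, not the production encoders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Per-modality pooled features (illustrative dimensions).
text_feats = rng.normal(size=(4, 768))    # e.g. a BERT-style text encoder
image_feats = rng.normal(size=(4, 1024))  # e.g. a ViT/Swin-T vision encoder

def project(x, dim=256):
    # Intra-modal fusion: map each modality into a shared space.
    w = rng.normal(size=(x.shape[1], dim)) / np.sqrt(x.shape[1])
    return np.tanh(x @ w)

# Inter-modal fusion: concatenate the aligned features and mix them.
fused = np.concatenate([project(text_feats), project(image_feats)], axis=1)
w_mix = rng.normal(size=(fused.shape[1], 256)) / np.sqrt(fused.shape[1])
note_embedding = np.tanh(fused @ w_mix)   # one embedding per note
print(note_embedding.shape)  # (4, 256)
```

In a real model the projections and mixer are learned end to end; the point here is only the two-step shape of the computation.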
Cold‑Start Challenge for New Notes
Content extraction and initialization: multi‑modal models generate tags, topics, and embeddings for new notes.
Seed audience selection: topic and tag based filtering quickly identifies a small group of potential viewers.
Look‑alike expansion: behavior similarity expands the reach to users with comparable interests.
Model handover and online learning: recall, coarse-ranking, and fine-ranking models iteratively rank the note while a Bayesian optimizer adjusts its weighting based on real-time feedback.
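The feedback-driven weighting in the last step can be sketched with a Thompson-sampling bandit, a common Bayesian approach to exactly this explore-exploit problem. The class and reward scheme are hypothetical, not Xiaohongshu's implementation:

```python
import random

class BetaBandit:
    """Thompson-sampling stand-in for the Bayesian weighting step: a new
    note's exposure weight is drawn from a Beta posterior that is updated
    with real-time positive/negative feedback."""

    def __init__(self):
        self.alpha, self.beta = 1.0, 1.0   # uniform prior over engagement rate

    def update(self, engaged: bool):
        # Each impression's outcome updates the posterior.
        if engaged:
            self.alpha += 1
        else:
            self.beta += 1

    def exposure_weight(self) -> float:
        # Sample from the posterior; uncertain notes still get explored.
        return random.betavariate(self.alpha, self.beta)

note = BetaBandit()
for fb in [True, True, False, True]:       # simulated early feedback
    note.update(fb)
print(note.alpha, note.beta)  # 4.0 2.0
```

Because the weight is sampled rather than taken as a point estimate, a note with little feedback is neither buried nor over-promoted, which is the core cold-start requirement.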
Interaction Goal Modeling & Optimization: Multi‑Objective Estimation + Reinforcement Learning
Long‑sequence modeling captures users' historical interests. Real‑time multi‑behavior sequence modeling updates user state within seconds, reinforcing or weakening exposure based on likes and dislikes.
Real‑time context features (e.g., interactions in the past five minutes) further refine the instant interest signal.
A multi‑objective CGC model shares expert networks via gated units, balancing objectives such as click‑through rate (CTR) and interaction duration.
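The CGC idea (shared experts plus task-specific experts, combined per task by a softmax gate) can be illustrated in a few lines. Dimensions, expert counts, and random weights below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
D, E = 32, 16   # input and expert-output dimensions (illustrative)

def make_expert():
    w = rng.normal(size=(D, E)) / np.sqrt(D)
    return lambda x: np.maximum(x @ w, 0.0)   # small ReLU expert

shared_experts = [make_expert(), make_expert()]
task_experts = {"ctr": [make_expert()], "duration": [make_expert()]}

def cgc_tower(x, task):
    # Customized Gate Control: each task gates over the shared experts
    # plus its own task-specific experts.
    experts = shared_experts + task_experts[task]
    outs = np.stack([e(x) for e in experts])      # (n_experts, E)
    gate_w = rng.normal(size=(D, len(experts)))   # learned in a real model
    logits = x @ gate_w
    gate = np.exp(logits - logits.max())
    gate /= gate.sum()                            # softmax gate
    return gate @ outs                            # (E,) fused representation

x = rng.normal(size=D)                            # user-item features
ctr_repr = cgc_tower(x, "ctr")
dur_repr = cgc_tower(x, "duration")
print(ctr_repr.shape, dur_repr.shape)  # (16,) (16,)
```

Sharing some experts lets correlated objectives (CTR, duration) transfer signal, while the per-task experts and gates keep conflicting objectives from fighting over one set of weights.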
The system supports Online Deep Learning (ODL), handling billions of embedding parameters and updating them within minutes.
Reinforcement Learning (RL) agents (DQN, DDPG, PPO) receive real‑time state features and adjust exposure: rewarding content that triggers the desired interaction and penalizing ineffective exposure.
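The reward/penalty loop can be illustrated with tabular Q-learning. This is a deliberate simplification: production agents like DQN, DDPG, or PPO replace the table with neural networks, and the states, actions, and rewards here are invented for the sketch:

```python
import random

ACTIONS = ["boost", "hold", "suppress"]   # exposure decisions
q = {}                                     # (state, action) -> value
alpha, gamma = 0.1, 0.9                    # learning rate, discount

def choose(state, eps=0.1):
    # Epsilon-greedy policy over the exposure actions.
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))

def update(state, action, reward, next_state):
    # Standard Q-learning update: reward desired interactions,
    # give zero (or negative) reward to ineffective exposure.
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

update("interested_user", "boost", reward=1.0, next_state="engaged")
print(round(q[("interested_user", "boost")], 2))  # 0.1
```

The key design point survives the simplification: the agent is trained on the interaction it actually wants to cause, not on a proxy score computed offline.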
Graph models capture the "content‑user‑author" relationships, enabling clustering into "circles" that guide social interaction‑driven recommendation.
Diversity is ensured through recall vector perturbation, genetic algorithms, and a User‑to‑User (U2U) mechanism for interest exploration.
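Recall vector perturbation, the first of these mechanisms, can be sketched directly: jitter the user's query vector before nearest-neighbor retrieval so that successive recall passes explore slightly different neighborhoods. The toy index and noise scale below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy note index; production uses ANN search over learned embeddings.
note_vecs = rng.normal(size=(1000, 64))
note_vecs /= np.linalg.norm(note_vecs, axis=1, keepdims=True)
user_vec = rng.normal(size=64)
user_vec /= np.linalg.norm(user_vec)

def recall_topk(query, k=10):
    # Dot-product retrieval over the note index.
    return set(np.argsort(note_vecs @ query)[::-1][:k])

# Recall-vector perturbation: jitter the user vector so each recall
# pass lands in a slightly different neighborhood of interest space.
baseline = recall_topk(user_vec)
perturbed = recall_topk(user_vec + 0.3 * rng.normal(size=64))
print(len(baseline), len(perturbed))
```

Because the perturbation is small, most of the retrieved notes stay on-interest while a few fresh candidates enter the pool, which is exactly the exploration behavior the re-ranker needs.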
Large Model × Community Recommendation: Breaking the Information Bubble
Traditional recommenders rely solely on behavior data, limiting content understanding and reasoning. Large Language Models (LLMs) provide stronger text comprehension and open‑world generalization, enhancing reasoning about user intent.
Multimodal Representation Capability
The SigLIP model provides a powerful vision encoder; a fusion module performs intra-modal and inter-modal fusion to create rich content embeddings.
Full‑Link Application: Optimizing Recall & Ranking
The large‑model technology is applied throughout the pipeline, from recall to final ranking, achieving end‑to‑end improvement.
User Interest Inference: From Known to Unknown
The in‑house tomato‑7B model, combined with SimCSE similarity scoring, maps user behavior to a label space, identifying existing interests and inferring latent ones.
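The mapping step can be sketched as cosine similarity between a behavior embedding and label embeddings. The vectors below are random stand-ins; in production they would come from the tomato-7B / SimCSE-style encoders:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical label space and embeddings (random stand-ins).
labels = ["reading", "comics", "makeup", "fitness"]
label_vecs = rng.normal(size=(len(labels), 128))

# A behavior embedding that happens to sit close to "comics".
behavior_vec = label_vecs[1] + 0.1 * rng.normal(size=128)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = {lab: cosine(behavior_vec, v) for lab, v in zip(labels, label_vecs)}
best = max(scores, key=scores.get)
print(best)  # comics
```

Scoring against a fixed label space is what lets the system name both existing interests (high-similarity labels backed by behavior) and latent ones (labels the LLM infers despite sparse behavior).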
Multi‑Dimensional User Analysis
Basic Information: age, gender, city, date, etc.
Existing Interests: long-term, medium-term, and short-term interests across topics and categories.
For example, a 25‑year‑old user interested in reading may be inferred to like new book recommendations; recent interest in comics plus a long‑term preference for funny videos may lead to a suggestion of funny comics.
Prompt Design: Guiding Model Reasoning
Custom prompts act as task instructions, telling the model how to use basic user info and existing interests to generate potential interest points.
"You are a data analyst proficient in interest aggregation. I will provide a user's basic information and existing interests; you need to return potential interests that meet the above requirements."
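Assembling such a prompt is mechanical; a minimal sketch follows. The function name, profile fields, and exact wording are illustrative, paraphrasing the instruction quoted above:

```python
def build_prompt(profile: dict, interests: dict) -> str:
    # Compose the interest-inference instruction with the user's
    # basic information and existing interests (hypothetical format).
    return (
        "You are a data analyst proficient in interest aggregation.\n"
        f"Basic information: {profile}\n"
        f"Existing interests: {interests}\n"
        "Return potential interests, each with a confidence score."
    )

prompt = build_prompt(
    {"age": 25, "gender": "F", "city": "Shanghai"},
    {"long_term": ["funny videos"], "short_term": ["comics"]},
)
print(prompt)
```

Keeping the instruction, profile, and interests in clearly labeled sections makes the model's task explicit and the output easier to parse downstream.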
Inference Process & Confidence
New Interest Points: e.g., funny comics, makeup tips.
Reasoning Process: combines the user's age, curiosity, and existing interests to infer new topics.
Confidence Level: the system outputs a confidence score for each inferred interest to aid precise filtering.
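The precise-filtering step reduces to thresholding on that score. The records and threshold below are illustrative, not production values:

```python
# Inferred interests as the model might return them (illustrative data).
inferred = [
    {"interest": "funny comics", "confidence": 0.82},
    {"interest": "makeup tips", "confidence": 0.67},
    {"interest": "opera", "confidence": 0.21},
]

THRESHOLD = 0.5  # hypothetical cutoff; tuned against online metrics in practice
accepted = [r["interest"] for r in inferred if r["confidence"] >= THRESHOLD]
print(accepted)  # ['funny comics', 'makeup tips']
```

Low-confidence inferences need not be discarded outright; they can instead be routed to small exploration traffic, which is how inferred interests get validated without degrading the main feed.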