Decoding Xiaohongshu’s Decentralized Recommendation: Sideinfo and Multimodal Fusion

This article analyzes how Xiaohongshu addresses the decentralization challenge in its recommendation system by strengthening side‑information usage, integrating multimodal signals across the full pipeline, and implementing interest exploration and protection mechanisms, while also outlining future research directions such as generative recommendation and large‑model‑driven user profiling.

NewBeeNLP

Aug 15, 2024

Business Background

Xiaohongshu is a large user-generated-content (UGC) community that serves billions of exposures per day. Under its centralized distribution, traffic concentrates on content that is already highly exposed, which limits exposure for ordinary creators and long-tail interests.

Core Problem

Decentralized distribution must make the system learn faster (quickly capture long‑tail content and user interests) and learn better (ensure each pipeline stage effectively propagates these signals).

Strengthening Sideinfo Usage

In recall, sideinfo was previously merged weakly with note IDs, causing signal dilution. The solution separates sideinfo modeling, applying independent sequence modeling and residual connections to enhance its impact, and extends this approach to ranking.
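One possible reading of "separate modeling plus a residual connection" is sketched below. It is a minimal illustration, not the production model: mean pooling stands in for whatever sequence encoder is actually used, and the names `mean_pool`, `user_vector`, and `side_weight` are hypothetical.

```python
from typing import List

Vector = List[float]


def mean_pool(seq: List[Vector]) -> Vector:
    """Average a sequence of equal-length vectors (stand-in for a real sequence encoder)."""
    n = len(seq)
    return [sum(col) / n for col in zip(*seq)]


def user_vector(id_seq: List[Vector], side_seq: List[Vector],
                side_weight: float = 1.0) -> Vector:
    """Encode note-ID and sideinfo sequences separately, then add sideinfo as a residual.

    Assumption: both sequences are already embedded into the same dimension.
    """
    id_repr = mean_pool(id_seq)      # behavior signal from note IDs
    side_repr = mean_pool(side_seq)  # independently modeled sideinfo signal
    # Residual path: sideinfo is added on top instead of being mixed into the ID input.
    return [a + side_weight * b for a, b in zip(id_repr, side_repr)]


vec = user_vector([[1.0, 0.0], [3.0, 0.0]], [[0.0, 2.0], [0.0, 4.0]])
```

Because the sideinfo branch has its own path to the output, its contribution does not depend on how strongly the ID embeddings dominate the shared input.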

Graph models further fuse sideinfo by constructing both collaborative‑filtering (CF) edges between notes and edges between notes and sideinfo. To mitigate exposure bias, the CF edge algorithm is upgraded to the swing method, and causal inference techniques adjust edge weights to reduce popularity bias and super‑node effects. Swing computation is moved from batch to Flink streaming for real‑time updates.
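For readers unfamiliar with Swing, the sketch below computes the basic item-to-item score: two notes are similar when many user pairs clicked both, and each pair counts for less when those two users overlap heavily elsewhere. This is a small batch version over unordered user pairs. The causal reweighting and the Flink streaming job described above are not reproduced, and `alpha` and the sample click data are assumptions.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple


def swing_similarity(user_items: Dict[str, Set[str]],
                     alpha: float = 1.0) -> Dict[Tuple[str, str], float]:
    """Basic Swing score for every item pair that shares at least two users."""
    # Invert the click log: item -> set of users who interacted with it.
    item_users: Dict[str, Set[str]] = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)

    sim: Dict[Tuple[str, str], float] = defaultdict(float)
    for i, j in combinations(sorted(item_users), 2):
        common_users = sorted(item_users[i] & item_users[j])
        for u, v in combinations(common_users, 2):
            # User pairs with a large shared history contribute less.
            overlap = len(user_items[u] & user_items[v])
            sim[(i, j)] += 1.0 / (alpha + overlap)
    return dict(sim)


# Hypothetical click data for illustration only.
clicks = {
    "u1": {"a", "b"},
    "u2": {"a", "b"},
    "u3": {"a", "b", "c"},
    "u4": {"c"},
}
scores = swing_similarity(clicks)
```

In this toy data only the pair ("a", "b") receives a score, because no other pair of notes is shared by two or more users.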

Multimodal Signal Full‑Chain Fusion

Beyond sideinfo, multimodal embeddings (text + image) are aligned with item embeddings through intra-domain and cross-domain alignment. This forms the AlignRec framework, which was accepted to the CIKM 2024 full-paper track.

AlignRec stages:

MMEnc aligns and fuses text and image embeddings to obtain content‑side item representations.

Behavioral user and item representations are merged using a user‑item interaction graph.

Content and behavior item representations are fused; a similarity graph aggregates content representations.

User representations are aggregated from the behavior graph and aligned with content representations as auxiliary loss.

Final cross‑domain user and item embeddings are obtained by summing behavior and content representations and applying alignment.
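The stages above rely on alignment losses between behavior-side and content-side representations. The summary does not state which loss is used, so the sketch below shows one common choice, an in-batch InfoNCE contrastive loss, purely as an assumption. The function names and `temperature` value are illustrative, and vectors are assumed to be nonzero.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(behavior: List[Vector], content: List[Vector],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss; row i of `behavior` should match row i of `content`."""
    losses = []
    for i, b in enumerate(behavior):
        logits = [cosine(b, c) / temperature for c in content]
        # Numerically stable log-sum-exp over all in-batch candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
misaligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is small when each behavior embedding is closest to its own content embedding and large when the pairing is scrambled, which is the property an auxiliary alignment term needs.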

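Multimodal features are also incorporated in recall (an upgrade of the Path-based Deep Network, PDN, with attention and multimodal cross-features) and in ranking (hard search in initial ranking; soft search with a General Search Unit, GSU, followed by an Exact Search Unit, ESU, in final ranking).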
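The two-step soft search can be sketched as follows: a GSU-style step picks the behaviors most similar to the candidate note by inner product, and an ESU-style step attention-pools only those behaviors. This is a simplified illustration under the assumption that history and candidate are represented by multimodal embeddings of the same dimension; the function names and toy vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def gsu_soft_search(target: Vector, history: List[Vector], k: int) -> List[int]:
    """Return indices of the k history items with the highest inner product to target."""
    ranked = sorted(range(len(history)),
                    key=lambda i: dot(target, history[i]), reverse=True)
    return ranked[:k]


def esu_attention(target: Vector, selected: List[Vector]) -> Vector:
    """Softmax-attention pooling of the selected behaviors against the target."""
    scores = [dot(target, h) for h in selected]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [sum(w / z * h[d] for w, h in zip(weights, selected))
            for d in range(len(target))]


# Toy embeddings for illustration only.
history = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
candidate = [1.0, 0.0]
top = gsu_soft_search(candidate, history, k=2)
interest = esu_attention(candidate, [history[i] for i in top])
```

Keeping the expensive attention step on a short retrieved subset is what makes long behavior sequences affordable in final ranking.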

Interest Exploration and Link Protection

To avoid over‑fitting to historical interests, explicit multi‑interest modeling builds a global interest pool, selects top‑k interests via retrieval, predicts the next interest, and uses a single aggregated vector for online serving.

Exploration‑Exploitation (EE) adds Gaussian noise to recall vectors and employs an Evolution Strategy to re‑inject high‑rank but unexposed items into subsequent recall cycles, improving long‑tail interest coverage.
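The noise-based part of this exploration can be illustrated with a short sketch: the user vector is perturbed with Gaussian noise several times, and the union of the retrieved items is kept. The Evolution Strategy re-injection step is not modeled here. The item embeddings, `sigma`, and the number of draws are made-up values for illustration.

```python
import random
from typing import Dict, List

Vector = List[float]


def perturb(vec: Vector, sigma: float, rng: random.Random) -> Vector:
    """Add independent Gaussian noise N(0, sigma^2) to each dimension."""
    return [x + rng.gauss(0.0, sigma) for x in vec]


def top_k(query: Vector, items: Dict[str, Vector], k: int) -> List[str]:
    """Brute-force inner-product retrieval (stand-in for an ANN index)."""
    ranked = sorted(items,
                    key=lambda name: sum(q * x for q, x in zip(query, items[name])),
                    reverse=True)
    return ranked[:k]


def explore_recall(user_vec: Vector, items: Dict[str, Vector], k: int,
                   sigma: float, draws: int, rng: random.Random) -> List[str]:
    """Union of top-k results over several noisy copies of the user vector."""
    seen: List[str] = []
    for _ in range(draws):
        for name in top_k(perturb(user_vec, sigma, rng), items, k):
            if name not in seen:
                seen.append(name)
    return seen


# Hypothetical item embeddings.
notes = {"popular": [1.0, 0.0], "niche": [0.0, 1.0], "mixed": [0.7, 0.7]}
rng = random.Random(0)
explored = explore_recall([1.0, 0.0], notes, k=1, sigma=0.8, draws=10, rng=rng)
```

With `sigma` set to zero the function degenerates to ordinary top-k recall, so the noise scale directly controls how far retrieval is allowed to drift from the user's historical interests.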

Link protection introduces a white‑box multi‑queue framework for intermediate stages, allowing customized ranking and automatic cutoff optimization via Evolution Strategy.
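As a rough illustration of cutoff tuning with an Evolution Strategy, the sketch below runs a simple (1+λ) strategy over a vector of per-queue cutoffs. The summary does not say which ES variant is used, so this is an assumption. The reward function is also hypothetical: in practice it would be an online or replayed business metric, whereas here it is a toy quadratic.

```python
import random
from typing import Callable, List


def evolve_cutoffs(reward: Callable[[List[float]], float], init: List[float],
                   sigma: float = 5.0, children: int = 10,
                   generations: int = 50, seed: int = 0) -> List[float]:
    """(1+lambda) Evolution Strategy: mutate the parent, keep the best child if it improves."""
    rng = random.Random(seed)
    parent, parent_reward = list(init), reward(init)
    for _ in range(generations):
        # Mutate every cutoff with Gaussian noise; cutoffs cannot go below zero.
        offspring = [[max(0.0, c + rng.gauss(0.0, sigma)) for c in parent]
                     for _ in range(children)]
        best_child = max(offspring, key=reward)
        child_reward = reward(best_child)
        if child_reward > parent_reward:  # elitism: never accept a worse solution
            parent, parent_reward = best_child, child_reward
    return parent


# Toy reward (assumption): pretend the ideal cutoffs for two queues are 50 and 20.
ideal = [50.0, 20.0]


def toy_reward(cutoffs: List[float]) -> float:
    return -sum((c - t) ** 2 for c, t in zip(cutoffs, ideal))


tuned = evolve_cutoffs(toy_reward, [10.0, 10.0])
```

Because the strategy only needs reward values, it works with black-box metrics that have no usable gradient, which fits cutoff tuning across heterogeneous queues.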

Large-model-driven latent-interest mining uses a custom prompt to generate user profiles, runs offline inference with the tomato-7B model, and maps the results via SimCSE to boost relevant notes in the post-ranking stage.
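The mapping step can be pictured as a similarity match between sentence embeddings of the generated interests and embeddings of candidate notes. The sketch below assumes those embeddings have already been computed offline (for example by a SimCSE-style encoder) and uses placeholder vectors. The multiplicative boost, the `threshold`, and the function names are hypothetical, not the production logic.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def boost_scores(rank_scores: Dict[str, float], note_emb: Dict[str, Vector],
                 interest_emb: List[Vector], threshold: float = 0.8,
                 boost: float = 1.2) -> Dict[str, float]:
    """Multiply the score of notes whose embedding is close to any mined interest."""
    boosted = {}
    for note_id, score in rank_scores.items():
        best = max(cosine(note_emb[note_id], e) for e in interest_emb)
        boosted[note_id] = score * boost if best >= threshold else score
    return boosted


# Placeholder embeddings standing in for precomputed sentence vectors.
scores = {"n1": 0.5, "n2": 0.5}
notes = {"n1": [1.0, 0.0], "n2": [0.0, 1.0]}
interests = [[0.9, 0.1]]
reranked = boost_scores(scores, notes, interests)
```

Here only "n1" is close enough to the mined interest to be boosted, so it moves ahead of "n2" despite the tied input scores.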

Future Outlook

Generative recommendation for higher diversity and generalization.

Interactive search using multi‑turn dialogue powered by large models.

Multimodal user profiling combining personal data and behavior.

End‑to‑end multimodal‑behavior joint training.

Global signal joint modeling via full‑domain pretrained models or graph networks.
