From Daily to Minute-Level Updates: Real-Time Recommendation System Enhancements at Xiaohongshu
Xiaohongshu moved its recommendation pipeline from daily to minute‑level updates by redesigning the recall, ranking, and feature‑joining components: the team deployed a base‑plus‑incremental training scheme, migrated batch jobs from Spark to Flink, rewrote services in C++, and tuned RocksDB. The upgrade yielded over 10% longer dwell time, more than 15% more interactions, and roughly 50% higher new‑note distribution efficiency.
As a UGC platform, Xiaohongshu depends heavily on its recommendation system for feed effectiveness. In early 2021, the core modules (recall, coarse ranking, fine ranking) were updated only daily, limiting the system's ability to quickly capture user‑note interactions.
To improve timeliness, the team iteratively upgraded the pipeline: recall and indexing, as well as ranking models, were transitioned from day‑level to hour‑level and finally to minute‑level updates. This resulted in a dramatic boost in distribution efficiency and noticeable business gains.
Key challenges addressed:
Behavior attribution: traditional pipelines waited ~30 minutes after an item was shown before collecting interaction labels, hindering minute‑level model updates.
Feature joining: required fast joins of front‑end and back‑end logs with low latency.
Model training stability: real‑time training introduced volatility and robustness issues.
Model deployment: synchronizing hundreds of nodes for low‑latency inference while maintaining stability.
During the first phase (hour‑level), the team introduced a PS‑worker separated deployment where workers update dense parameters and the parameter server (PS) updates sparse embeddings. This allowed hourly releases for sparse parts while keeping dense updates daily.
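The split described above can be sketched in a few lines. This is a toy illustration with hypothetical class names, not Xiaohongshu's production code: sparse embeddings live on the parameter server (PS) and can be released on the fast hourly path, while dense weights stay with the workers on the daily cadence.

```python
# Sketch: PS-worker separation. Sparse embeddings sit on the PS and are
# updated incrementally; dense weights are owned by the worker and only
# change on the daily full release.

class ParameterServer:
    """Holds sparse embedding tables, keyed by feature id."""
    def __init__(self, dim):
        self.dim = dim
        self.tables = {}                      # feature id -> embedding vector

    def push(self, feature_id, embedding):
        self.tables[feature_id] = embedding

    def pull(self, feature_id):
        # Unseen ids fall back to a zero vector.
        return self.tables.get(feature_id, [0.0] * self.dim)


class Worker:
    """Owns the dense network; only sparse deltas go to the PS."""
    def __init__(self, ps):
        self.ps = ps
        self.dense_weights = {"fc1": [0.1, 0.2]}   # touched only by the daily job

    def incremental_step(self, feature_id, grad, lr=0.01):
        emb = self.ps.pull(feature_id)
        self.ps.push(feature_id, [w - lr * g for w, g in zip(emb, grad)])


ps = ParameterServer(dim=2)
worker = Worker(ps)
worker.incremental_step(feature_id=42, grad=[1.0, -1.0])
print(ps.pull(42))   # sparse side updated; dense_weights untouched
```

The point of the separation is cadence, not math: because the PS owns only the sparse tables, pushing a fresh hourly snapshot never requires redeploying the dense network.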
Model training strategy: The system adopted a base + incremental approach. The base model is trained daily on the full dataset, while an incremental model runs hourly on real‑time data, updating only sparse embeddings and keeping the dense network fixed. Batch Normalization layers were also frozen during incremental updates to improve robustness.
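A minimal sketch of that base + incremental schedule, with hypothetical names rather than the production trainer: the nightly base job trains every parameter group, while the hourly incremental job restricts the optimizer to sparse embeddings, leaving dense weights and BatchNorm layers frozen.

```python
# Sketch: selecting trainable parameter groups per training mode.
# "base" = daily full training; "incremental" = hourly real-time training
# that updates only sparse embeddings (dense + BatchNorm stay frozen).

def trainable_params(model, mode):
    """Return the parameter groups an optimizer may update in this mode."""
    if mode == "base":
        return [g for g in model]                       # train everything
    if mode == "incremental":
        return [g for g in model if g["kind"] == "sparse"]
    raise ValueError(f"unknown mode: {mode}")


model = [
    {"name": "item_embedding", "kind": "sparse"},
    {"name": "fc1",            "kind": "dense"},
    {"name": "bn1",            "kind": "batchnorm"},
]

print([g["name"] for g in trainable_params(model, "incremental")])
# -> ['item_embedding']
```

Freezing BatchNorm during incremental runs avoids letting a skewed hour of traffic shift normalization statistics, which is one source of the robustness problems real-time training introduces.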
After the ranking upgrades, the recall module became the most significant remaining bottleneck. The various recall channels (point recall, collaborative‑filtering recall, model‑based recall, graph recall) originally ran at day‑level. Over roughly a year, each channel was upgraded to minute‑level, involving extensive engineering work such as:
Transforming collaborative‑filtering (CF) pipelines from Spark (daily) to Flink (minute‑level) and using a custom KV store (RedKV).
Real‑time Swing algorithm (OnlineSwing) with minute‑level updates.
GraphSAGE‑based heterogeneous graph recall with real‑time edge updates.
Real‑time training of DSSM, multi‑interest, session‑graph, contrastive learning, and other recall models.
Upgrading LR recall and near‑line recall to minute‑level, including moving CPU workloads to GPU for near‑line scanning.
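The core idea behind moving CF from daily Spark batches to a streaming job can be sketched as follows. This is a hedged toy in Python (the real pipeline runs on Flink with RedKV as the store, and the function names here are illustrative): instead of recomputing the item co‑occurrence matrix once a day, each click event updates the counts immediately, so item‑to‑item similarity is fresh within minutes.

```python
# Sketch: streaming item-CF. Each (user, item) event incrementally updates
# co-occurrence counts, the way a Flink operator would, instead of a daily
# batch recomputation.

from collections import defaultdict

cooccur = defaultdict(int)      # (item_a, item_b) -> co-click count
user_hist = defaultdict(list)   # user -> recently clicked items


def on_click(user, item, window=50):
    """Process one click event against the user's recent history."""
    for prev in user_hist[user][-window:]:
        if prev != item:
            key = tuple(sorted((prev, item)))
            cooccur[key] += 1
    user_hist[user].append(item)


def similar_items(item, k=10):
    """Rank co-clicked items by raw count (a real system would normalize)."""
    scores = {(a if b == item else b): c
              for (a, b), c in cooccur.items() if item in (a, b)}
    return sorted(scores, key=scores.get, reverse=True)[:k]


for u, i in [("u1", "noteA"), ("u1", "noteB"), ("u2", "noteA"), ("u2", "noteB")]:
    on_click(u, i)
print(similar_items("noteA"))   # -> ['noteB']
```

Algorithms like Swing refine the scoring (penalizing user pairs who co-click everything), but the streaming structure is the same: the state is keyed counts in a KV store, updated per event.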
The vector retrieval (ANN) service was also modernized. The legacy Java‑centric service was rewritten in pure C++ and integrated multiple state‑of‑the‑art algorithms (HNSW, ScaNN); a full index refresh now completes in under 10 minutes and incremental updates in under 1 minute. Additional improvements included a C++‑only service stack, a Lua‑based real‑time rule engine, and extensive RocksDB tuning that cut embedding lookup time from 6 minutes to ~1.5 minutes.
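The contrast between the two refresh paths can be made concrete with a toy index. This is not the C++ service: a real deployment would use HNSW or ScaNN, but a flat cosine index keeps the sketch self‑contained, and the class and method names are illustrative.

```python
# Sketch: full rebuild vs. incremental upsert of an ANN index.
# Brute-force cosine search stands in for HNSW/ScaNN here.

import math


class FlatANNIndex:
    def __init__(self):
        self.vectors = {}                        # note id -> embedding

    def rebuild(self, all_vectors):
        """Full refresh: replace the whole index (the <10 min path)."""
        self.vectors = dict(all_vectors)

    def upsert(self, note_id, vec):
        """Incremental update: patch one entry (the <1 min path)."""
        self.vectors[note_id] = vec

    def search(self, query, k=5):
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb) if na and nb else 0.0

        ranked = sorted(self.vectors,
                        key=lambda i: cos(query, self.vectors[i]),
                        reverse=True)
        return ranked[:k]


idx = FlatANNIndex()
idx.rebuild({"n1": [1.0, 0.0], "n2": [0.0, 1.0]})
idx.upsert("n3", [0.9, 0.1])          # a new note becomes searchable at once
print(idx.search([1.0, 0.0], k=2))    # -> ['n1', 'n3']
```

The operational win is that new notes reach retrieval via the fast upsert path instead of waiting for the next full rebuild, which is what drives the new‑note efficiency gains cited below.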
Throughout the upgrade, the team faced difficulties such as limited C++ infrastructure, complex Lua‑based rule execution performance, RocksDB tuning, and integrating ScaNN (compiler and protobuf version mismatches). These were resolved through months of engineering effort.
The final system delivered over a 10% increase in average user dwell time, more than 15% uplift in interactions, and nearly a 50% improvement in new‑note efficiency, making it one of the most impactful projects for the algorithm team.
Future work aims to extend these high‑timeliness technologies to advertising and other business lines.
Xiaohongshu Tech REDtech
Official account of the Xiaohongshu tech team, sharing tech innovations and problem insights, advancing together.