Xiaohongshu Recommendation Engineering Architecture: Graph‑Based Design and Hot‑Deployment Practices
This article presents Xiaohongshu's evolving recommendation system architecture, detailing the challenges of massive user‑generated content, the adoption of a graph‑based Ark framework for modular and scalable business logic, and the implementation of hot‑deployment techniques to accelerate algorithm iteration and reduce downtime.
With the rapid growth of the mobile internet, personalized recommendation has become essential to user experience. Xiaohongshu, a lifestyle platform for young users, faces massive user‑generated content and complex business logic; this article shares its graph‑based recommendation architecture in the hope that the design proves useful to teams facing similar problems.
The existing recommendation pipeline handles diverse content types (text, video, products, live streams, comments) through multi‑stage recall, extensive feature engineering, and layered ranking (coarse ranking, fine ranking, re‑ranking). It processes millions of candidates per request and confronts challenges in scalability, maintainability, and long deployment cycles.
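The funnel structure described above can be sketched in a few lines of Java. This is an illustrative toy, not Xiaohongshu's actual code: the class, method names, and scoring functions are hypothetical, and real stages would call out to index, vector, and ranking services.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical sketch of the recall -> coarse rank -> fine rank funnel.
public class RankingFunnel {
    record Candidate(long id, double score) {}

    // Recall: merge candidates from several sources, deduplicating by id.
    static List<Candidate> recall(List<List<Candidate>> sources) {
        return sources.stream().flatMap(List::stream)
                .collect(Collectors.toMap(Candidate::id, c -> c, (a, b) -> a))
                .values().stream().toList();
    }

    // Each ranking stage re-scores candidates and keeps only the top-k survivors,
    // so cheaper models cut the pool wide and expensive models cut it narrow.
    static List<Candidate> rankStage(List<Candidate> in,
                                     java.util.function.ToDoubleFunction<Candidate> model,
                                     int k) {
        return in.stream()
                .map(c -> new Candidate(c.id(), model.applyAsDouble(c)))
                .sorted(Comparator.comparingDouble(Candidate::score).reversed())
                .limit(k).toList();
    }

    public static void main(String[] args) {
        List<Candidate> pool = recall(List.of(
                List.of(new Candidate(1, 0), new Candidate(2, 0)),
                List.of(new Candidate(2, 0), new Candidate(3, 0))));
        List<Candidate> coarse = rankStage(pool, c -> c.id() * 0.1, 2); // cheap model, wide cut
        List<Candidate> fine   = rankStage(coarse, c -> c.id() * 0.3, 1); // costly model, narrow cut
        System.out.println(fine.get(0).id()); // prints 3
    }
}
```

The key design property is that each stage has the same shape (candidates in, fewer candidates out), which is what makes the pipeline composable and lets new stages be slotted in.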
To address these issues, Xiaohongshu rebuilt the system on a hybrid‑cloud foundation, introducing a data platform for core business data, an engine layer offering inverted‑index, vector, feature, and ranking services (implemented in C++ for performance), and the Ark graph‑computation framework (Java‑based) that abstracts common infrastructure for search and recommendation scenarios.
The Ark framework comprises an API gateway for traffic control and routing, and a container layer with datasets and operators; it provides parallel processing, dynamic routing, and sub‑graph nesting, enabling rapid construction of new recommendation scenes while isolating business logic from low‑level concerns.
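To make the operator/dataset model concrete, here is a minimal sketch of the idea, not Ark's real API: operators are nodes that read and write a shared dataset context, independent branches run in parallel, and a join operator merges their output. All names are assumptions for illustration.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.*;

// Toy graph runtime: operators consume a shared key-value dataset context.
public class MiniGraph {
    interface Operator extends Consumer<Map<String, Object>> {}

    // Run independent operator branches in parallel, then run the join node.
    static void runParallel(Map<String, Object> ctx, List<Operator> branches, Operator join) {
        ExecutorService pool = Executors.newFixedThreadPool(branches.size());
        try {
            CompletableFuture.allOf(branches.stream()
                    .map(op -> CompletableFuture.runAsync(() -> op.accept(ctx), pool))
                    .toArray(CompletableFuture[]::new)).join();
        } finally {
            pool.shutdown();
        }
        join.accept(ctx);
    }

    public static void main(String[] args) {
        Map<String, Object> ctx = new ConcurrentHashMap<>();
        Operator recallA = c -> c.put("a", List.of(1, 2)); // e.g. inverted-index recall
        Operator recallB = c -> c.put("b", List.of(3));    // e.g. vector recall
        Operator merge = c -> {
            List<Integer> all = new ArrayList<>((List<Integer>) c.get("a"));
            all.addAll((List<Integer>) c.get("b"));
            c.put("merged", all);
        };
        runParallel(ctx, List.of(recallA, recallB), merge);
        System.out.println(ctx.get("merged")); // [1, 2, 3]
    }
}
```

Sub‑graph nesting falls out naturally from this shape: because a whole graph run is itself just a function over the context, it can be wrapped as a single `Operator` and embedded as a node in a larger graph.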
Hot‑deployment is achieved via Spring‑based class‑loader isolation and a plugin mechanism; independent class loaders allow versioned business code to run side‑by‑side, with AB routing for traffic shifting, pre‑warming, and seamless switch‑over without full service restarts, while addressing challenges such as cache duplication, middleware resource release, and class conflicts.
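The AB routing and switch‑over flow can be sketched as follows. This is a hedged simplification: in the real system each business‑code version is loaded by its own isolated class loader, while here plain lambdas stand in for loaded plugins; the class and method names are hypothetical.

```java
import java.util.*;
import java.util.function.*;

// Toy version router: two plugin versions registered side by side,
// traffic split by user-id bucket, atomic promotion once warmed up.
public class HotDeployRouter {
    private final Map<String, Function<Long, String>> versions = new HashMap<>();
    private volatile String stable = "v1";
    private volatile String canary = null;
    private volatile int canaryPercent = 0;

    // In the real system this would load a versioned plugin in a fresh class loader.
    void register(String version, Function<Long, String> handler) {
        versions.put(version, handler);
    }

    // Shift a percentage of users to the new version, keyed by user-id hash,
    // so each user sees a consistent version (and caches can pre-warm).
    void startCanary(String version, int percent) {
        canary = version;
        canaryPercent = percent;
    }

    // Promote the canary to stable: all subsequent traffic hits the new code,
    // with no service restart.
    void promote() {
        stable = canary;
        canary = null;
        canaryPercent = 0;
    }

    String serve(long userId) {
        String v = (canary != null
                && Math.floorMod(Long.hashCode(userId), 100) < canaryPercent)
                ? canary : stable;
        return versions.get(v).apply(userId);
    }
}
```

The sketch deliberately omits the hard parts the talk calls out: releasing the old version's middleware resources, avoiding duplicated caches across class loaders, and resolving class conflicts between co‑resident versions.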
Future directions include extending hot‑deployment to production, moving toward serverless architectures for lower operational cost, and implementing elastic scaling to handle traffic spikes, especially for high‑demand scenarios like live‑stream recommendation.
The session concludes with a Q&A covering hot‑load complexities, serialization choices (Thrift vs. Protobuf), and strategies for handling sudden traffic surges through dynamic degradation and fast‑path computation.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.