Evolution of E‑commerce Platform Recommendation System Architecture
This article reviews how recommendation system architecture for C2C e‑commerce platforms has evolved, from a simple offline‑online pipeline through finer‑grained feed recommendations and real‑time processing to machine‑learning‑driven models. It also covers user‑profile construction, the challenges involved, and best‑practice guidelines.
The talk, originally presented at DataFunTalk, outlines how recommendation systems for C2C marketplaces such as Xianyu have progressed from a basic two‑stage offline‑online framework (the "Stone Age") to increasingly sophisticated architectures.
In the early stage, offline jobs generated recommendation material using Spark matrix factorization, while online services performed simple table look‑ups, resulting in low personalization, limited recall dimensions, and poor cache efficiency.
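The two-stage split described above can be sketched in a few lines. This is a minimal illustration, not the platform's pipeline: the talk names Spark matrix factorization, whereas the sketch below swaps in a tiny pure-Python SGD factorization so that it runs without a cluster. The names `factorize`, `build_lookup_table`, `recommend`, and the sample `RATINGS` are all hypothetical.

```python
import random

def factorize(ratings, n_factors=2, epochs=200, lr=0.05, reg=0.02, seed=0):
    """Tiny SGD matrix factorization over (user, item, score) triples."""
    rng = random.Random(seed)
    user_vecs = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)]
                 for u in sorted({u for u, _, _ in ratings})}
    item_vecs = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)]
                 for i in sorted({i for _, i, _ in ratings})}
    for _ in range(epochs):
        for u, i, score in ratings:
            pu, qi = user_vecs[u], item_vecs[i]
            err = score - sum(a * b for a, b in zip(pu, qi))
            for k in range(n_factors):
                pu_k, qi_k = pu[k], qi[k]
                pu[k] += lr * (err * qi_k - reg * pu_k)
                qi[k] += lr * (err * pu_k - reg * qi_k)
    return user_vecs, item_vecs

def build_lookup_table(ratings, top_n=2):
    """Offline stage: precompute each user's top-N unseen items."""
    user_vecs, item_vecs = factorize(ratings)
    seen = {}
    for u, i, _ in ratings:
        seen.setdefault(u, set()).add(i)
    table = {}
    for u, pu in user_vecs.items():
        scored = sorted(
            ((sum(a * b for a, b in zip(pu, qi)), i)
             for i, qi in item_vecs.items() if i not in seen[u]),
            reverse=True,
        )
        table[u] = [i for _, i in scored[:top_n]]
    return table

def recommend(table, user_id, fallback=()):
    """Online stage: a plain table lookup, with a fallback for unknown users."""
    return table.get(user_id, list(fallback))

# Hypothetical interaction data: (user, item, implicit score).
RATINGS = [
    ("u1", "phone", 5.0), ("u1", "case", 4.0),
    ("u2", "phone", 5.0), ("u2", "book", 2.0),
    ("u3", "book", 5.0), ("u3", "lamp", 4.0),
]
TABLE = build_lookup_table(RATINGS)
print(recommend(TABLE, "u1"))
print(recommend(TABLE, "new_user", fallback=["phone"]))
```

Because the online side only reads a precomputed table, nothing a user does between offline runs can change the result, which is one way to read the low personalization noted above.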
The subsequent "Bronze Age" introduced finer granularity by shifting from category‑level to item‑level targeting, adding multiple recall dimensions through collaborative filtering combined with user and item profiles, and improving data reuse across the pipeline.
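As a rough sketch of multi-dimensional recall at item level, the example below runs two recall channels and merges them: an item-to-item collaborative-filtering channel and a profile-matching channel. The data shapes (`similar_items`, `item_profiles`, `category_weights`), the channel weights, and all function names are assumptions made for illustration; the talk does not specify them.

```python
from collections import defaultdict

def recall_item_cf(recent_items, similar_items, limit=10):
    """CF channel: expand recently viewed items via a precomputed similarity table."""
    scores = defaultdict(float)
    for item in recent_items:
        for candidate, sim in similar_items.get(item, []):
            if candidate not in recent_items:
                scores[candidate] = max(scores[candidate], sim)
    top = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
    return dict(top)

def recall_by_profile(user_profile, item_profiles, limit=10):
    """Profile channel: match the user's category preferences against item profiles."""
    weights = user_profile["category_weights"]
    recent = set(user_profile["recent_items"])
    scores = {
        item: weights[p["category"]] * p["popularity"]
        for item, p in item_profiles.items()
        if p["category"] in weights and item not in recent
    }
    top = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
    return dict(top)

def merge_channels(channels, channel_weights):
    """Combine channels with a weighted sum; items hit by several channels rise."""
    merged = defaultdict(float)
    for name, candidates in channels.items():
        for item, score in candidates.items():
            merged[item] += channel_weights[name] * score
    return sorted(merged, key=lambda item: (-merged[item], item))

# Hypothetical data for one user.
similar_items = {"phone_1": [("case_1", 0.9), ("phone_2", 0.8)]}
item_profiles = {
    "case_1": {"category": "accessories", "popularity": 0.5},
    "phone_2": {"category": "phones", "popularity": 0.6},
    "lamp_1": {"category": "home", "popularity": 0.9},
}
user_profile = {
    "recent_items": ["phone_1"],
    "category_weights": {"phones": 1.0, "home": 0.5},
}

channels = {
    "cf": recall_item_cf(user_profile["recent_items"], similar_items),
    "profile": recall_by_profile(user_profile, item_profiles),
}
ranked = merge_channels(channels, {"cf": 0.6, "profile": 0.4})
print(ranked)  # ['phone_2', 'case_1', 'lamp_1']
```

Here `phone_2` ranks first because both channels recall it, which illustrates how adding recall dimensions changes the candidate set rather than just reordering it.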
The "Industrial Revolution I" phase focused on real‑time recommendation: both the mining that had previously run offline and the inference of user interest were moved to real time. This boosted conversion rates by 80‑90% and established continuous improvement as a core principle.
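One simple way to keep an interest estimate live is to update it on every event and let older signals fade. The sketch below uses exponential time decay with a configurable half-life; this is an assumed technique chosen for illustration, not the method described in the talk, and `RealtimeInterest` and its parameters are hypothetical.

```python
import math

class RealtimeInterest:
    """Per-user interest weights that decay over time and update on each event."""

    def __init__(self, half_life_seconds=3600.0):
        self.decay = math.log(2) / half_life_seconds
        self.state = {}  # category -> (weight, timestamp of last update)

    def _decayed(self, weight, last_ts, now):
        # Clamp at zero so out-of-order events never inflate old weights.
        elapsed = max(0.0, now - last_ts)
        return weight * math.exp(-self.decay * elapsed)

    def observe(self, category, ts, strength=1.0):
        """Fold one behavior event (click, favorite, ...) into the estimate."""
        weight, last_ts = self.state.get(category, (0.0, ts))
        self.state[category] = (self._decayed(weight, last_ts, ts) + strength, ts)

    def score(self, category, now):
        weight, last_ts = self.state.get(category, (0.0, now))
        return self._decayed(weight, last_ts, now)

    def top(self, now, n=3):
        """Categories ordered by current (decayed) interest."""
        return sorted(self.state, key=lambda c: (-self.score(c, now), c))[:n]

tracker = RealtimeInterest(half_life_seconds=3600.0)
tracker.observe("phones", ts=0.0)
tracker.observe("books", ts=3600.0)
print(tracker.top(now=3600.0))
```

An hour after the first click, the "phones" weight has halved, so the more recent "books" click leads; a fresh "phones" event would immediately move it back to the top.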
Finally, the "Machine‑Learning Era" emphasizes building a robust ML pipeline for feature generation, model training, and real‑time serving, enabling both ranking and recall models, better utilization of negative feedback, and alignment of optimization goals with business metrics.
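To make the role of negative feedback concrete, the sketch below trains a small click-through ranking model in which impressions that were shown but not clicked serve as negative examples. Logistic regression trained with SGD is an assumption used only to keep the example self-contained; the talk does not name a model. The two features (category match and a normalized price ratio) and the sample `IMPRESSIONS` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(features, weights, bias):
    """Estimated click probability for one candidate."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

def train_click_model(impressions, epochs=200, lr=0.5):
    """Logistic regression via SGD over (features, clicked) pairs.

    clicked == 0 rows are impressions the user ignored: the negative feedback.
    """
    weights = [0.0] * len(impressions[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, clicked in impressions:
            err = clicked - predict(features, weights, bias)
            weights = [w + lr * err * x for w, x in zip(weights, features)]
            bias += lr * err
    return weights, bias

def rank(candidates, weights, bias):
    """Order candidate ids by predicted click probability, highest first."""
    return sorted(
        candidates,
        key=lambda c: (-predict(candidates[c], weights, bias), c),
    )

# Hypothetical log: features are [category_match, price_ratio].
IMPRESSIONS = [
    ([1.0, 0.2], 1), ([1.0, 0.8], 1), ([0.0, 0.3], 0),
    ([0.0, 0.9], 0), ([1.0, 0.5], 1), ([0.0, 0.5], 0),
]
weights, bias = train_click_model(IMPRESSIONS)
order = rank({"item_a": [0.0, 0.5], "item_b": [1.0, 0.5]}, weights, bias)
print(order)
```

The training label here is a click, so the model optimizes click-through; aligning it with a business metric would mean changing the label or the loss, not the serving code.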
The article also stresses the importance of effective user profiling: creating useful, well‑connected, and detailed profiles, avoiding over‑ or under‑granular attributes, and ensuring consistent, real‑time availability across teams and storage systems.
Key challenges discussed include synchronization of profile updates, heterogeneous downstream consumption (e.g., Redis vs. Elasticsearch), and coordinated upgrades across multiple services.
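One way to reason about these synchronization problems is to version every profile update and let each downstream store reject stale writes on its own. The sketch below is an assumption-laden illustration: `KeyValueSink` and `SearchIndexSink` are in-memory stand-ins for a Redis-like and an Elasticsearch-like consumer, not real client code, and `ProfileUpdate` and `publish` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class ProfileUpdate:
    user_id: str
    version: int  # monotonically increasing per user
    attributes: dict = field(default_factory=dict)

class KeyValueSink:
    """Stand-in for a Redis-like store: one record per user key."""

    def __init__(self):
        self.data = {}

    def apply(self, update):
        key = f"profile:{update.user_id}"
        current = self.data.get(key)
        if current is not None and current["version"] >= update.version:
            return False  # stale or duplicate update, ignore
        self.data[key] = {"version": update.version,
                          "attributes": dict(update.attributes)}
        return True

class SearchIndexSink:
    """Stand-in for an Elasticsearch-like index: attributes flattened to tags."""

    def __init__(self):
        self.docs = {}

    def apply(self, update):
        current = self.docs.get(update.user_id)
        if current is not None and current["version"] >= update.version:
            return False
        tags = sorted(f"{k}:{v}" for k, v in update.attributes.items())
        self.docs[update.user_id] = {"version": update.version, "tags": tags}
        return True

def publish(update, sinks):
    """Fan one update out to every consumer and report who accepted it."""
    return {type(sink).__name__: sink.apply(update) for sink in sinks}

kv, index = KeyValueSink(), SearchIndexSink()
newer = ProfileUpdate("u1", version=2, attributes={"city": "Hangzhou"})
older = ProfileUpdate("u1", version=1, attributes={"city": "Beijing"})
print(publish(newer, [kv, index]))
print(publish(older, [kv, index]))  # arrives late, rejected by both sinks
```

Each sink keeps its own storage shape, but the shared version number gives teams a single value to compare when checking whether two stores agree on a user's profile.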
Overall, the evolution follows four main directions: refined relevance, multi‑dimensional recall and ranking, real‑time feedback, and model‑driven computation. Each contributed incremental improvements in recommendation performance.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.