Design and Evolution of a Scalable Recommendation System Architecture (V1.0‑V3.0)
This article describes the progressive redesign of an e‑commerce recommendation platform—from a simple strategy‑factory V1.0 through a vertically split V2.0 to a fully configurable, pipeline‑driven V3.0—highlighting architectural challenges, Redis clustering, dynamic configuration, recall and prediction services, and future directions for fine‑grained, explainable recommendations.
1. Introduction
Recommendation has become the core competitive advantage for e‑commerce platforms, appearing on virtually every page (home, detail, cart, checkout, error, etc.). It improves user experience, mitigates long‑tail effects, and drives product value and profitability.
2. Architecture Evolution
V1.0 used a simple strategy‑plus‑factory design to enable rapid business iteration but suffered from poor isolation, resource contention, and scaling limits due to a single JVM serving all upstream services.
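The strategy‑plus‑factory shape described for V1.0 can be sketched roughly as follows. This is a minimal illustration, not the platform's actual code: the interface, the scene keys ("home", "cart"), and the returned item IDs are all hypothetical. Every strategy lives in the same process, which is the source of the isolation and contention problems noted above.

```java
import java.util.List;
import java.util.Map;

/** Sketch of a V1.0-style strategy-plus-factory design; all names are hypothetical. */
final class StrategyFactorySketch {

    /** One recommendation strategy per page or business scene. */
    interface RecommendStrategy {
        List<String> recommend(String userId);
    }

    static final class HomePageStrategy implements RecommendStrategy {
        @Override
        public List<String> recommend(String userId) {
            return List.of("hot-item-1", "hot-item-2"); // placeholder data
        }
    }

    static final class CartPageStrategy implements RecommendStrategy {
        @Override
        public List<String> recommend(String userId) {
            return List.of("bundle-item-1"); // placeholder data
        }
    }

    /** Factory that maps a scene key to its strategy; all strategies share one JVM. */
    static final class StrategyFactory {
        private static final Map<String, RecommendStrategy> STRATEGIES = Map.of(
                "home", new HomePageStrategy(),
                "cart", new CartPageStrategy());

        static RecommendStrategy forScene(String scene) {
            RecommendStrategy strategy = STRATEGIES.get(scene);
            if (strategy == null) {
                throw new IllegalArgumentException("Unknown scene: " + scene);
            }
            return strategy;
        }
    }

    public static void main(String[] args) {
        System.out.println(StrategyFactory.forScene("home").recommend("u1"));
    }
}
```

Adding a new page only requires a new strategy class and a factory entry, which explains the fast iteration; the cost is that all strategies compete for the same heap and threads.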
V2.0 introduced vertical business splitting and horizontal stage‑based decomposition, isolating applications and storage per business line, reducing fault impact, and improving resource utilization. A pipeline scheduler was added to modularize stages such as recall, filter, coarse‑ranking, merge, fine‑ranking, intervention, and shuffling.
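A stage‑based pipeline of this kind might look like the sketch below. It assumes a hypothetical `Stage` interface and shows only three of the stages (recall, filter, ranking) with made‑up candidates; the real scheduler and its stage contracts are not described in the source.

```java
import java.util.Comparator;
import java.util.List;

/** Sketch of a stage-based recommendation pipeline; types and data are hypothetical. */
final class PipelineSketch {

    record Candidate(String itemId, double score, boolean inStock) {}

    /** A pipeline stage transforms the candidate list produced by the previous stage. */
    interface Stage {
        List<Candidate> apply(List<Candidate> input);
    }

    /** Runs the stages in order, starting from an empty candidate list. */
    static final class Pipeline {
        private final List<Stage> stages;

        Pipeline(List<Stage> stages) {
            this.stages = List.copyOf(stages);
        }

        List<Candidate> run() {
            List<Candidate> current = List.of();
            for (Stage stage : stages) {
                current = stage.apply(current);
            }
            return current;
        }
    }

    static Pipeline demoPipeline() {
        // Recall: produce the initial candidates (hard-coded here).
        Stage recall = in -> List.of(
                new Candidate("a", 0.2, true),
                new Candidate("b", 0.9, false),
                new Candidate("c", 0.7, true));
        // Filter: drop items that cannot be sold.
        Stage filter = in -> in.stream().filter(Candidate::inStock).toList();
        // Ranking: highest score first.
        Stage rank = in -> in.stream()
                .sorted(Comparator.comparingDouble(Candidate::score).reversed())
                .toList();
        return new Pipeline(List.of(recall, filter, rank));
    }

    public static void main(String[] args) {
        demoPipeline().run().forEach(c -> System.out.println(c.itemId()));
    }
}
```

Because each stage only depends on the shared `Stage` contract, stages such as merge, intervention, or shuffling can be added or reordered without touching the others.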
V3.0 adds a configuration service (server & client) that dynamically manages the recommendation pipeline. Handlers (pipeline nodes) are configured with AB‑test and strategy attributes, allowing runtime adjustments without code changes. The system now separates recall and prediction services into independent micro‑services, improving scalability and performance.
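To make the idea of AB‑test and strategy attributes on a handler concrete, here is one possible shape. The source does not say how traffic is split, so the 0 to 99 bucket scheme, the `HandlerConfig` fields, and the strategy names are assumptions made purely for illustration.

```java
import java.util.List;
import java.util.Optional;

/** Sketch of handler configs carrying AB-test and strategy attributes; all hypothetical. */
final class AbHandlerConfigSketch {

    /** A handler variant that applies to users whose bucket is in [bucketFrom, bucketTo). */
    record HandlerConfig(String handler, String strategy, int bucketFrom, int bucketTo) {
        boolean matches(int bucket) {
            return bucket >= bucketFrom && bucket < bucketTo;
        }
    }

    /** Assumed bucketing rule: a stable hash of the user ID mapped to 0..99. */
    static int bucketOf(String userId) {
        return Math.floorMod(userId.hashCode(), 100);
    }

    /** Picks the first variant of the named handler whose bucket range contains the bucket. */
    static Optional<HandlerConfig> resolve(List<HandlerConfig> variants, String handler, int bucket) {
        return variants.stream()
                .filter(c -> c.handler().equals(handler) && c.matches(bucket))
                .findFirst();
    }

    public static void main(String[] args) {
        List<HandlerConfig> variants = List.of(
                new HandlerConfig("fineRanking", "model-v2", 0, 20),   // 20% experiment
                new HandlerConfig("fineRanking", "model-v1", 20, 100)); // 80% control
        int bucket = bucketOf("user-42");
        System.out.println(resolve(variants, "fineRanking", bucket).map(HandlerConfig::strategy));
    }
}
```

Changing the traffic split or the strategy then means editing configuration data rather than redeploying code.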
3. Configuration Service Design
The configuration server exposes RPC interfaces for heartbeat and configuration queries. It centrally manages all recommendation scenarios, enabling online strategy changes. The client periodically polls the server, synchronizes configurations, and assembles an executable handler chain based on user context (device, location, etc.).
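The polling client described above could be sketched as follows. The `ConfigServer` interface stands in for the real RPC interface, whose signature the source does not give, and the heartbeat call is omitted. The version field, the `"*"` wildcard for devices, and the handler names are likewise assumptions.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Sketch of a polling configuration client; the RPC interface is a hypothetical stand-in. */
final class ConfigClientSketch {

    /** One configured pipeline node; device "*" means the handler applies to every device. */
    record HandlerEntry(String name, String device) {}

    /** A versioned configuration for one recommendation scene. */
    record Snapshot(long version, List<HandlerEntry> handlers) {}

    /** Stand-in for the configuration server's query interface. */
    interface ConfigServer {
        Snapshot query(String scene);
    }

    static final class ConfigClient {
        private final ConfigServer server;
        private final String scene;
        private final AtomicReference<Snapshot> current =
                new AtomicReference<>(new Snapshot(0, List.of()));

        ConfigClient(ConfigServer server, String scene) {
            this.server = server;
            this.scene = scene;
        }

        /** Fetches the latest snapshot and swaps it in only if it is newer. */
        void pollOnce() {
            Snapshot latest = server.query(scene);
            current.updateAndGet(old -> latest.version() > old.version() ? latest : old);
        }

        /** Polls periodically; failures are swallowed so the last good config stays active. */
        ScheduledExecutorService startPolling(long periodSeconds) {
            ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
            executor.scheduleAtFixedRate(() -> {
                try {
                    pollOnce();
                } catch (RuntimeException e) {
                    // An uncaught exception would cancel all future runs of this task.
                }
            }, 0, periodSeconds, TimeUnit.SECONDS);
            return executor;
        }

        /** Assembles the handler chain for a request from the given device. */
        List<String> chainFor(String device) {
            return current.get().handlers().stream()
                    .filter(h -> h.device().equals("*") || h.device().equals(device))
                    .map(HandlerEntry::name)
                    .toList();
        }
    }

    public static void main(String[] args) {
        ConfigServer server = scene -> new Snapshot(1, List.of(
                new HandlerEntry("recall", "*"),
                new HandlerEntry("appBanner", "app")));
        ConfigClient client = new ConfigClient(server, "home");
        client.pollOnce();
        System.out.println(client.chainFor("app"));
    }
}
```

Holding the snapshot in an `AtomicReference` lets request threads read a consistent configuration while the poller replaces it in the background.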
4. Recall Service
A unified full‑product recall pool is built in Elasticsearch, replacing scattered Redis storage. Real‑time product updates (price, stock, status) are propagated via MQ to keep the recall index fresh, ensuring all recall paths benefit from the latest data.
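The update flow can be illustrated with an in‑memory stand‑in. The sketch below deliberately does not use Elasticsearch or a real message queue: a `ConcurrentHashMap` plays the role of the index and `apply` plays the role of the MQ consumer callback. The `UpdateEvent` shape (null fields meaning "unchanged") is an assumption, not the platform's message format.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for a recall index kept fresh by update events; all hypothetical. */
final class RecallIndexSketch {

    record Product(String id, double price, int stock, boolean onSale) {}

    /** A partial update as it might arrive from a queue; null fields mean "unchanged". */
    record UpdateEvent(String id, Double price, Integer stock, Boolean onSale) {}

    static final class RecallIndex {
        private final Map<String, Product> docs = new ConcurrentHashMap<>();

        void put(Product product) {
            docs.put(product.id(), product);
        }

        /** Merges an update into the stored document; unknown IDs are ignored. */
        void apply(UpdateEvent event) {
            docs.computeIfPresent(event.id(), (id, old) -> new Product(
                    id,
                    event.price() != null ? event.price() : old.price(),
                    event.stock() != null ? event.stock() : old.stock(),
                    event.onSale() != null ? event.onSale() : old.onSale()));
        }

        /** Every recall path reads the same pool, so all of them see the update. */
        List<String> recallAvailable() {
            return docs.values().stream()
                    .filter(p -> p.onSale() && p.stock() > 0)
                    .map(Product::id)
                    .sorted()
                    .toList();
        }
    }

    public static void main(String[] args) {
        RecallIndex index = new RecallIndex();
        index.put(new Product("a", 10.0, 5, true));
        index.put(new Product("b", 20.0, 1, true));
        index.apply(new UpdateEvent("b", null, 0, null)); // "b" sells out
        System.out.println(index.recallAvailable());
    }
}
```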
5. Prediction Service
Model prediction is exposed as a service that supports multiple models and multiple versions of each model. Sorting strategies are configuration‑driven and can be switched on the fly. Running prediction as its own service improves performance and scalability and makes it feasible to deploy more complex models.
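One way to picture multi‑model, multi‑version serving with on‑the‑fly switching is a small registry like the one below. It is a sketch under assumptions: the `Model` interface, the `name:version` key format, and the toy scoring functions are hypothetical, and a real service would load trained models rather than lambdas.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch of a model registry with a switchable active version; all names are hypothetical. */
final class PredictionServiceSketch {

    /** A scoring model: feature vector in, score out. */
    interface Model {
        double score(double[] features);
    }

    static final class ModelRegistry {
        private final Map<String, Model> models = new ConcurrentHashMap<>();
        private volatile String activeKey; // changed by configuration, read by request threads

        void register(String name, String version, Model model) {
            models.put(name + ":" + version, model);
        }

        /** Switches the model used for subsequent predictions without a restart. */
        void activate(String name, String version) {
            String key = name + ":" + version;
            if (!models.containsKey(key)) {
                throw new IllegalArgumentException("Model not registered: " + key);
            }
            activeKey = key;
        }

        double predict(double[] features) {
            String key = activeKey;
            if (key == null) {
                throw new IllegalStateException("No active model");
            }
            return models.get(key).score(features);
        }
    }

    public static void main(String[] args) {
        ModelRegistry registry = new ModelRegistry();
        registry.register("ctr", "v1", f -> f[0]);
        registry.register("ctr", "v2", f -> f[0] * 2);
        registry.activate("ctr", "v1");
        System.out.println(registry.predict(new double[] {3.0}));
        registry.activate("ctr", "v2"); // e.g. triggered by a configuration change
        System.out.println(registry.predict(new double[] {3.0}));
    }
}
```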
6. Outlook
Future work focuses on fine‑grained operation, explainable recommendations, and real‑time feature enrichment to achieve personalized, “one‑to‑one” recommendations while maintaining system stability and efficiency.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
