How a Scalable Recommendation Engine Evolved: From V1.0 to V3.0

This article details the evolution of an e‑commerce recommendation system through three architectural versions, highlighting the initial simple design, the challenges that prompted vertical and horizontal splits, the introduction of a configurable pipeline and AB testing, and the final micro‑service‑based, dynamically configurable V3.0 architecture.

IT Architects Alliance

Feb 15, 2022

1. Introduction

Recommendation has become a core competitive advantage for e‑commerce platforms. It appears at many traffic entry points, including the home page, product detail page, shopping cart, order‑success page, and error pages. It improves user experience, surfaces long‑tail products, counteracts the Matthew effect (already popular items absorbing most of the exposure), increases user stickiness, and creates additional product value.

2. Recommendation Framework V1.0

The first version adopted a simple strategy‑plus‑factory design that enabled rapid iteration at project launch. However, several problems emerged:

All upstream business logic shared a single service, resulting in poor fault isolation, resource contention within one JVM, and limited scalability.

Rapid business growth increased system complexity. The simple strategy‑plus‑factory design became a development bottleneck, and the code needed to be modularized by recommendation stage.

Recall relied on direct Redis connections, creating a performance bottleneck and preventing aggregation of similar items.

All data were stored in a single Redis cluster, so high concurrency from one business could affect others, and scaling a single cluster became risky as data volume grew.
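
To make the V1.0 starting point concrete, here is a minimal sketch of a strategy‑plus‑factory layout. The article does not show the original code, so every name here (RecommendStrategy, StrategyFactory, the "home" and "cart" scenes) is hypothetical and the strategy bodies are placeholders.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class RecommendStrategy(ABC):
    """One strategy per traffic entry point (hypothetical interface)."""

    @abstractmethod
    def recommend(self, user_id: str, limit: int) -> List[str]:
        ...


class HomePageStrategy(RecommendStrategy):
    def recommend(self, user_id: str, limit: int) -> List[str]:
        # Placeholder logic: a real strategy would recall and rank items.
        return [f"home-item-{i}" for i in range(limit)]


class CartStrategy(RecommendStrategy):
    def recommend(self, user_id: str, limit: int) -> List[str]:
        # Placeholder logic, same caveat as above.
        return [f"cart-item-{i}" for i in range(limit)]


class StrategyFactory:
    """Maps a scene name to its strategy class."""

    _registry: Dict[str, Type[RecommendStrategy]] = {
        "home": HomePageStrategy,
        "cart": CartStrategy,
    }

    @classmethod
    def create(cls, scene: str) -> RecommendStrategy:
        try:
            return cls._registry[scene]()
        except KeyError:
            raise ValueError(f"unknown scene: {scene}") from None


items = StrategyFactory.create("cart").recommend("u42", limit=2)
```

In a layout like this, every scene runs in the same process and each strategy owns its whole flow from recall to ranking, which is consistent with the isolation and modularity problems listed above.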

3. Recommendation Framework V2.0

Version 2.0 introduced vertical splitting by business scenario and horizontal splitting by recommendation stages, improving isolation and making the framework clearer. A pipeline scheduler was added to modularize the workflow into distinct stages such as recall, filtering, coarse ranking, merging, fine ranking, intervention, and shuffling. Configuration files define which modules are active for each scenario, enabling more precise resource allocation.
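
The idea of a configuration‑driven pipeline can be sketched as follows. This is an illustration under assumptions, not the framework's actual code: the stage functions, the STAGES registry, and the SCENE_CONFIG mapping are hypothetical, and only three of the stages named above are shown.

```python
from typing import Callable, Dict, List

Item = Dict[str, object]
Stage = Callable[[List[Item]], List[Item]]


def recall(_: List[Item]) -> List[Item]:
    # Stand-in candidate source; a real recall stage would query a store.
    return [
        {"id": "a", "score": 0.2, "in_stock": True},
        {"id": "b", "score": 0.9, "in_stock": False},
        {"id": "c", "score": 0.7, "in_stock": True},
    ]


def filter_stage(items: List[Item]) -> List[Item]:
    # Drop candidates that cannot be sold right now.
    return [it for it in items if it["in_stock"]]


def fine_rank(items: List[Item]) -> List[Item]:
    # Highest score first.
    return sorted(items, key=lambda it: it["score"], reverse=True)


# Registry of available stage modules (hypothetical names).
STAGES: Dict[str, Stage] = {
    "recall": recall,
    "filter": filter_stage,
    "fine_rank": fine_rank,
}

# Per-scenario configuration: which stages run, and in what order.
SCENE_CONFIG: Dict[str, List[str]] = {
    "home": ["recall", "filter", "fine_rank"],
    "cart": ["recall", "fine_rank"],
}


def run_pipeline(scene: str) -> List[Item]:
    """Run the configured stages for a scene, feeding each stage's output to the next."""
    items: List[Item] = []
    for name in SCENE_CONFIG[scene]:
        items = STAGES[name](items)
    return items
```

Here the "cart" scenario skips filtering simply because its configuration omits that stage; no scenario‑specific code is needed.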

A configuration service was built to manage pipeline definitions and AB testing capabilities, allowing dynamic adjustments without code changes. This version solved many development‑efficiency and stability issues present in V1.0.

4. Recommendation Framework V3.0

Version 3.0 further modularized the system by splitting the configuration service into a server and a client, making the pipeline dynamically configurable, and extracting recall and prediction into independent services. The single Redis cluster was split into multiple smaller clusters to reduce risk and improve scalability.
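
The article does not say how data is divided among the smaller Redis clusters. As one possible reading, the sketch below assumes a split by business scenario, with a shared fallback cluster. The host names, the ClusterAddress type, and the cluster_for helper are all hypothetical, and no Redis client library is involved.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ClusterAddress:
    host: str
    port: int


# Hypothetical addresses: one small cluster per business scenario.
CLUSTERS: Dict[str, ClusterAddress] = {
    "home": ClusterAddress("redis-home.internal", 6379),
    "cart": ClusterAddress("redis-cart.internal", 6379),
}

# Assumed fallback for scenarios without a dedicated cluster.
DEFAULT_CLUSTER = ClusterAddress("redis-shared.internal", 6379)


def cluster_for(scene: str) -> ClusterAddress:
    """Pick the cluster a scenario should read from."""
    return CLUSTERS.get(scene, DEFAULT_CLUSTER)
```

With routing of this kind, a traffic spike in one scenario lands on its own cluster, and each cluster can be resized independently.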

The configuration server provides RPC interfaces for heartbeat responses and flow queries, and centralizes all scenario configurations. The client periodically polls the server, synchronizes the latest configuration, and assembles a handler chain based on user device, location, context, and experiment parameters.
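
The polling and version check described above might look like the following sketch. It replaces the RPC layer with direct method calls, and the class names, the heartbeat signature, and the snapshot shape are assumptions made for illustration.

```python
from typing import Dict, List, Optional


class ConfigServer:
    """Holds all scenario configurations and a monotonically increasing version."""

    def __init__(self) -> None:
        self._version = 0
        self._scenes: Dict[str, List[str]] = {}

    def publish(self, scene: str, stages: List[str]) -> None:
        # Every change bumps the version so clients can detect it.
        self._scenes[scene] = list(stages)
        self._version += 1

    def heartbeat(self, client_version: int) -> Optional[dict]:
        """Return a full snapshot only when the client is out of date."""
        if client_version == self._version:
            return None
        return {
            "version": self._version,
            "scenes": {k: list(v) for k, v in self._scenes.items()},
        }


class ConfigClient:
    """Keeps a local copy of the configuration, refreshed by polling."""

    def __init__(self, server: ConfigServer) -> None:
        self._server = server
        self.version = 0
        self.scenes: Dict[str, List[str]] = {}

    def poll_once(self) -> bool:
        """Sync with the server; return True if the local copy changed."""
        snapshot = self._server.heartbeat(self.version)
        if snapshot is None:
            return False
        self.version = snapshot["version"]
        self.scenes = snapshot["scenes"]
        return True
```

A real client would call poll_once on a timer and rebuild its handler chain whenever the method returns True. Sending the client's version with each heartbeat keeps the common "nothing changed" response small.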

Key components include:

Configuration server/client architecture with heartbeat and version checking.

Dynamic pipeline configuration that can be updated online.

Separate recall service (using Elasticsearch) and prediction service (supporting multiple models and versions).

AB testing integrated at the handler level, allowing per‑experiment strategy selection.
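
Handler‑level AB testing can be sketched as a handler that picks one of two strategies per request. The article does not describe how users are assigned to experiments, so the deterministic hash bucketing below is an assumed technique, and RankHandler, bucket, and the two ranking functions are hypothetical names.

```python
import hashlib
from typing import Callable, List, Tuple

Item = Tuple[str, float, int]  # (item id, model score, publish timestamp)
Strategy = Callable[[List[Item]], List[Item]]


def bucket(user_id: str, experiment: str, buckets: int = 100) -> int:
    """Map a user to a stable bucket in [0, buckets) for a given experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def rank_by_score(items: List[Item]) -> List[Item]:
    return sorted(items, key=lambda it: it[1], reverse=True)


def rank_by_recency(items: List[Item]) -> List[Item]:
    return sorted(items, key=lambda it: it[2], reverse=True)


class RankHandler:
    """A pipeline handler whose strategy depends on the user's experiment bucket."""

    def __init__(self, experiment: str, treatment_percent: int,
                 control: Strategy, treatment: Strategy) -> None:
        self.experiment = experiment
        self.treatment_percent = treatment_percent
        self.control = control
        self.treatment = treatment

    def handle(self, user_id: str, items: List[Item]) -> List[Item]:
        # Users whose bucket falls below the threshold get the treatment strategy.
        in_treatment = bucket(user_id, self.experiment) < self.treatment_percent
        strategy = self.treatment if in_treatment else self.control
        return strategy(items)
```

Including the experiment name in the hash input means the same user can land in different buckets for different experiments, which keeps experiments from sharing one fixed population.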

5. Outlook

Future work aims to build an explanation platform for personalized recommendations and to add real‑time feature services, enabling finer‑grained user preference handling and truly individualized recommendations.


