How We Revamped a Content Community’s Recommendation Engine for Real‑Time, Personalized Results

This article details the evolution of the ‘逛逛’ content community’s recommendation system, comparing the legacy rule‑based Hive workflow with a new algorithm‑driven architecture that leverages Elasticsearch, Redis, multi‑stage recall, coarse‑ and fine‑ranking, re‑ranking, exposure filtering, cold‑start handling, performance tuning, and future plans for vector‑based recall and platformization.

ITPUB

Jun 25, 2022

Recommendation Engine Overview

A recommendation engine is an information‑filtering system that predicts a user's interest over a large item pool and returns a ranked list, even when the user has no explicit query.

Legacy Rule‑Based Pipeline

The original "逛逛" recommendation relied on Dataman workflows and Hive jobs. Business data (posts, user behavior, profiles) were exported to Hive, where a chain of dependent tasks produced recommendation tables. Those tables were later copied into MySQL or PostgreSQL for the front‑end. This rule‑based approach was static, had high latency, and delivered the same first‑page results to all users.

Algorithm‑Driven Architecture

The upgraded system introduces a dedicated recommendation service that handles user requests in real time. Two online stores back the service: Elasticsearch (ES) for complex search and Redis for low‑latency lookups. Offline jobs compute item quality scores and similarity tags, and a ranking model deployed on a decision‑flow platform provides personalized ordering.

Service Workflow

Recall: fetch candidate items from ES and Redis using multi‑path strategies (LBS, tag, follower).

Coarse ranking: fast, rule‑based pruning of the large candidate set.

Fine ranking: model‑driven scoring of the reduced set for higher relevance.

Re‑ranking: apply business rules (sliding‑window, weighted distribution, mind‑set cultivation) to diversify results and avoid user fatigue.

The final list is returned to the backend and then to the front‑end for display.

Recall Layer Details

Recall ingests data from three sources:

PostgreSQL – real‑time business tables.

Hive – archived batch data.

Kafka – streaming user behavior.

Data from these sources is streamed into ES and Redis via the Search platform and Dataman. The recall paths (location‑based, tag‑based, follower‑based) run concurrently, and their results are merged before ranking.
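The concurrent recall and merge step can be sketched as follows. This is a minimal illustration, not the service's actual code: the three `recall_*` functions, the post ids, and the merge policy (remember which paths recalled each item) are all assumptions, and the real paths would query ES or Redis instead of returning fixed lists.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical recall paths; real ones would query ES or Redis.
def recall_lbs(user_id: str) -> List[str]:
    return ["post_1", "post_2", "post_3"]

def recall_tag(user_id: str) -> List[str]:
    return ["post_2", "post_4"]

def recall_follower(user_id: str) -> List[str]:
    return ["post_5", "post_1"]

RECALL_PATHS: Dict[str, Callable[[str], List[str]]] = {
    "lbs": recall_lbs,
    "tag": recall_tag,
    "follower": recall_follower,
}

def multi_path_recall(user_id: str) -> Dict[str, List[str]]:
    """Run every recall path concurrently, then merge the results.

    Returns a dict mapping item id -> names of the paths that recalled it.
    Duplicates across paths collapse into a single candidate.
    """
    with ThreadPoolExecutor(max_workers=len(RECALL_PATHS)) as pool:
        futures = {name: pool.submit(fn, user_id) for name, fn in RECALL_PATHS.items()}
        merged: Dict[str, List[str]] = {}
        # Collect in path order so the merged result is deterministic.
        for name, future in futures.items():
            for item_id in future.result():
                merged.setdefault(item_id, []).append(name)
    return merged

candidates = multi_path_recall("user_42")
```

Keeping the source path names alongside each candidate is one possible design choice; it lets later stages weight items that several paths agree on.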

Two‑Stage Ranking

Coarse ranking uses simple heuristic rules (e.g., score thresholds) to quickly eliminate millions of candidates. Fine ranking applies a learned model (e.g., Gradient Boosted Trees or Deep Neural Network) to the remaining few hundred items, achieving higher precision while keeping latency acceptable.
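A small sketch of the two stages, under stated assumptions: the `Candidate` fields, the quality threshold, the `keep` cutoff, and `toy_model` (a weighted sum standing in for the learned model) are all hypothetical and only illustrate the cheap-filter-then-expensive-score shape.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Candidate:
    item_id: str
    quality: float   # offline item quality score (assumed feature)
    ctr_pred: float  # feature consumed by the stand-in model

def coarse_rank(items: List[Candidate], min_quality: float, keep: int) -> List[Candidate]:
    """Cheap rule-based pruning: drop low-quality items, keep the top `keep`."""
    survivors = [it for it in items if it.quality >= min_quality]
    survivors.sort(key=lambda it: it.quality, reverse=True)
    return survivors[:keep]

def fine_rank(items: List[Candidate], score: Callable[[Candidate], float]) -> List[Candidate]:
    """Expensive scoring, applied only to the pruned set."""
    return sorted(items, key=score, reverse=True)

def toy_model(it: Candidate) -> float:
    # Stand-in for a learned model such as GBDT or a DNN.
    return 0.7 * it.ctr_pred + 0.3 * it.quality

pool = [
    Candidate("a", quality=0.9, ctr_pred=0.1),
    Candidate("b", quality=0.5, ctr_pred=0.8),
    Candidate("c", quality=0.2, ctr_pred=0.9),  # removed by the coarse threshold
    Candidate("d", quality=0.4, ctr_pred=0.4),
]
shortlist = coarse_rank(pool, min_quality=0.3, keep=3)
ranked_ids = [it.item_id for it in fine_rank(shortlist, toy_model)]
```

Note that item `c` has the highest predicted CTR but never reaches the model, which is the precision cost that coarse ranking trades for latency.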

Re‑ranking and Business Rules

After fine ranking, a re‑ranking layer enforces additional constraints:

Sliding‑window insertion to limit the number of items from a single influencer per page.

Weighted distribution to ensure category balance.

Mind‑set cultivation: insert sponsored or newly‑released items at controlled positions using “jump‑insert” logic.
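The sliding‑window rule might look like the sketch below. It is an assumption about how such a rule could work, not the article's implementation: it uses a greedy pass that allows at most `max_per_author` posts from one author within any `window` consecutive positions, and relaxes the constraint when no remaining post satisfies it.

```python
from typing import List, Tuple

Post = Tuple[str, str]  # (post_id, author_id)

def sliding_window_rerank(
    ranked: List[Post], window: int = 3, max_per_author: int = 1
) -> List[Post]:
    """Greedily rebuild the list so no author exceeds the per-window limit."""
    remaining = list(ranked)
    result: List[Post] = []
    while remaining:
        # Authors already placed in the positions that share a window with the next slot.
        recent = [author for _, author in result[-(window - 1):]] if window > 1 else []
        pick = next(
            (
                i
                for i, (_, author) in enumerate(remaining)
                if recent.count(author) < max_per_author
            ),
            0,  # nothing satisfies the rule: relax it and keep the best remaining post
        )
        result.append(remaining.pop(pick))
    return result

fine_ranked = [("p1", "A"), ("p2", "A"), ("p3", "B"), ("p4", "A"), ("p5", "C")]
reranked = sliding_window_rerank(fine_ranked, window=3, max_per_author=1)
```

The greedy pass preserves the fine‑ranking order wherever the rule allows, so diversity is bought with the smallest possible reshuffle.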

Exposure Filtering

To prevent duplicate recommendations, Redis stores two keys per user:

real_exposure – items the user has actually viewed.

interface_exposure – items returned by the last request; this key expires quickly.

When a new request arrives, the service merges both sets, removes any overlapping items from the recall results, and returns a deduplicated list.
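The filtering logic can be illustrated with an in‑memory stand‑in for the two Redis keys. Everything here is a sketch under assumptions: `ExposureStore`, its method names, and the 30‑second expiry are hypothetical, and a real deployment would rely on Redis sets with a key TTL rather than a Python dict.

```python
import time
from typing import Callable, Dict, Iterable, List, Set, Tuple

class ExposureStore:
    """In-memory stand-in for the per-user real_exposure / interface_exposure keys."""

    def __init__(self, interface_ttl_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._real: Dict[str, Set[str]] = {}
        self._interface: Dict[str, Tuple[float, Set[str]]] = {}
        self._ttl = interface_ttl_s
        self._clock = clock

    def mark_viewed(self, user_id: str, item_ids: Iterable[str]) -> None:
        self._real.setdefault(user_id, set()).update(item_ids)

    def mark_returned(self, user_id: str, item_ids: Iterable[str]) -> None:
        # Overwrites the previous response and restarts the short expiry.
        self._interface[user_id] = (self._clock() + self._ttl, set(item_ids))

    def exposed(self, user_id: str) -> Set[str]:
        seen = set(self._real.get(user_id, set()))
        expires_at, items = self._interface.get(user_id, (0.0, set()))
        if self._clock() < expires_at:  # ignore the short-lived key once expired
            seen |= items
        return seen

def filter_exposed(store: ExposureStore, user_id: str, recalled: List[str]) -> List[str]:
    seen = store.exposed(user_id)
    return [item_id for item_id in recalled if item_id not in seen]

now = [100.0]  # fake clock so the expiry can be shown without sleeping
store = ExposureStore(interface_ttl_s=30.0, clock=lambda: now[0])
store.mark_viewed("u1", ["a"])
store.mark_returned("u1", ["b"])
fresh = filter_exposed(store, "u1", ["a", "b", "c"])
now[0] = 131.0  # the interface_exposure stand-in has expired by now
later = filter_exposed(store, "u1", ["a", "b", "c"])
```

Once the short‑lived key expires, item `b` becomes eligible again unless the user actually viewed it in the meantime.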

Cold‑Start Handling

User cold‑start: most users have activity in other Haola services, so cross‑service profiles provide sufficient features.

Item cold‑start:

Assign default feature values for new items.

Add dedicated new‑item recall paths.

Use a “traffic pool” where each new item is treated as an arm in a multi‑armed bandit; exposure is allocated based on observed CTR.
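One way to picture the traffic pool is the UCB1 bandit below. The article only says that each new item is an arm and that exposure follows observed CTR; the choice of UCB1 specifically, along with the `TrafficPool` class and its method names, is an assumption made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict

@dataclass
class Arm:
    impressions: int = 0
    clicks: int = 0

class TrafficPool:
    """Each new item is one bandit arm; UCB1 decides which item gets the next exposure."""

    def __init__(self) -> None:
        self.arms: Dict[str, Arm] = {}

    def add_item(self, item_id: str) -> None:
        self.arms[item_id] = Arm()

    def record(self, item_id: str, clicked: bool) -> None:
        arm = self.arms[item_id]
        arm.impressions += 1
        arm.clicks += int(clicked)

    def select(self) -> str:
        # Items with no exposure yet are always tried first.
        for item_id, arm in self.arms.items():
            if arm.impressions == 0:
                return item_id
        total = sum(arm.impressions for arm in self.arms.values())

        def ucb(arm: Arm) -> float:
            ctr = arm.clicks / arm.impressions
            bonus = math.sqrt(2.0 * math.log(total) / arm.impressions)
            return ctr + bonus  # observed CTR plus an exploration bonus

        return max(self.arms, key=lambda item_id: ucb(self.arms[item_id]))

pool = TrafficPool()
pool.add_item("new_a")
pool.add_item("new_b")
first_pick = pool.select()
for i in range(10):
    pool.record("new_a", clicked=i < 5)  # 5 clicks in 10 impressions
    pool.record("new_b", clicked=i < 1)  # 1 click in 10 impressions
pick_after_feedback = pool.select()
pool.add_item("new_c")
pick_after_new_item = pool.select()
```

The exploration bonus shrinks as an item accumulates impressions, so exposure drifts toward items with higher observed CTR while brand‑new items still get an initial trial.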

Performance Monitoring and Optimization

Instrumentation logs latency for each pipeline stage and visualizes the data in Grafana dashboards. Identified bottlenecks:

Recall latency due to multiple parallel ES/Redis calls.

Model inference time.
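Per‑stage latency logging of this kind can be sketched with a small context manager. The `timed` helper, the stage names, and the in‑process dict are hypothetical; in production the measurements would be shipped to whatever metrics backend feeds the Grafana dashboards.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator

stage_latency_ms: Dict[str, float] = {}

@contextmanager
def timed(stage: str) -> Iterator[None]:
    """Record the wall-clock duration of one pipeline stage in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_latency_ms[stage] = (time.perf_counter() - start) * 1000.0

# Placeholder work standing in for the real stages.
with timed("recall"):
    candidates = list(range(1000))
with timed("fine_ranking"):
    ranked = sorted(candidates, reverse=True)
```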

Optimization steps:

Consolidate parallel ES calls with the _msearch multi‑search API.

Batch Redis queries using pipeline to reduce round‑trip overhead.

Limit concurrent outbound calls per request to two threads, easing thread‑pool pressure.
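To make the first step concrete, the sketch below builds a request body for the Elasticsearch `_msearch` endpoint, which takes newline‑delimited JSON: one header line and one query line per search, with a trailing newline. The index name `posts`, the field names `location` and `tags`, and the query values are assumptions for illustration only.

```python
import json
from typing import Any, Dict, List, Tuple

def build_msearch_body(searches: List[Tuple[str, Dict[str, Any]]]) -> str:
    """Serialize several searches into one NDJSON body for POST /_msearch."""
    lines: List[str] = []
    for index, search in searches:
        lines.append(json.dumps({"index": index}))  # header line
        lines.append(json.dumps(search))            # search body line
    return "\n".join(lines) + "\n"  # the body must end with a newline

# Two recall paths folded into a single round trip (hypothetical fields).
lbs_search = {
    "size": 50,
    "query": {
        "geo_distance": {"distance": "5km", "location": {"lat": 31.23, "lon": 121.47}}
    },
}
tag_search = {"size": 50, "query": {"terms": {"tags": ["cycling", "travel"]}}}

body = build_msearch_body([("posts", lbs_search), ("posts", tag_search)])
```

The responses come back in the same order as the searches, so each recall path can pick its own result out of the combined reply.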

Reliability and Fallback Mechanisms

Multiple fallbacks ensure continuity:

Fallback recall results when one or more recall channels return empty.

Fallback ranking service when the model endpoint is unavailable.

Backend fallback that serves a static recommendation list if the entire recommendation service is down.

All fallback activations generate alerts via Argus for rapid incident response.
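The first two fallbacks can be sketched as a thin wrapper around the pipeline. This is a simplified illustration under assumptions: it treats recall as a single call rather than several channels, every function name is hypothetical, and `alert` merely collects reasons in a list where the real system would notify Argus.

```python
from typing import Callable, List

alerts: List[str] = []

def alert(reason: str) -> None:
    # Stand-in for the alerting hook; here it only records the reason.
    alerts.append(reason)

def recommend(
    user_id: str,
    recall: Callable[[str], List[str]],
    rank: Callable[[str, List[str]], List[str]],
    fallback_recall: Callable[[str], List[str]],
    fallback_rank: Callable[[List[str]], List[str]],
) -> List[str]:
    candidates = recall(user_id)
    if not candidates:
        alert("recall_empty")
        candidates = fallback_recall(user_id)
    try:
        return rank(user_id, candidates)
    except Exception:
        # Model endpoint unavailable: degrade to a rule-based ordering.
        alert("ranker_unavailable")
        return fallback_rank(candidates)

def empty_recall(user_id: str) -> List[str]:
    return []

def hot_items(user_id: str) -> List[str]:
    return ["hot_2", "hot_1"]

def broken_model(user_id: str, items: List[str]) -> List[str]:
    raise ConnectionError("model endpoint unavailable")

def rule_based_rank(items: List[str]) -> List[str]:
    return sorted(items)

result = recommend("user_42", empty_recall, broken_model, hot_items, rule_based_rank)
```

The third fallback, a static list served by the backend, lives outside the recommendation service and is therefore not part of this wrapper.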

Results and Future Roadmap

After migration, the two key engagement metrics, PV‑CTR and UV‑CTR, both improved significantly. Planned enhancements include:

Vector‑based recall using approximate nearest‑neighbor search to enrich similarity matching.

Additional recall paths to increase diversity.

Platformization of the recommendation service so that multiple business lines can share the same codebase and operational infrastructure, reducing duplication and maintenance cost.
