How We Built and Optimized a Multi‑Pool Recommendation System for Boss Circle

This article explains the design, implementation, and iterative optimization of Boss Circle's recommendation engine, covering the initial simple ranking, the introduction of Elasticsearch‑based scoring, multi‑pool data sources, machine‑learning experiments, real‑time feature handling, and future personalization challenges.

SQB Blog

What Is Boss Circle?

Boss Circle is a dedicated content community for entrepreneurs, similar to Weibo or Xiaohongshu, where users can post text, images, and videos, join topics, share store‑opening experiences, make friends, and seek help.

Why Build a Recommendation System?

Initially, content was shown chronologically. As the volume of posts grew, we needed to surface high‑quality content, increase creator exposure, and improve user engagement and retention, so we designed a custom recommendation system.

Initial Recommendation Approach

Overall Idea

With a relatively small user base, we first adopted a simple strategy: rank posts by popularity (likes, comments, shares) and by manually curated quality, while keeping posts that match a user's interests near the top.

How to judge a post’s popularity?

Higher counts of forwards, comments, and likes indicate higher popularity and should be ranked higher.

How to infer user interest?

We tag both users and posts. User tags are derived from profile attributes and selected interest categories; post tags are set manually by operations. A match suggests user interest.

Additional flags for high‑quality or non‑compliant posts also affect ranking.

Implementation Details

Data filtering and sorting are powered by Elasticsearch (ES) using its function_score feature. We combine weight, field_value_factor, and decay_function to compute a custom score for each document.
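The shape of such a query can be sketched as follows. This is a minimal illustration of combining the three scoring functions named above; the field names (`like_count`, `publish_time`, `is_featured`), weights, and decay parameters are hypothetical, not the production values.

```python
# Hypothetical sketch of a function_score query combining a fixed weight,
# field_value_factor, and a decay function. All field names and numbers
# are illustrative placeholders.
def build_ranking_query(now_ms: int) -> dict:
    return {
        "query": {
            "function_score": {
                "query": {"bool": {"filter": [{"term": {"status": "published"}}]}},
                "functions": [
                    # Manually curated posts get a fixed boost.
                    {"filter": {"term": {"is_featured": True}}, "weight": 2.0},
                    # Popularity: log-damped like count so huge posts don't dominate.
                    {"field_value_factor": {
                        "field": "like_count",
                        "modifier": "log1p",
                        "factor": 1.5,
                        "missing": 0,
                    }},
                    # Freshness: exponential decay from the post's publish time.
                    {"exp": {"publish_time": {
                        "origin": now_ms,
                        "scale": "7d",
                        "decay": 0.5,
                    }}},
                ],
                "score_mode": "sum",       # combine the three function scores
                "boost_mode": "multiply",  # multiply into the base query score
            }
        }
    }
```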

Core post data is streamed from the business database binlog to an ES cluster deployed in a cloud‑native environment with elastic scaling.

The recommendation pipeline consists of three stages:

Filtering: select recent posts and discard low‑quality or non‑compliant items.

Coarse Ranking: order by timestamp or, if a query is present, by ES relevance scores.

Fine Ranking: apply a pre‑defined formula to compute a final score and present the highest‑scoring posts.
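The three stages above can be sketched as a single pipeline. The freshness window, candidate cap, and fine‑ranking weights below are illustrative assumptions, not the formula actually used in production.

```python
import time

# Minimal sketch of the filter -> coarse rank -> fine rank pipeline.
# Thresholds and weights are illustrative placeholders.
def recommend(posts: list[dict], limit: int = 10) -> list[dict]:
    now = time.time()
    # 1) Filtering: keep recent, compliant posts.
    candidates = [
        p for p in posts
        if now - p["created_at"] < 7 * 86400 and not p.get("violating")
    ]
    # 2) Coarse ranking: newest first (ES relevance would be used when a
    #    query string is present).
    candidates.sort(key=lambda p: p["created_at"], reverse=True)
    candidates = candidates[:100]  # cap candidates before fine ranking

    # 3) Fine ranking: weighted engagement formula.
    def fine_score(p: dict) -> float:
        return 3 * p.get("forwards", 0) + 2 * p.get("comments", 0) + p.get("likes", 0)

    candidates.sort(key=fine_score, reverse=True)
    return candidates[:limit]
```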

Results After Deployment

After deployment we observed limited improvements: recommendation diversity was low, high‑popularity posts dominated, and we could not control the proportion of different content types.

Recommendation System Optimizations

Overall Idea

Inspired by Weibo’s recommendation platform, we introduced a multi‑pool architecture, automated sensitive‑content review via a cloud service, and incorporated user‑behavior signals (clicks, comments, likes) to boost personalization.

We also adopted concepts from Twitter’s open‑source candidate‑generation pipeline, creating virtual data pools that share physical storage but apply different filtering rules.

The optimized architecture is divided into three layers:

1) Service Layer

Provides user, post, comment, tag, and tracking services.

User service: fetch avatar, nickname, follow relations.

Post & comment services: retrieve content, like/forward counts, hot comments.

Tag system: supports AB testing and manual placement of posts.

Tracking system: records user actions for analysis.

Data warehouse: runs analytics to infer user interests.

2) Virtual Data Pools

Each pool represents a distinct filtering rule (e.g., popularity‑based pool, follow‑based pool).

3) Storage Layer

Business data stored in MySQL (posts, comments, users, topics).

Posts are synchronized to ES for ranking.

Redis is used for caching, distributed sessions, and locks.

Data Pool Overview

Examples of pools:

Popularity pool: ranks by combined forward, comment, and like counts.

Followed‑user pool: shows posts from users the current user follows.

These pools are combined according to configurable recommendation strategies, with the popularity pool as the primary source.
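Since the pools are virtual, each one reduces to a filter rule applied to the same shared post storage. A minimal sketch, with hypothetical pool names, fields, and thresholds:

```python
# Sketch of "virtual pools": each pool is a filter rule over the same
# physical post storage. Pool names, fields, and the popularity threshold
# are illustrative assumptions.
POOLS = {
    "popularity": lambda p, ctx: (p["forwards"] + p["comments"] + p["likes"]) >= 10,
    "followed":   lambda p, ctx: p["author_id"] in ctx["following"],
}

def fetch_pool(name: str, posts: list[dict], ctx: dict) -> list[dict]:
    """Return the posts that satisfy this pool's filtering rule."""
    rule = POOLS[name]
    return [p for p in posts if rule(p, ctx)]
```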

Data Aggregation Process

New posts pass through content‑safety and low‑quality detection services, receiving tags such as “violating”, “to be reviewed”, “low quality”, or “high quality”. Manual adjustments can also modify scores.

Negative feedback (e.g., “not interested”) is recorded and used to filter future recommendations.

When a recommendation request arrives, each pool supplies a batch of posts; the system merges them according to the strategy order, deduplicates by post ID, user ID, and account type, and caches paginated results for fast response.

Deduplication

Post deduplication: keep the first occurrence of a duplicate post.

User deduplication: avoid showing multiple posts from the same user on a page.

Account‑type deduplication: prevent consecutive posts from business or media accounts.
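The merge‑and‑deduplicate step described above can be sketched as follows. Field names (`id`, `user_id`, `account_type`) are illustrative assumptions about the post schema.

```python
# Sketch of merging pools in strategy order while applying the three
# deduplication rules: unique post IDs, one post per user per page, and
# no consecutive business/media-account posts. Field names are assumed.
def merge_pools(pools: list[list[dict]], page_size: int = 10) -> list[dict]:
    seen_posts, seen_users = set(), set()
    last_account_type = None
    page: list[dict] = []
    for post in (p for pool in pools for p in pool):
        if post["id"] in seen_posts or post["user_id"] in seen_users:
            continue  # post or author already on this page
        acct = post.get("account_type", "personal")
        if acct != "personal" and acct == last_account_type:
            continue  # avoid back-to-back business/media posts
        seen_posts.add(post["id"])
        seen_users.add(post["user_id"])
        last_account_type = acct
        page.append(post)
        if len(page) == page_size:
            break
    return page
```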

Fallback Pools

If a primary pool returns insufficient data (e.g., the follow pool is empty), the system falls back to secondary pools configured per strategy.
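The fallback behavior amounts to walking the strategy's pool list in order until enough posts are collected. A minimal sketch, where `fetchers` is an assumed mapping from pool name to a fetch function:

```python
# Sketch of per-strategy fallback: try pools in the configured order,
# topping up from the next pool whenever the previous one comes up short.
def fill_with_fallback(strategy: list[str], fetchers: dict, need: int) -> list[dict]:
    result: list[dict] = []
    for pool_name in strategy:
        if len(result) >= need:
            break
        # Ask the next pool only for the shortfall.
        result.extend(fetchers[pool_name](need - len(result)))
    return result[:need]
```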

Additional Features

Mixed placement allows operators to inject specific posts into the feed to enrich content.

Historical‑record deduplication removes posts the user has already seen.

The final output is a fully controllable recommendation rule engine for operations.

Initial Machine‑Learning Exploration

Beyond rule‑based ranking, we experimented with logistic regression, random forest, BTM (biterm topic model), and FFM (field‑aware factorization machine) models for both coarse and fine ranking.

Feature Engineering

We added word‑vector (W2V) representations of post text, computing cosine similarity between user and post vectors, which raised AUC by 2%.
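The similarity feature reduces to a cosine between the user's vector and the post's vector (e.g. averaged word2vec embeddings). A dependency-free sketch:

```python
import math

# Cosine similarity between a user embedding and a post embedding,
# used as a ranking feature. How the vectors are produced (e.g. averaging
# word2vec vectors of the post text) is outside this sketch.
def cosine_similarity(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```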

We also derived a principal component from comment and forward counts, creating a “post engagement score” that improved AUC by another 1%.
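One way to derive such a component from two count features is to project their standardized values onto the leading eigenvector of the 2×2 covariance matrix; a library PCA (e.g. scikit-learn's) would normally be used, and this hand-rolled version only illustrates the idea.

```python
import math

# Sketch of the "post engagement score": first principal component of
# standardized comment and forward counts. For the standardized 2x2
# covariance matrix [[1, cov], [cov, 1]], the leading eigenvector is
# (1, 1)/sqrt(2) when cov >= 0, else (1, -1)/sqrt(2).
def engagement_scores(comments: list[float], forwards: list[float]) -> list[float]:
    n = len(comments)

    def standardize(xs: list[float]) -> list[float]:
        mean = sum(xs) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n) or 1.0
        return [(x - mean) / std for x in xs]

    c, f = standardize(comments), standardize(forwards)
    cov = sum(a * b for a, b in zip(c, f)) / n  # correlation of the two features
    sign = 1.0 if cov >= 0 else -1.0
    inv_sqrt2 = 1 / math.sqrt(2)
    return [(a + sign * b) * inv_sqrt2 for a, b in zip(c, f)]
```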

Model Experiments

Graph‑based similarity, BTM topic modeling, and random‑forest classifiers were tested. BTM improved interaction conversion but was slower; random forest gave higher precision but lower read‑through rates. Overall, the multi‑pool rule‑based model remained competitive.

Real‑Time Recommendation Flow

Static features are pre‑computed offline and stored; at request time, we fetch user and post features, perform simple interactions, apply regression coefficients, compute probabilities, and sort.
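This online scoring step can be sketched as below. The feature names, interaction term, and logistic-regression coefficients are illustrative placeholders, not the trained model.

```python
import math

# Sketch of request-time scoring: combine precomputed features with a
# simple interaction term, apply logistic-regression coefficients, and
# sort by predicted probability. COEFFS values are illustrative only.
COEFFS = {"bias": -2.0, "similarity": 1.5, "engagement": 0.8, "sim_x_eng": 0.3}

def click_probability(features: dict) -> float:
    z = (COEFFS["bias"]
         + COEFFS["similarity"] * features["similarity"]
         + COEFFS["engagement"] * features["engagement"]
         # Interaction feature built on the fly at request time.
         + COEFFS["sim_x_eng"] * features["similarity"] * features["engagement"])
    return 1 / (1 + math.exp(-z))  # sigmoid

def rank(posts: list[dict]) -> list[dict]:
    return sorted(posts, key=lambda p: click_probability(p["features"]), reverse=True)
```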

Future Outlook

As user volume and behavior data grow, the multi‑pool architecture faces new challenges in personalization and active‑user growth. Ongoing work includes deeper machine‑learning integration and exploration of content‑based and social‑graph‑based recommendation techniques.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: personalization, recommendation system, Elasticsearch, ranking, data pipelines