Inside 58.com’s Smart Recommendation Engine: Architecture, Algorithms, Data
58.com’s intelligent recommendation system began as a C++ monolith in 2014 and has since evolved into a Java‑based microservice platform. It combines multi‑layer data processing, a range of recall and ranking algorithms, and a microservice backend to deliver personalized listings across housing, jobs, cars, and other categories.
Overall Architecture
58.com’s recommendation system is organized into three layers: data, strategy, and application. It ingests rich business and user behavior data, applies various strategies for mining and analysis, and delivers results to multiple recommendation scenarios.
Data Layer
Business data includes user profiles and post attributes (housing, cars, jobs, etc.).
User behavior logs (clicks, calls, chats, etc.) are stored both in batch on HDFS and in real‑time streams via Kafka.
Strategy Layer
Based on offline and real‑time data, the system computes user and post profiles, then performs two core steps: recall and ranking.
Recall
Popular recall using exposure and click logs.
Geographic recall leveraging user location.
Interest recall via tag‑based post retrieval.
Association‑rule recall (time‑decayed support).
Collaborative filtering (batch and real‑time).
Matrix factorization (SVD).
DNN recall learning user and post embeddings.
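The article does not say how the time‑decayed support in the association‑rule source is computed. One common reading is that each co‑occurrence of two posts contributes a weight that shrinks with age instead of a flat count of 1. The sketch below assumes exponential decay with a hypothetical half‑life; the session format, function names, and parameters are illustrative and not taken from 58.com's system.

```python
import math
from collections import defaultdict
from itertools import combinations


def decayed_pair_support(sessions, now_day, half_life_days=7.0):
    """Sum exponentially decayed co-occurrence weights for post pairs.

    sessions: iterable of (day, [post_id, ...]) tuples (assumed format).
    Returns {(post_a, post_b): support} with post_a < post_b.
    """
    decay = math.log(2) / half_life_days  # weight halves every half_life_days
    support = defaultdict(float)
    for day, posts in sessions:
        weight = math.exp(-decay * (now_day - day))
        for a, b in combinations(sorted(set(posts)), 2):
            support[(a, b)] += weight
    return dict(support)


def related_posts(support, post_id, top_n=3):
    """Return the posts most strongly associated with post_id."""
    scored = []
    for (a, b), s in support.items():
        if a == post_id:
            scored.append((b, s))
        elif b == post_id:
            scored.append((a, s))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return [post for post, _ in scored[:top_n]]
```

With a seven‑day half‑life, a pair seen together a week ago counts half as much as a pair seen together today, so recent behavior dominates the recalled candidates.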
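Results from the different recall sources are fused in one of two ways: hierarchical selection, which fills the candidate list from sources in priority order, or weighted blending, which gives each source a share of the list. Priorities and ratios are configurable.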
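The two fusion modes can be sketched roughly as follows. This is a minimal illustration, assuming each recall source returns an already ordered list of post IDs; the function names, the quota rounding, and the backfill step are assumptions rather than details from the article.

```python
def fuse_hierarchical(sources, priority, n):
    """Fill up to n slots by walking sources in priority order, skipping duplicates."""
    seen, out = set(), []
    for name in priority:
        for post in sources.get(name, []):
            if post not in seen:
                seen.add(post)
                out.append(post)
                if len(out) == n:
                    return out
    return out


def fuse_by_ratio(sources, ratios, n):
    """Give each source a quota proportional to its ratio, then backfill leftovers."""
    total = sum(ratios.values())
    seen, out = set(), []
    for name, ratio in ratios.items():
        quota = int(n * ratio / total)
        taken = 0
        for post in sources.get(name, []):
            if taken == quota or len(out) == n:
                break
            if post not in seen:
                seen.add(post)
                out.append(post)
                taken += 1
    # Backfill slots left empty by rounding or by sources that ran short.
    for name in ratios:
        for post in sources.get(name, []):
            if len(out) == n:
                return out
            if post not in seen:
                seen.add(post)
                out.append(post)
    return out
```

Hierarchical selection suits cases where one source is clearly more trustworthy than the rest, while ratio‑based fusion keeps the candidate set diverse even when a high‑priority source could fill every slot by itself.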
Ranking
Ranking uses pointwise machine‑learning models to predict click‑through, conversion, and dwell‑time scores. Models evolved from LR → FM → XGBoost → GBDT+LR/GBDT+FM → deep models such as FNN and Wide&Deep. Features are extracted from user, post, author, and context dimensions.
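To make the pointwise idea concrete, here is a minimal sketch of the simplest model in that lineage, logistic regression over sparse features drawn from the four dimensions. Each candidate is scored independently and the list is sorted by predicted click probability. The feature names, weights, and record layout are hypothetical; a production model would be trained on the exposure and click logs rather than hand‑set.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def build_features(user, post, author, context):
    """Turn the user, post, author, and context dimensions into sparse feature names."""
    return [
        f"user_city={user['city']}",
        f"post_category={post['category']}",
        f"author_type={author['type']}",
        f"ctx_slot={context['slot']}",
        # A simple cross feature: does the post category match a user interest?
        f"interest_match={post['category'] in user['interests']}",
    ]


def predict_ctr(weights, bias, features):
    """Logistic regression: sigmoid of the bias plus the weights of active features."""
    z = bias + sum(weights.get(f, 0.0) for f in features)
    return sigmoid(z)


def rank(candidates, user, context, weights, bias):
    """Score every candidate on its own (pointwise) and sort by score, highest first."""
    scored = []
    for cand in candidates:
        feats = build_features(user, cand["post"], cand["author"], context)
        scored.append((predict_ctr(weights, bias, feats), cand["post"]["id"]))
    scored.sort(key=lambda item: -item[0])
    return [post_id for _, post_id in scored]
```

The later models in the sequence (FM, GBDT+LR, Wide&Deep) change how features interact inside `predict_ctr`, but the pointwise contract stays the same: one score per candidate, then sort.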
Application Layer
"Guess You Like" – top‑N recommendation on home and category pages.
Detail‑page related posts.
Search‑with‑few‑results recommendation.
Personalized push notifications.
Feed‑style continuous scrolling.
System Architecture
The backend follows a micro‑service design with data, logic, and access layers.
Data services (search, recall source, post feature store, user feature store) are exposed via RPC, hiding storage details (WRedis, files, WTable).
Logic services implement recall, ranking, and AB‑test experimentation.
Access services provide HTTP/RPC interfaces for client apps.
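The article does not describe how the AB‑test service splits traffic. A common approach, sketched here purely as an assumption, is to hash the user ID together with an experiment name so that each user lands in the same bucket on every request. The function name, bucket format, and the choice of MD5 are illustrative.

```python
import hashlib


def assign_bucket(user_id, experiment, buckets):
    """Deterministically map a user to an experiment bucket.

    buckets: list of (name, percent) pairs whose percents sum to 100 (assumed format).
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    slot = int(hashlib.md5(key).hexdigest(), 16) % 100  # stable value in 0..99
    upper = 0
    for name, percent in buckets:
        upper += percent
        if slot < upper:
            return name
    raise ValueError("bucket percentages must sum to 100")
```

Salting the hash with the experiment name keeps assignments independent across experiments, so a user in the treatment group of one ranking test is not systematically in the treatment group of another.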
The system runs on a Java stack, using custom RPC (SCF), monitoring (WMonitor), and high‑performance techniques (multithreading, caching, JVM tuning) to handle hundreds of millions of daily requests with ~30 ms latency.
Data and Metrics
Core data includes exposure, click, conversion, and dwell‑time logs. A unified parameter string carries identifiers (exposure ID, recall ID, ranking ID, etc.) through backend, access layer, and client for accurate sample generation and metric calculation.
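A rough sketch of how such a parameter string might work: the backend encodes the identifiers once, the client echoes the string back in its click logs, and an offline job joins exposures to clicks on the exposure ID to label training samples. The field names and the query‑string encoding are assumptions; the article does not specify the actual format.

```python
from urllib.parse import urlencode, parse_qsl

# Hypothetical field set; the article names exposure, recall, and ranking IDs.
TRACE_FIELDS = ("exposure_id", "recall_id", "rank_id", "slot_id")


def encode_trace(**ids):
    """Serialize the identifiers in a fixed field order."""
    missing = [f for f in TRACE_FIELDS if f not in ids]
    if missing:
        raise ValueError(f"missing trace fields: {missing}")
    return urlencode([(f, ids[f]) for f in TRACE_FIELDS])


def decode_trace(trace):
    """Parse a trace string back into a dict of identifiers."""
    return dict(parse_qsl(trace, strict_parsing=True))


def label_samples(exposure_traces, click_traces):
    """Label each exposure 1 if a click log carries the same exposure_id, else 0."""
    clicked = {decode_trace(t)["exposure_id"] for t in click_traces}
    samples = []
    for trace in exposure_traces:
        ids = decode_trace(trace)
        samples.append((ids, int(ids["exposure_id"] in clicked)))
    return samples
```

Because the recall and ranking IDs travel with every exposure and click, the same logs can also attribute each click to the strategy that produced it.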
Effectiveness metrics are computed both offline (MapReduce + Hive + MySQL, later migrated to Kylin) and in real time (Storm + HBase). Multi‑dimensional analysis across business lines, devices, and recommendation slots drives continuous optimization.
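As a small illustration of the multi‑dimensional analysis, the sketch below computes click‑through rate grouped by any combination of dimensions. It is an in‑memory stand‑in for what the Hive or Kylin queries would do at scale; the event layout and key names are hypothetical.

```python
from collections import defaultdict


def ctr_by(events, *dims):
    """Compute CTR (clicks / exposures) grouped by the given dimension keys.

    events: iterable of dicts with a 'type' of 'exposure' or 'click'
    plus one key per dimension (assumed layout).
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [exposures, clicks]
    for event in events:
        key = tuple(event[d] for d in dims)
        if event["type"] == "exposure":
            counts[key][0] += 1
        elif event["type"] == "click":
            counts[key][1] += 1
    return {
        key: (clicks / exposures if exposures else 0.0)
        for key, (exposures, clicks) in counts.items()
    }
```

Calling `ctr_by(events, "business", "device")` gives one rate per business line and device pair, while `ctr_by(events, "slot")` rolls the same logs up by recommendation slot.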
Conclusion
Over the past year, 58.com expanded recommendation coverage to nearly a hundred slots across housing, cars, jobs, and more. The share of clicks coming from recommendation slots grew 2–3×, to 20–30%, and recommendations now generate tens of millions of clicks per day. Future work will focus on improving conversion metrics (chat and call actions) to further enhance user value.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
