Intelligent Recommendation System for 58 Tongzhen: Architecture, Data, Features, and Model Evolution

This article describes how 58 Tongzhen uses AI technologies, including data pipelines, feature engineering, multiple recall and ranking models, and AB testing, to build a personalized feed recommendation system for down-market users. It covers the overall architecture, data sources, model iterations, performance gains, and future directions.

58 Tech

Apr 1, 2020

Background AI is a strategic technology driving productivity and new services, yet adoption remains uneven across regions; fourth‑ and fifth‑tier cities and rural areas still face information gaps, creating a strong demand for AI‑enabled solutions.

58 Tongzhen, a key strategic business covering over 10,000 town stations in 31 provinces and serving more than 100 million users, aims to provide precise local information by combining private traffic from town‑level site owners with public traffic from the 58 local app, using AI to improve user profiling, conversion, and experience.

Scenario Overview The Tongzhen intelligent recommendation system uses a feed-style UI to deliver multi-category content (news, jobs, housing, cars, social) to down-market users, supporting high growth, efficient conversion, and long-term retention.

Overall Architecture The system consists of five layers: data foundation, data computation, algorithm strategy, logic, and application. It ingests business, log, and label data, applies machine‑learning, deep‑learning, and NLP techniques for recall and click‑through‑rate (CTR) prediction, and merges top‑N results across categories for the homepage feed.

Data & Features Core data includes business transactions, behavior logs, and user profile tags. Feature engineering turns raw logs into training samples through cleaning, sampling, feature combination, transformation, and discretization. Content sources cover news (text, image, video) and classified listings (jobs, housing, cars, etc.). Tags are extracted with BERT-based models (for example, job title extraction reaches 93% accuracy, while location and housing attributes reach around 80%). Semantic, latent-semantic, spatio-temporal, and quality features are incorporated, the last including classifiers for low-quality and vulgar content.
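
The sampling, combination, and discretization steps can be illustrated with a minimal sketch. The field names (`category`, `age_hours`, `region`, `clicked`), the bucket boundaries, and the sampling rate are assumptions made for illustration; the article does not describe the actual schema or parameters.

```python
from bisect import bisect_right
import random

# Hypothetical bucket boundaries for a continuous feature (content age in hours).
AGE_BOUNDARIES = [1, 6, 24, 72, 168]


def discretize(value, boundaries):
    """Return the index of the bucket that `value` falls into."""
    return bisect_right(boundaries, value)


def downsample_negatives(samples, keep_rate, seed=0):
    """Keep every clicked sample and a random fraction of unclicked ones."""
    rng = random.Random(seed)
    return [s for s in samples if s["clicked"] or rng.random() < keep_rate]


def to_training_row(sample):
    """Turn one cleaned log record into (feature dict, label)."""
    features = {
        "category": sample["category"],
        # Discretization: continuous age becomes a bucket id.
        "age_bucket": discretize(sample["age_hours"], AGE_BOUNDARIES),
        # Combination (cross) feature: user region x content category.
        "region_x_category": f'{sample["region"]}|{sample["category"]}',
    }
    return features, int(sample["clicked"])
```

Downsampling unclicked impressions is a common way to reduce class imbalance in CTR data; whether Tongzhen samples in this way is not stated in the source.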

Algorithm Models – Recall Multiple recall strategies are employed: user‑profile tag recall, text‑similarity recall (TF‑IDF + Word2Vec), algorithmic model recall (ItemCF, Attention, DeepFM), hotspot recall (regional and global), and bandit‑based cold‑start recall. Real‑time and offline pipelines run on Kafka, Spark Streaming, and Flink.
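
As an illustration of one of these strategies, the sketch below implements a basic item-based collaborative filtering (ItemCF) recall using cosine similarity over click co-occurrence. It is a textbook formulation under assumed inputs (a dict of user click sets), not the production implementation, which the article does not detail.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def build_item_similarity(user_clicks):
    """user_clicks: {user_id: set of item ids}. Returns {item: {item: cosine sim}}."""
    item_count = defaultdict(int)
    co_count = defaultdict(lambda: defaultdict(int))
    for items in user_clicks.values():
        for i in items:
            item_count[i] += 1
        # Count each unordered pair of items clicked by the same user.
        for i, j in combinations(sorted(items), 2):
            co_count[i][j] += 1
            co_count[j][i] += 1
    return {
        i: {j: c / sqrt(item_count[i] * item_count[j]) for j, c in related.items()}
        for i, related in co_count.items()
    }


def recall(user_items, sim, top_n):
    """Score unseen items by summed similarity to the user's clicked items."""
    scores = defaultdict(float)
    for i in user_items:
        for j, s in sim.get(i, {}).items():
            if j not in user_items:
                scores[j] += s
    # Sort by score descending, breaking ties by item id for determinism.
    return sorted(scores, key=lambda j: (-scores[j], j))[:top_n]
```

In a multi-strategy setup like the one described, the candidates from each recall channel would then be merged and passed to the ranking stage.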

Algorithm Models – Ranking Ranking evolved through four stages: (1) rule-based sorting; (2) tree model plus linear model (GBDT+LR, later XGBoost+LR) with feature sampling and regularization; (3) deep learning models (DeepFM, xDeepFM) for higher-order feature interactions; (4) fusion of XGBoost+LR and xDeepFM, yielding about 5% additional CTR lift.
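
The idea behind the tree-plus-linear stage can be sketched as follows: each tree maps a sample to a leaf, the leaf indices are one-hot encoded, and a logistic regression scores the resulting sparse vector. The two hand-written trees, the feature names, and the weights below are hypothetical stand-ins for a trained GBDT or XGBoost model and a fitted LR; they only show the data flow.

```python
from math import exp


# Hypothetical stand-ins for trained trees: each returns a leaf index.
def tree_0(x):
    return 0 if x["age_hours"] < 24 else 1


def tree_1(x):
    if x["same_region"]:
        return 0 if x["user_ctr"] >= 0.1 else 1
    return 2


TREES = [(tree_0, 2), (tree_1, 3)]  # (tree function, number of leaves)


def leaf_one_hot(x):
    """Concatenate one one-hot vector per tree, marking the leaf `x` lands in."""
    encoded = []
    for tree, n_leaves in TREES:
        leaf = tree(x)
        encoded.extend(1.0 if k == leaf else 0.0 for k in range(n_leaves))
    return encoded


def lr_predict(encoded, weights, bias):
    """Logistic regression over the leaf encoding; returns a predicted CTR."""
    z = bias + sum(w * v for w, v in zip(weights, encoded))
    return 1.0 / (1.0 + exp(-z))
```

The trees act as learned feature crosses, which is why this stage can outperform LR on raw features alone; the deep models in stage (3) learn such interactions directly from embeddings instead.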

Fusion Control Model A re‑ranking layer balances traffic and diversity across content categories by normalizing scores from category‑specific models and applying weighted aggregation based on user‑region preferences and business rules.
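
A minimal sketch of such a re-ranking step is shown below. It assumes min-max normalization within each category and a single multiplicative weight per category; the article does not specify the normalization method or how the weights are derived, so both are illustrative choices.

```python
def min_max_normalize(scores):
    """Rescale {item: score} to [0, 1]; identical scores all map to 0.5."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item: 0.5 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def fuse(category_scores, category_weights, top_n):
    """Merge per-category model scores into one ranked feed.

    category_scores: {category: {item: raw model score}}
    category_weights: {category: weight}, e.g. from user or region preference
    """
    merged = []
    for category, scores in category_scores.items():
        if not scores:
            continue
        weight = category_weights.get(category, 1.0)
        for item, s in min_max_normalize(scores).items():
            merged.append((weight * s, category, item))
    # Highest weighted score first; ties broken by category and item id.
    merged.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(category, item) for _, category, item in merged[:top_n]]
```

Normalizing first matters because category-specific models produce scores on different scales (job listings and news typically have very different base click rates), so raw scores are not directly comparable.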

AB Testing & Evaluation An AB-test platform splits traffic orthogonally at the recall, ranking, fusion, and presentation layers, so that experiments in different layers run independently of one another. It supports UV- and PV-based splits and dynamic configuration. With continuous offline and online evaluation, overall CTR has risen by about 175% since April, as shown by the trend charts in the original article.
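
Orthogonal splitting is commonly implemented by hashing the traffic unit together with a per-layer salt, so that bucket assignment in one layer is statistically independent of the others. The sketch below follows that common approach; the layer names, experiment names, and bucket count are assumptions, and the platform's real mechanism is not described in the article.

```python
import hashlib


def bucket(unit_id, layer, n_buckets=100):
    """Map a traffic unit to a bucket, salted by layer name.

    unit_id can be a user id (UV-based split) or a request id (PV-based split).
    """
    digest = hashlib.md5(f"{layer}:{unit_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def assign(unit_id, layer_config):
    """Pick one experiment per layer.

    layer_config: {layer: [(experiment_name, exclusive_upper_bucket), ...]}
    with upper bounds in ascending order.
    """
    assignment = {}
    for layer, ranges in layer_config.items():
        b = bucket(unit_id, layer)
        for name, upper in ranges:
            if b < upper:
                assignment[layer] = name
                break
    return assignment


# Hypothetical configuration: a 50/50 split in two independent layers.
CONFIG = {
    "recall": [("control", 50), ("recall_v2", 100)],
    "ranking": [("control", 50), ("ranking_v2", 100)],
}
```

Because the salt differs per layer, users in the recall control group are spread roughly evenly across the ranking groups, which lets several experiments share the same traffic. Changing the bucket ranges in the configuration adjusts the split without code changes.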

Conclusion & Future Work The system emphasizes robust user profiling and feature extraction, addressing the challenges of heterogeneous, locally‑focused content. Future plans include deeper user intent mining, richer contextual tags, new network architectures (including vision and reinforcement learning), multi‑objective optimization, and expansion of the user‑profile knowledge graph.

References 1. https://arxiv.org/pdf/1803.05170.pdf 2. https://blog.csdn.net/yfreedomliTHU/article/details/91386734

Author Yan Wenchang – 58 Algorithm Architect & Technical Committee Member.
