
Architecture and Evolution of 58 Tongzhen Local Feed Recommendation System

This article details the design, data pipeline, feature engineering, model development, and iterative optimization of the 58 Tongzhen local feed recommendation system, covering business background, user profiling, recall strategies, ranking models such as XGBoost, XDeepFM, and online learning, and future directions.


The 58 Tongzhen platform targets the underserved county and township markets in China, serving over 100 million users with a mix of news, job, real‑estate, automotive, and social content. Market analysis shows a predominantly low‑income, low‑education user base with high mobile usage and a strong preference for short videos and social apps.

To serve this audience, Tongzhen’s homepage Feed recommendation is built as a multi‑stage pipeline: offline data collection, feature extraction, and model training, followed by online real‑time feature serving. The architecture consists of a data layer, computation layer, algorithm strategy layer, logic layer, and application layer, supporting both batch and streaming processing via Hive, Kafka, Flink/Spark, and Redis/WTable.
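The streaming half of this pipeline can be sketched as follows. This is a minimal illustration, not the production design: the class and field names are hypothetical, and a plain in-memory dict stands in for the Redis/WTable key-value store that a Flink job would actually write to.

```python
from collections import defaultdict, deque

class RealtimeFeatureStore:
    """Toy stand-in for a streaming feature updater feeding online serving."""

    def __init__(self, window=50):
        # per-user ring buffer of recently clicked item ids
        self.clicks = defaultdict(lambda: deque(maxlen=window))
        # per-item [click, impression] counters for real-time CTR features
        self.ctr_counts = defaultdict(lambda: [0, 0])

    def on_event(self, user_id, item_id, event):
        clicks, imps = self.ctr_counts[item_id]
        if event == "impression":
            self.ctr_counts[item_id] = [clicks, imps + 1]
        elif event == "click":
            self.ctr_counts[item_id] = [clicks + 1, imps]
            self.clicks[user_id].append(item_id)

    def recent_clicks(self, user_id):
        return list(self.clicks[user_id])

    def item_ctr(self, item_id, prior_clicks=1, prior_imps=20):
        # smoothed CTR so sparsely shown items do not get extreme values
        c, n = self.ctr_counts[item_id]
        return (c + prior_clicks) / (n + prior_imps)
```

In a real deployment the `on_event` path would be a Kafka-consuming Flink operator and the counters would live in Redis with TTLs; the smoothing priors here are made-up values.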

Key features include extensive user profiling (230+ dimensions) and content tagging using BERT‑based semantic models, hierarchical text classification, and low‑quality content detection via perplexity‑based scoring. Feature types span user, content, cross, and context attributes, with pipelines handling both offline batch builds and real‑time updates.
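The perplexity-based low-quality detection mentioned above can be illustrated with a deliberately tiny sketch: a character-bigram language model trained on in-domain text, where text that the model finds surprising (high perplexity) is flagged as likely low quality. A production system would use a much larger language model; all names here are illustrative.

```python
import math
from collections import Counter

def train_bigram_lm(corpus):
    """Count character unigrams and bigrams over an in-domain corpus."""
    bigrams, unigrams = Counter(), Counter()
    for text in corpus:
        chars = ["<s>"] + list(text)
        unigrams.update(chars)
        bigrams.update(zip(chars, chars[1:]))
    return bigrams, unigrams

def perplexity(text, bigrams, unigrams, vocab_size=5000):
    """Per-character perplexity under the bigram model, add-one smoothed."""
    chars = ["<s>"] + list(text)
    log_prob = 0.0
    for prev, cur in zip(chars, chars[1:]):
        # add-one smoothing gives unseen bigrams a small nonzero probability
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log(p)
    n = max(len(chars) - 1, 1)
    return math.exp(-log_prob / n)
```

Scoring then reduces to a threshold: text resembling the training corpus gets low perplexity, while garbled or spammy text scores high and can be filtered before recall.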

Recall is multi‑path, integrating precise user‑profile recall, content‑similarity recall (Word2Vec/TF‑IDF), clustering‑based collaborative filtering, deep models (Attention, DeepFM), regional hotspot recall, and freshness‑driven “race” strategies that give new content trial exposure. The collaborative‑filtering path clusters each user’s click vectors so that multiple distinct interests are captured.
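The multi-interest idea behind the collaborative-filtering path can be sketched as: cluster the embeddings of a user's clicked items so that each centroid represents one interest, then recall the catalog items nearest to each centroid. The pure-Python k-means, the item embeddings, and the helper names below are all hypothetical simplifications.

```python
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(pts):
    return [sum(c) / len(pts) for c in zip(*pts)]

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means over lists of floats (illustrative, not optimized)."""
    rnd = random.Random(seed)
    centroids = rnd.sample(points, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: dist2(p, centroids[i]))
            buckets[j].append(p)
        centroids = [mean(b) if b else centroids[i] for i, b in enumerate(buckets)]
    return centroids

def recall_per_interest(click_embs, catalog, k=2, per_interest=3):
    """catalog: {item_id: embedding}; recall top items near each interest."""
    out = []
    for c in kmeans(click_embs, k):
        ranked = sorted(catalog, key=lambda i: dist2(catalog[i], c))
        out.extend(ranked[:per_interest])
    return list(dict.fromkeys(out))  # de-duplicate, preserve order
```

Recalling per centroid rather than from one averaged user vector is what lets a user who reads both job postings and automotive content get candidates from both interests instead of a blurred midpoint.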

Ranking has evolved in stages: from rule‑based scoring, to tree‑based models (XGBoost) combined with linear models, to deep models (XDeepFM) that fuse linear, CIN, and DNN components, and finally to online learning that refreshes the model within minutes from real‑time behavior logs. Hyper‑parameters are tuned via grid search, with AUC as the selection metric.
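The tuning loop can be sketched as: enumerate a parameter grid, score validation data under each setting, and keep the configuration with the best AUC. For the sake of a self-contained example the “model” below just blends two features with a weight `alpha`; in practice the grid would hold XGBoost parameters such as `max_depth` and `eta`, and the data here is invented.

```python
import itertools

def auc(labels, scores):
    """AUC as the probability a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def grid_search(features, labels, grid):
    """Return the (alpha, auc) pair with the best validation AUC."""
    best = None
    for (alpha,) in itertools.product(grid["alpha"]):
        scores = [alpha * f1 + (1 - alpha) * f2 for f1, f2 in features]
        a = auc(labels, scores)
        if best is None or a > best[1]:
            best = (alpha, a)
    return best
```

The same skeleton applies unchanged when the inner scorer is an actual XGBoost fit-and-predict call; only the grid and the scoring function change.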

After ranking, a fusion and rerank layer balances news and classified items using weight factors, category competition, and business rules, while a logic control layer enforces traffic quotas, de‑duplication, and diversity constraints.
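A minimal sketch of that fusion step, under assumed rules: merge two ranked lists (news and classified) by weighted score, then enforce a diversity constraint of at most `max_run` consecutive items from the same category. The weights and the specific rule are illustrative, not the production business logic.

```python
def fuse_and_rerank(news, classified, w_news=1.0, w_cls=0.8, max_run=2):
    """Each input is a list of (item_id, score); returns (item_id, category)."""
    pool = sorted(
        [(s * w_news, i, "news") for i, s in news]
        + [(s * w_cls, i, "cls") for i, s in classified],
        reverse=True,
    )
    out, seen = [], set()
    while pool:
        # if the last max_run picks share one category, ban it for this pick
        tail = [c for _, c in out[-max_run:]]
        banned = tail[0] if len(tail) == max_run and len(set(tail)) == 1 else None
        pick = next((e for e in pool if e[2] != banned and e[1] not in seen), None)
        if pick is None:  # only banned-category items remain; relax the rule
            pick = next((e for e in pool if e[1] not in seen), None)
            if pick is None:
                break
        pool.remove(pick)
        seen.add(pick[1])
        out.append((pick[1], pick[2]))
    return out
```

The traffic-quota and de-duplication checks of the logic control layer would wrap around a routine like this, vetoing or demoting items before the final list is served.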

Evaluation relies on an A/B‑testing platform measuring CTR, conversion, dwell time, and retention. Since launch in April 2019, CTR has risen by more than 220%, per‑user clicks by more than 170%, and next‑day retention by 92%, driven by continuous model and pipeline improvements.
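A typical readout from such an A/B platform compares CTR between control and treatment with a two-proportion z-test under the normal approximation; the sketch below shows the computation, with entirely made-up counts.

```python
import math

def ctr_z_test(clicks_a, imps_a, clicks_b, imps_b):
    """Relative CTR uplift, z statistic, and two-sided p-value (B vs A)."""
    p_a, p_b = clicks_a / imps_a, clicks_b / imps_b
    p = (clicks_a + clicks_b) / (imps_a + imps_b)  # pooled click rate
    se = math.sqrt(p * (1 - p) * (1 / imps_a + 1 / imps_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF via erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_b / p_a - 1, z, p_value
```

Real platforms add guardrails this sketch omits: per-user (not per-impression) randomization, multiple-metric correction, and minimum run lengths to ride out novelty effects.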

Future work includes multi‑objective optimization (ESMM‑style), deeper user intent modeling, exploration of reinforcement learning and graph neural networks, and building a knowledge graph to enrich content semantics and cold‑start handling.
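The ESMM-style multi-objective direction rests on a simple decomposition: two towers predict pCTR and pCVR, and supervision happens entirely in impression space via pCTCVR = pCTR × pCVR, which sidesteps the sample-selection bias of training CVR only on clicked items. The sketch below shows just the loss arithmetic with given tower outputs; a real model would learn the two towers jointly over shared embeddings.

```python
import math

def log_loss(y, p, eps=1e-12):
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def esmm_loss(p_ctr, p_cvr, clicked, converted):
    """ESMM objective: CTR loss + CTCVR loss, both over all impressions."""
    p_ctcvr = p_ctr * p_cvr
    # click&convert label is defined for every impression, so no CVR bias
    return log_loss(clicked, p_ctr) + log_loss(clicked * converted, p_ctcvr)
```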

Tags: big data · machine learning · feature engineering · AI · recommendation system · online learning · feed ranking
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
