From Zero to One: Building the 58.com Recruitment Personalized Recommendation System
This article details how 58.com constructed a large‑scale personalized recommendation platform for its recruitment business, covering business background, user intent modeling, knowledge‑graph and NER techniques, user profiling, multi‑stage recall strategies, ranking model pipelines, serving infrastructure, AB testing, and future research directions.
The 58 recruitment platform serves millions of job seekers and employers daily, requiring a deep understanding of user intent and a robust recommendation pipeline to handle massive data, cold‑start issues, sparsity, and real‑time constraints.
Recommendation scenarios include the job feed, category suggestions, and similar-job recommendations, serving both job seekers (the C-end, or consumer side) and companies (the B-end, or business side). The recommended content types are jobs, tags, companies, and resumes.
User intent is captured through textual and behavioral signals, employing keyword/regex filters, phonetic matching, NER models (BiLSTM+CRF), and classification algorithms (fastText, CNN) to identify low‑quality or malicious users and to build a knowledge graph for richer profiling.
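The keyword/regex filtering step mentioned above can be sketched as follows. This is a minimal illustration, not the platform's actual rule set: the rule names, the patterns, and the function `flag_low_quality` are all hypothetical, and a production list would be curated from labeled data and combined with the phonetic, NER, and classifier signals.

```python
import re

# Hypothetical rules for illustration only; real rules would be curated.
SPAM_RULES = {
    "phone_number": re.compile(r"\b\d{3}[-\s]?\d{4}[-\s]?\d{4}\b"),
    "contact_handle": re.compile(r"\b(wechat|qq)\s*[:：]?\s*\w+", re.IGNORECASE),
    "too_good_offer": re.compile(r"easy money|no experience.*high pay", re.IGNORECASE),
}


def flag_low_quality(text):
    """Return the names of all rules that match the text (empty list if clean)."""
    return [name for name, pattern in SPAM_RULES.items() if pattern.search(text)]


if __name__ == "__main__":
    print(flag_low_quality("Call 138-1234-5678 for easy money"))
    print(flag_low_quality("Looking for a warehouse job in Beijing"))
```

Returning the matched rule names rather than a single boolean keeps the filter auditable, which helps when rules are later replaced or supplemented by learned classifiers.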
User profiling combines statistical rules, traditional classifiers, and sequence models (LSTM, GRU, Attention) to generate long‑ and short‑term interest vectors, supporting downstream recall and ranking.
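To make the long-term versus short-term distinction concrete, here is a small sketch of the statistical-rule flavor of profiling: time-decayed tag counts with two different half-lives. It is a simple stand-in and does not reproduce the sequence models (LSTM, GRU, Attention) named above; the function name, the event format, and the half-life values are assumptions made for the example.

```python
from collections import defaultdict


def interest_profile(events, now, half_life_days):
    """Build normalized tag weights from (tag, timestamp_in_days) events.

    Each event contributes 0.5 ** (age / half_life_days), so a short
    half-life emphasizes recent behavior and a long one smooths over history.
    """
    weights = defaultdict(float)
    for tag, ts in events:
        age = max(now - ts, 0.0)
        weights[tag] += 0.5 ** (age / half_life_days)
    total = sum(weights.values())
    if total == 0:
        return {}
    return {tag: w / total for tag, w in weights.items()}


# Hypothetical history: three older "driver" views, two recent "sales" views.
events = [("driver", 0.0), ("driver", 1.0), ("driver", 2.0),
          ("sales", 29.0), ("sales", 30.0)]
long_term = interest_profile(events, now=30.0, half_life_days=90.0)
short_term = interest_profile(events, now=30.0, half_life_days=1.0)

if __name__ == "__main__":
    print("long-term:", long_term)
    print("short-term:", short_term)
```

With these inputs the long-term profile still favors "driver" while the short-term profile is dominated by "sales", which is the kind of contrast a downstream recall or ranking stage can exploit.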
Recall is performed via three complementary methods: contextual + user‑profile expansion, item‑based collaborative filtering with behavior weighting and time decay, and deep embedding‑based retrieval using FAISS for nearest‑neighbor search.
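The second recall method, item-based collaborative filtering with behavior weighting and time decay, can be sketched in a few lines. The behavior weights, the decay rate, and the choice to decay by the time gap between two interactions of the same user are assumptions for illustration; the article does not specify the exact formula.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical settings: stronger behaviors count more, and two items
# interacted with far apart in time contribute less to their similarity.
BEHAVIOR_WEIGHT = {"click": 1.0, "apply": 3.0}
DECAY_PER_DAY = 0.1


def item_similarity(logs):
    """Compute weighted, time-decayed item-item similarity.

    logs: list of (user, item, behavior, day). For each user and item the
    strongest behavior is kept.
    """
    by_user = defaultdict(dict)
    for user, item, behavior, day in logs:
        w = BEHAVIOR_WEIGHT[behavior]
        prev = by_user[user].get(item)
        if prev is None or w > prev[0]:
            by_user[user][item] = (w, day)

    co = defaultdict(float)    # decayed co-occurrence per ordered item pair
    norm = defaultdict(float)  # squared weight mass per item
    for items in by_user.values():
        for item, (w, _) in items.items():
            norm[item] += w * w
        for (a, (wa, da)), (b, (wb, db)) in combinations(items.items(), 2):
            score = wa * wb * math.exp(-DECAY_PER_DAY * abs(da - db))
            co[(a, b)] += score
            co[(b, a)] += score
    return {pair: s / math.sqrt(norm[pair[0]] * norm[pair[1]])
            for pair, s in co.items()}


def recall(sim, seed_item, k):
    """Return up to k items most similar to seed_item."""
    scored = [(b, s) for (a, b), s in sim.items() if a == seed_item]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in scored[:k]]


logs = [
    ("u1", "job_a", "click", 0), ("u1", "job_b", "apply", 0),
    ("u2", "job_a", "apply", 0), ("u2", "job_b", "apply", 1),
    ("u3", "job_a", "click", 0), ("u3", "job_c", "click", 10),
]
sim = item_similarity(logs)

if __name__ == "__main__":
    print(recall(sim, "job_a", 2))
```

The embedding-based path replaces this similarity table with vector nearest-neighbor search; at the scale described, a library such as FAISS handles that lookup instead of the exhaustive pair loop shown here.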
Ranking evolves through multi‑objective models predicting CTR, CVR, and ROR, with careful sample cleaning, feature engineering, and model monitoring; a feature pipeline automates sample generation, transformation, and combination.
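One common way to turn several objective predictions into a single ranking score is a weighted product, sketched below. The article does not say how the CTR, CVR, and ROR predictions are combined, so the fusion formula, the weights, and the candidate values here are purely hypothetical.

```python
def fused_score(pred, weights):
    """Weighted geometric combination of per-objective probabilities.

    pred:    dict such as {"ctr": 0.2, "cvr": 0.02, "ror": 0.1}
    weights: dict of exponents; a weight of 0 ignores that objective.
    """
    score = 1.0
    for name, w in weights.items():
        # Clamp to avoid a single zero prediction wiping out the score.
        score *= max(pred[name], 1e-6) ** w
    return score


def rank(candidates, weights):
    """Sort candidates by fused score, best first."""
    return sorted(candidates,
                  key=lambda c: fused_score(c["pred"], weights),
                  reverse=True)


# Hypothetical model outputs for two jobs.
candidates = [
    {"id": "job_1", "pred": {"ctr": 0.20, "cvr": 0.02, "ror": 0.10}},
    {"id": "job_2", "pred": {"ctr": 0.10, "cvr": 0.08, "ror": 0.30}},
]

if __name__ == "__main__":
    balanced = rank(candidates, {"ctr": 1.0, "cvr": 1.0, "ror": 1.0})
    click_only = rank(candidates, {"ctr": 1.0, "cvr": 0.0, "ror": 0.0})
    print([c["id"] for c in balanced])
    print([c["id"] for c in click_only])
```

The example shows why multi-objective ranking matters: a click-only ordering prefers job_1, while weighting all three objectives moves job_2 to the top.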
The serving architecture enables automatic model updates, hot‑loading, and re‑ranking mechanisms to filter low‑quality content and improve effective connections.
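Hot-loading generally means swapping a newly trained model into a running service without a restart. The sketch below shows one way to do that with a version check and a lock-protected reference swap; the class `ModelHolder`, its method names, and the callable-as-model convention are assumptions for the example and not the platform's serving code.

```python
import threading


class ModelHolder:
    """Holds the live model and swaps in newer versions without downtime."""

    def __init__(self, model, version):
        self._lock = threading.Lock()
        self._model = model
        self._version = version

    @property
    def version(self):
        return self._version

    def maybe_reload(self, latest_version, load_fn):
        """Load and swap in latest_version if it is newer. Returns True on swap."""
        if latest_version <= self._version:
            return False
        # Load outside the lock so in-flight predictions are not blocked.
        new_model = load_fn(latest_version)
        with self._lock:
            if latest_version > self._version:
                self._model, self._version = new_model, latest_version
                return True
        return False

    def predict(self, features):
        # Grab the current reference under the lock, then score without it.
        with self._lock:
            model = self._model
        return model(features)


if __name__ == "__main__":
    holder = ModelHolder(lambda x: x * 1, version=1)
    print(holder.predict(2))
    holder.maybe_reload(2, lambda v: (lambda x: x * v))
    print(holder.predict(2), holder.version)
```

In a real deployment `maybe_reload` would typically be driven by a background poller or a publish notification from the training pipeline, with `load_fn` reading the model artifact from storage.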
Additional components include list‑page content control using NLG‑generated snippets and tag highlights, and an AB‑experiment configuration center for rapid iteration.
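An AB-experiment configuration center usually needs a stable way to map users to variants. A minimal sketch using hash-based bucketing follows; the function name, the experiment name, and the traffic split are hypothetical, and the article does not describe the actual assignment scheme.

```python
import hashlib


def assign_bucket(user_id, experiment, variants):
    """Deterministically assign a user to a variant.

    variants: list of (name, traffic_share) pairs whose shares sum to 1.0.
    Hashing the experiment name together with the user id keeps assignments
    independent across experiments.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    point = int(digest[:8], 16) / 0x100000000  # uniform value in [0, 1)
    cumulative = 0.0
    for name, share in variants:
        cumulative += share
        if point < cumulative:
            return name
    return variants[-1][0]  # guard against floating-point rounding


# Hypothetical experiment configuration.
VARIANTS = [("control", 0.5), ("treatment", 0.5)]

if __name__ == "__main__":
    for uid in ("user_1", "user_2", "user_3"):
        print(uid, assign_bucket(uid, "rerank_v2", VARIANTS))
```

Because the assignment depends only on the user id and experiment name, a user sees the same variant on every request, and changing the configuration requires no per-user state.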
Future work focuses on multi‑task and reinforcement learning, expanding data sources for richer user portraits, and further personalization at scale.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
