Deep Recall and Vector Retrieval in the 58 Recruitment Recommendation System
This article presents a comprehensive overview of 58's recruitment recommendation system, detailing business challenges, multi‑stage recall strategies, vector‑based deep retrieval, cost‑sensitive loss design, session optimization, online incremental training, extensive offline and online evaluations, and practical lessons for future improvements.
The 58 recruitment platform serves a massive two‑sided market where job seekers (C‑end) and employers (B‑end) interact through short, sparse behaviors such as clicks, resume submissions, IM conversations, and phone calls. Platform statistics underscore both the scale of the user base and the urgency of matching supply with demand.
Business Challenges
Massive data volume with millions of users and job posts.
Cold‑start problem for new job seekers lacking detailed resumes.
Sparsity and real‑time constraints due to short interaction cycles.
Resource allocation to avoid wasteful connections.
Overall System Architecture
The recommendation flow follows four stages: intent understanding, multi‑channel recall (contextual, nearby, real‑time CF, user‑profile, tag, vector, deep recall), ranking with diverse objectives (CTR, CVR, feedback rate for B‑end), and final content display with diversity and explainability.
Vector‑Based Deep Recall
Job posts are vectorized offline by a DNN and indexed with Faiss for KNN search. Features such as content, company info, and contact details are embedded and pooled to form a post vector. User interest vectors are derived from historical behavior chains using Skip‑Gram with negative sampling.
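As a concrete illustration, the offline vectorize-and-index step can be sketched as follows. This is a minimal numpy sketch, not 58's actual pipeline: the embedding tables, dimensions, and the brute-force cosine search standing in for the Faiss index are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 64

# Hypothetical embedding tables for post feature fields (content, company,
# contact). In the real system these embeddings are learned by the offline DNN.
feature_vocab = {"content": 1000, "company": 500, "contact": 100}
tables = {name: rng.normal(size=(size, DIM)) for name, size in feature_vocab.items()}

def post_vector(feature_ids: dict) -> np.ndarray:
    """Embed each feature field and mean-pool into a single post vector."""
    embs = [tables[field][idx] for field, idx in feature_ids.items()]
    v = np.mean(embs, axis=0)
    return v / np.linalg.norm(v)  # L2-normalize so dot product = cosine

# Build a toy "index" of post vectors (Faiss holds these at production scale).
posts = [{"content": i % 1000, "company": i % 500, "contact": i % 100}
         for i in range(200)]
index = np.stack([post_vector(p) for p in posts])

def knn(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Brute-force cosine KNN; a Faiss inner-product index plays this role."""
    return np.argsort(index @ query)[::-1][:k]

top = knn(post_vector(posts[42]))  # post 42's own vector is its nearest match
```

The only structural requirement is that vectors are L2-normalized before indexing, so inner-product search is equivalent to cosine-similarity search.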
Online, a two‑tower architecture aligns user and item vectors: on each side, the inputs (user profile, context, and behavior‑chain features; item ID and feature embeddings) are average‑pooled, passed through a DNN (projection, DCN, or PNN variants), and L2‑normalized. The similarity between the two towers' outputs drives real‑time deep recall.
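The two-tower scoring path can be sketched in plain numpy. The layer sizes, the single ReLU hidden layer standing in for the projection/DCN/PNN variants, and the random weights are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
D_IN, D_HID, D_OUT = 32, 64, 16

def tower(x, w1, w2):
    """One hidden layer with ReLU, then L2-normalization of the output.
    The real towers use deeper projection/DCN/PNN variants; this stands in."""
    h = np.maximum(x @ w1, 0.0)
    v = h @ w2
    return v / np.linalg.norm(v)

# Separate parameters for the user tower and the item tower.
u_w1, u_w2 = rng.normal(size=(D_IN, D_HID)), rng.normal(size=(D_HID, D_OUT))
i_w1, i_w2 = rng.normal(size=(D_IN, D_HID)), rng.normal(size=(D_HID, D_OUT))

# User side: average of profile/context/behavior-chain feature embeddings.
user_feats = rng.normal(size=(3, D_IN)).mean(axis=0)
# Item side: average of ID and feature embeddings.
item_feats = rng.normal(size=(2, D_IN)).mean(axis=0)

user_vec = tower(user_feats, u_w1, u_w2)
item_vec = tower(item_feats, i_w1, i_w2)

# Both vectors are unit-length, so the dot product is cosine similarity in
# [-1, 1]; this score drives real-time deep recall.
score = float(user_vec @ item_vec)
```

Because the item tower has no dependence on the user, item vectors can be precomputed and indexed, leaving only the user tower to run at request time.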
Training Optimizations
Session length tuned to 1 day for best trade‑off between performance and resource cost.
Positive samples are drawn only from behaviors that precede the prediction point (left‑side sampling) to avoid future leakage; negative sampling mixes the global market with the local market space defined by region and job category.
Cost‑sensitive loss incorporates behavior weights (e.g., higher weight for resume submission than click) via a manually set cost matrix.
For online incremental training, batch softmax is used to perform in‑batch negative sampling, with batch size 512 yielding the best results.
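Putting the last two points together, the in-batch sampled-softmax loss with per-behavior cost weights can be sketched as below. The specific weight values (e.g. 5.0 for a resume submission vs. 1.0 for a click), the tiny batch, and the temperature are illustrative assumptions, not the production cost matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
B, D = 8, 16  # toy batch; the article reports batch size 512 working best

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# One (user, positive item) pair per row; every other item in the batch
# serves as an in-batch negative for that user.
users = l2norm(rng.normal(size=(B, D)))
items = l2norm(rng.normal(size=(B, D)))

# Hypothetical cost matrix expressed as behavior weights: stronger signals
# cost more to misrank, so they contribute more to the loss.
behavior_weight = {"click": 1.0, "resume_submit": 5.0}
behaviors = ["click", "resume_submit"] * (B // 2)
weights = np.array([behavior_weight[b] for b in behaviors])

def weighted_in_batch_softmax_loss(u, v, w, temperature=0.1):
    logits = (u @ v.T) / temperature             # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    nll = -np.diag(log_probs)                    # positive is on the diagonal
    return float((w * nll).sum() / w.sum())      # cost-sensitive average

loss = weighted_in_batch_softmax_loss(users, items, weights)
```

Reusing batch-mates as negatives is what makes this loss cheap enough for online incremental training: no separate negative-sampling pass over the corpus is needed.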
Evaluation
Offline evaluation shows a 32.09% average position improvement for resume‑submission behavior and 29.73% for click behavior after embedding‑based re‑ranking. Online A/B tests of the deep recall model improve CTR and CVR by over 1 percentage point.
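For reference, an "average position improvement" metric of this kind can be computed as sketched below. The toy ranked lists and the exact definition (relative reduction in the mean rank of positively-interacted items after re-ranking) are assumptions about how such numbers are derived, not 58's published evaluation code:

```python
def avg_position(ranked_ids, positive_ids):
    """Mean 1-based rank of the positive items within a ranked list."""
    positions = [ranked_ids.index(i) + 1 for i in positive_ids]
    return sum(positions) / len(positions)

def improvement(before, after, positives):
    """Relative reduction in average positive position after re-ranking."""
    b = avg_position(before, positives)
    a = avg_position(after, positives)
    return (b - a) / b

# Toy example: item 7 received a resume submission; embedding-based
# re-ranking moves it from rank 4 to rank 1.
before = [3, 9, 5, 7, 1]
after = [7, 3, 9, 5, 1]
gain = improvement(before, after, positives=[7])  # (4 - 1) / 4 = 0.75
```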
Lessons Learned & Future Work
Business‑driven design is essential; all model choices must consider real‑world impact.
Pair‑review of pipelines and continuous monitoring of each recall stage are critical.
Further optimization is needed for real‑time deep recall, and the improvements should be extended to both the C‑end and the B‑end.
Overall, the multi‑stage recall framework, vectorized deep retrieval, and cost‑sensitive training together deliver significant gains in recommendation relevance for the 58 recruitment platform.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.