Deep Recall and Vector Retrieval in the 58 Recruitment Recommendation System
This article presents a comprehensive overview of 58's recruitment recommendation system, detailing business challenges, multi‑stage recall strategies, vector‑based deep retrieval, cost‑sensitive loss design, session optimization, online incremental training, extensive offline and online evaluations, and practical lessons for future improvements.
The 58 recruitment platform serves a massive two‑sided market where job seekers (C‑end) and employers (B‑end) interact through short, sparse behaviors such as clicks, resume submissions, IM conversations, and phone calls. Platform statistics underscore both the scale of the user base and the urgency of matching supply with demand.
Business Challenges
Massive data volume with millions of users and job posts.
Cold‑start problem for new job seekers lacking detailed resumes.
Sparsity and real‑time constraints due to short interaction cycles.
Resource allocation to avoid wasteful connections.
Overall System Architecture
The recommendation flow follows four stages: intent understanding, multi‑channel recall (contextual, nearby, real‑time CF, user‑profile, tag, vector, deep recall), ranking with diverse objectives (CTR, CVR, feedback rate for B‑end), and final content display with diversity and explainability.
Vector‑Based Deep Recall
Job posts are vectorized offline by a DNN and indexed with Faiss for KNN search. Features such as content, company info, and contact details are embedded and pooled to form a post vector. User interest vectors are derived from historical behavior chains using Skip‑Gram with negative sampling.
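As a concrete illustration, the offline vectorize-and-index step can be sketched as follows. This is a minimal numpy sketch, not 58's actual pipeline: the embedding tables, dimensions, and the brute-force cosine search standing in for the Faiss index are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 64

# Hypothetical embedding tables for post feature fields (content, company,
# contact). In the real system these embeddings are learned by the offline DNN.
feature_vocab = {"content": 1000, "company": 500, "contact": 100}
tables = {name: rng.normal(size=(size, DIM)) for name, size in feature_vocab.items()}

def post_vector(feature_ids: dict) -> np.ndarray:
    """Embed each feature field and mean-pool into a single post vector."""
    embs = [tables[field][idx] for field, idx in feature_ids.items()]
    v = np.mean(embs, axis=0)
    return v / np.linalg.norm(v)  # L2-normalize so dot product = cosine

# Build a toy "index" of post vectors (Faiss holds these at production scale).
posts = [{"content": i % 1000, "company": i % 500, "contact": i % 100}
         for i in range(200)]
index = np.stack([post_vector(p) for p in posts])

def knn(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Brute-force cosine KNN; a Faiss inner-product index plays this role."""
    return np.argsort(index @ query)[::-1][:k]

top = knn(post_vector(posts[42]))  # post 42's own vector is its nearest match
```

The only structural requirement is that vectors are L2-normalized before indexing, so inner-product search is equivalent to cosine-similarity search.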
Online, a two‑tower architecture aligns user and item vectors: on each side, the inputs (user profile, context, and behavior‑chain features; item ID and feature embeddings) are average‑pooled, passed through a DNN (projection, DCN, or PNN variants), and L2‑normalized. The similarity between the two towers' outputs drives real‑time deep recall.
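The two-tower scoring path can be sketched in plain numpy. The layer sizes, the single ReLU hidden layer standing in for the projection/DCN/PNN variants, and the random weights are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
D_IN, D_HID, D_OUT = 32, 64, 16

def tower(x, w1, w2):
    """One hidden layer with ReLU, then L2-normalization of the output.
    The real towers use deeper projection/DCN/PNN variants; this stands in."""
    h = np.maximum(x @ w1, 0.0)
    v = h @ w2
    return v / np.linalg.norm(v)

# Separate parameters for the user tower and the item tower.
u_w1, u_w2 = rng.normal(size=(D_IN, D_HID)), rng.normal(size=(D_HID, D_OUT))
i_w1, i_w2 = rng.normal(size=(D_IN, D_HID)), rng.normal(size=(D_HID, D_OUT))

# User side: average of profile/context/behavior-chain feature embeddings.
user_feats = rng.normal(size=(3, D_IN)).mean(axis=0)
# Item side: average of ID and feature embeddings.
item_feats = rng.normal(size=(2, D_IN)).mean(axis=0)

user_vec = tower(user_feats, u_w1, u_w2)
item_vec = tower(item_feats, i_w1, i_w2)

# Both vectors are unit-length, so the dot product is cosine similarity in
# [-1, 1]; this score drives real-time deep recall.
score = float(user_vec @ item_vec)
```

Because the item tower has no dependence on the user, item vectors can be precomputed and indexed, leaving only the user tower to run at request time.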
Training Optimizations
Session length tuned to 1 day for best trade‑off between performance and resource cost.
Positive samples are drawn only from behaviors that precede the prediction point (left‑side sampling) to avoid future leakage; negative sampling mixes the global market with the local market space defined by region and job category.
Cost‑sensitive loss incorporates behavior weights (e.g., higher weight for resume submission than click) via a manually set cost matrix.
For online incremental training, batch softmax is used to perform in‑batch negative sampling, with batch size 512 yielding the best results.
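Putting the last two points together, the in-batch sampled-softmax loss with per-behavior cost weights can be sketched as below. The specific weight values (e.g. 5.0 for a resume submission vs. 1.0 for a click), the tiny batch, and the temperature are illustrative assumptions, not the production cost matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
B, D = 8, 16  # toy batch; the article reports batch size 512 working best

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# One (user, positive item) pair per row; every other item in the batch
# serves as an in-batch negative for that user.
users = l2norm(rng.normal(size=(B, D)))
items = l2norm(rng.normal(size=(B, D)))

# Hypothetical cost matrix expressed as behavior weights: stronger signals
# cost more to misrank, so they contribute more to the loss.
behavior_weight = {"click": 1.0, "resume_submit": 5.0}
behaviors = ["click", "resume_submit"] * (B // 2)
weights = np.array([behavior_weight[b] for b in behaviors])

def weighted_in_batch_softmax_loss(u, v, w, temperature=0.1):
    logits = (u @ v.T) / temperature             # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    nll = -np.diag(log_probs)                    # positive is on the diagonal
    return float((w * nll).sum() / w.sum())      # cost-sensitive average

loss = weighted_in_batch_softmax_loss(users, items, weights)
```

Reusing batch-mates as negatives is what makes this loss cheap enough for online incremental training: no separate negative-sampling pass over the corpus is needed.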
Evaluation
Offline evaluation shows a 32.09% average position improvement for resume‑submission behavior and 29.73% for click behavior after embedding‑based re‑ranking. Online A/B tests of the deep recall model improve CTR and CVR by over 1 percentage point.
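For reference, an "average position improvement" metric of this kind can be computed as sketched below. The toy ranked lists and the exact definition (relative reduction in the mean rank of positively-interacted items after re-ranking) are assumptions about how such numbers are derived, not 58's published evaluation code:

```python
def avg_position(ranked_ids, positive_ids):
    """Mean 1-based rank of the positive items within a ranked list."""
    positions = [ranked_ids.index(i) + 1 for i in positive_ids]
    return sum(positions) / len(positions)

def improvement(before, after, positives):
    """Relative reduction in average positive position after re-ranking."""
    b = avg_position(before, positives)
    a = avg_position(after, positives)
    return (b - a) / b

# Toy example: item 7 received a resume submission; embedding-based
# re-ranking moves it from rank 4 to rank 1.
before = [3, 9, 5, 7, 1]
after = [7, 3, 9, 5, 1]
gain = improvement(before, after, positives=[7])  # (4 - 1) / 4 = 0.75
```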
Lessons Learned & Future Work
Business‑driven design is essential; all model choices must consider real‑world impact.
Pair‑review of pipelines and continuous monitoring of each recall stage are critical.
Further optimization is needed for real‑time deep recall, and the improvements should be extended to both the C‑end and the B‑end.
Overall, the multi‑stage recall framework, vectorized deep retrieval, and cost‑sensitive training together deliver significant gains in recommendation relevance for the 58 recruitment platform.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.