
Embedding‑Based Item‑to‑Item Similarity Recommendation for Homestay Platforms

This article describes how Tujia applied embedding techniques, inspired by word2vec's skip‑gram model, to build item‑to‑item similarity vectors for homestay recommendations. It covers the background challenges, the embedding solution, the training methodology, evaluation results, practical improvements, and future development plans.

DataFunSummit

Background: Homestay bookings are low‑frequency and highly personalized, making traditional collaborative filtering ineffective. Tujia identified three recommendation approaches—content‑based, item‑to‑item collaborative filtering, and embedding‑based item similarity—and chose the embedding method for its ability to capture multi‑dimensional item relationships.

Embedding Solution: Using user click‑through logs, each homestay is represented as a low‑dimensional vector (e.g., 64‑dimensional). The skip‑gram model treats consecutively viewed houses as context, with clicked houses as positive samples and skipped houses as negative samples. Training employs negative sampling, L2 regularization on the output matrix, and adaptive learning rates.
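As a rough illustration of the training objective, the following NumPy sketch performs one skip‑gram‑with‑negative‑sampling update for a (center house, clicked house) pair, with L2 regularization on the output matrix. All sizes, learning rates, and house IDs here are illustrative assumptions, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy sizes: a small catalog with the 64-dim vectors the article mentions.
NUM_HOUSES, DIM = 1000, 64

# Input (embedding) matrix and output (context) matrix.
W_in = rng.normal(0, 0.01, (NUM_HOUSES, DIM))
W_out = rng.normal(0, 0.01, (NUM_HOUSES, DIM))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(center, positive, negatives, lr=0.05, l2=1e-4):
    """One skip-gram negative-sampling update.

    `positive` is a clicked house seen near `center`; `negatives` are
    sampled houses treated as non-context (e.g. skipped houses).
    Returns the loss at the current parameters, before the update.
    """
    v = W_in[center]
    ids = np.array([positive] + list(negatives))
    labels = np.zeros(len(ids))
    labels[0] = 1.0
    u = W_out[ids]                      # (k+1, DIM), a copy via fancy indexing
    scores = sigmoid(u @ v)             # predicted P(is-context)
    grad = (scores - labels)[:, None]   # (k+1, 1)
    # L2 regularization applied to the output matrix, as the article notes.
    W_out[ids] -= lr * (grad * v + l2 * u)
    W_in[center] -= lr * (grad * u).sum(axis=0)
    return -np.log(scores[0] + 1e-9) - np.log(1 - scores[1:] + 1e-9).sum()

# Toy example: house 3 clicked near house 7; houses 100 and 200 as negatives.
loss_before = sgns_step(7, 3, [100, 200])
for _ in range(200):
    loss_after = sgns_step(7, 3, [100, 200])
```

Repeating the update on the same pair drives the loss down, pulling the embeddings of co-viewed houses together and pushing negatives apart.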

Model Training: A dataset of millions of browsing sessions generated ~40 million training samples and 7 million evaluation samples. The model was trained on a Tesla M40 GPU for one day with a batch size of 1024 over 100 epochs, achieving stable loss curves on both training and validation sets.

Evaluation: Besides loss curves, product‑level metrics such as conversion‑rate uplift were measured via online A/B tests, showing a clear improvement after deployment. Visualizations demonstrated that houses similar to a user's recent clicks moved higher in the list, and a tool for supply‑chain staff enabled similarity‑based search.
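A similarity-search tool like the one described can be reduced to a nearest-neighbor lookup over normalized embeddings. This sketch uses random vectors and made-up house IDs purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical learned embeddings: 500 houses, 64 dims.
emb = rng.normal(size=(500, 64))
# Normalize once so cosine similarity becomes a plain dot product.
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

def most_similar(house_id, k=5):
    """Return the k house IDs whose embeddings are closest to `house_id`."""
    sims = emb @ emb[house_id]
    sims[house_id] = -np.inf          # exclude the query house itself
    return np.argsort(-sims)[:k]

neighbors = most_similar(42)
```

At Tujia's catalog scale a brute-force dot product is feasible; for much larger catalogs an approximate nearest-neighbor index would replace the `argsort`.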

Improvements and Tricks:

- Filter out very short dwell‑time clicks and users with excessive clicks.
- Limit context to ±2 clicks within a 30‑minute window.
- Weight the final booked house more heavily (equivalent to five clicks).
- Apply negative sampling carefully: sample both skipped houses and comparable houses in the same destination.
- Normalize embedding vectors before inference and discard items with fewer than ten occurrences.
- Address cold start by averaging the embeddings of a small set of similar items for new houses.
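Several of these session-preprocessing tricks can be combined into one pair-generation pass. In this sketch the ±2 context, 30-minute window, and 5× booking weight come from the talk, while `MIN_DWELL` and `MAX_CLICKS` are illustrative thresholds I have assumed:

```python
from datetime import datetime, timedelta

MIN_DWELL = 3                     # seconds; assumed dwell-time threshold
MAX_CLICKS = 200                  # assumed cap for users with excessive clicks
WINDOW = timedelta(minutes=30)    # context window from the talk
CONTEXT = 2                       # +/- 2 clicks, from the talk
BOOKING_WEIGHT = 5                # a booked house counts as five clicks

def build_pairs(clicks, booked=None):
    """clicks: time-ordered list of (house_id, timestamp, dwell_seconds).

    Returns weighted (center, context, weight) skip-gram training pairs.
    """
    clicks = [c for c in clicks if c[2] >= MIN_DWELL]
    if len(clicks) > MAX_CLICKS:          # drop abnormal (likely bot) users
        return []
    pairs = []
    for i, (h, t, _) in enumerate(clicks):
        lo, hi = max(0, i - CONTEXT), min(len(clicks), i + CONTEXT + 1)
        for j in range(lo, hi):
            h2, t2, _ = clicks[j]
            if j == i or abs(t2 - t) > WINDOW:
                continue
            weight = BOOKING_WEIGHT if booked in (h, h2) else 1
            pairs.append((h, h2, weight))
    return pairs

t0 = datetime(2024, 5, 1, 12, 0)
session = [
    (1, t0, 10),                          # normal click
    (2, t0 + timedelta(minutes=1), 1),    # dwell too short, filtered out
    (3, t0 + timedelta(minutes=2), 8),    # this house ends up booked
    (4, t0 + timedelta(minutes=50), 9),   # outside the 30-minute window
]
pairs = build_pairs(session, booked=3)   # -> [(1, 3, 5), (3, 1, 5)]
```

Only the pair between houses 1 and 3 survives the filters, and it carries the booking weight because house 3 was booked.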

Iterative Updates: To incorporate new houses without retraining from scratch, the most recent two months of logs are used to generate new samples, the previous model parameters are pre‑loaded, and new items are appended with random initialization before a short fine‑tuning phase.
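The warm-start step amounts to extending the old embedding matrix with randomly initialized rows for unseen houses before fine-tuning. Function and variable names here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
DIM = 64

def extend_embeddings(old_emb, old_ids, new_ids):
    """Preload trained vectors for known houses and append randomly
    initialized rows for houses first seen in the recent logs.

    Returns the extended matrix and an id -> row-index mapping.
    """
    unseen = [h for h in new_ids if h not in old_ids]
    extra = rng.normal(0, 0.01, (len(unseen), DIM))
    emb = np.vstack([old_emb, extra]) if unseen else old_emb.copy()
    id_to_row = {h: i for i, h in enumerate(list(old_ids) + unseen)}
    return emb, id_to_row

# Toy example: three trained houses, one new house "d" from recent logs.
old = rng.normal(size=(3, DIM))
emb, index = extend_embeddings(old, ["a", "b", "c"], ["b", "d"])
```

Fine-tuning then resumes from `emb`, so established houses keep their learned positions while the new rows move into place.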

Future Plans: Enrich behavior signals (favorites, chats, reviews), introduce attention mechanisms for temporal patterns, and jointly learn embeddings for auxiliary attributes such as price, area, and house type.

Speaker: Zhou Wenbiao, Head of Algorithms at Tujia, focuses on applying machine learning to commercial scenarios, including search, recommendation, and operational intelligence.

Tags: A/B testing, machine learning, recommendation, embedding, homestay, item-to-item
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
