Personalized Recommendation System Practices at Ctrip: Data, Recall, Ranking, and Feature Engineering
The article details Ctrip's large-scale personalized recommendation system. It covers data sources, recall strategies, ranking models, and feature engineering techniques, along with practical insights from industry research, all aimed at improving user experience in travel services.
Ctrip, a leading Chinese online travel agency (OTA), serves tens of millions of users daily and relies on personalized recommendation to alleviate information overload and match users with suitable travel products. The recommendation workflow consists of three stages: recall, ranking, and result generation.
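The three-stage flow can be pictured as a small pipeline. The sketch below is a minimal illustration under assumed interfaces: the `recommend` function, the dictionary of recall channels, and the scorer callback are hypothetical names, not Ctrip's actual architecture. A production system would also apply filtering, business rules, and presentation logic in the result-generation step.

```python
from typing import Callable, Dict, List

# Hypothetical interfaces: a recall channel returns candidate item ids for a user,
# and a scorer returns a relevance score for a (user, item) pair.
RecallChannel = Callable[[str], List[str]]
Scorer = Callable[[str, str], float]


def recommend(user: str, channels: Dict[str, RecallChannel],
              scorer: Scorer, top_n: int = 5) -> List[str]:
    # Stage 1, recall: merge candidates from all channels (visited in name order)
    # and drop duplicates while keeping first-seen order.
    candidates: List[str] = []
    seen = set()
    for name in sorted(channels):
        for item in channels[name](user):
            if item not in seen:
                seen.add(item)
                candidates.append(item)

    # Stage 2, ranking: score every candidate independently (pointwise),
    # breaking ties by item id so the output is deterministic.
    scored = sorted(((scorer(user, item), item) for item in candidates),
                    key=lambda pair: (-pair[0], pair[1]))

    # Stage 3, result generation: keep only as many items as the page can show.
    return [item for _, item in scored[:top_n]]


if __name__ == "__main__":
    demo_channels = {
        "cf": lambda u: ["hotel_1", "hotel_2"],
        "lbs": lambda u: ["hotel_2", "hotel_3"],
    }
    demo_scores = {"hotel_1": 0.2, "hotel_2": 0.9, "hotel_3": 0.5}
    print(recommend("user_42", demo_channels, lambda u, i: demo_scores[i], top_n=2))
```

Keeping recall channels behind a common callable interface makes it easy to add or remove a channel without touching the ranking or result-generation code.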
The recall stage uses data engineering and algorithms to narrow millions of items down to a candidate set. It employs traditional methods such as collaborative filtering, contextual recommendation, and location-based service (LBS) recall, as well as deep learning models such as RNN-based session recommendation, CNN-PMF hybrids, and the aSDAE model, which integrates side information to address data sparsity and cold-start problems.
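Of the traditional recall methods, item-based collaborative filtering is the simplest to sketch. The example below is a minimal version, assuming implicit feedback stored as a mapping from user id to the set of item ids that user clicked or booked, and cosine similarity over co-occurrence counts. The data and function names are hypothetical; at production scale, similarities are typically precomputed offline and truncated to the top neighbours per item rather than built in memory like this.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, List, Set


def item_similarity(user_items: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """Cosine similarity between items, computed from co-occurrence in user histories."""
    item_count: Dict[str, int] = defaultdict(int)
    co_count: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for items in user_items.values():
        for i in items:
            item_count[i] += 1
            for j in items:
                if i != j:
                    co_count[i][j] += 1
    # sim(i, j) = |users with i and j| / sqrt(|users with i| * |users with j|)
    return {
        i: {j: c / sqrt(item_count[i] * item_count[j]) for j, c in related.items()}
        for i, related in co_count.items()
    }


def recall(user: str, user_items: Dict[str, Set[str]],
           sim: Dict[str, Dict[str, float]], k: int = 3) -> List[str]:
    """Score unseen items by summed similarity to the user's history; return the top k."""
    seen = user_items.get(user, set())
    scores: Dict[str, float] = defaultdict(float)
    for i in seen:
        for j, s in sim.get(i, {}).items():
            if j not in seen:
                scores[j] += s
    return sorted(scores, key=lambda j: (-scores[j], j))[:k]


if __name__ == "__main__":
    history = {
        "u1": {"hotel_A", "hotel_B"},
        "u2": {"hotel_A", "hotel_C"},
        "u3": {"hotel_B", "hotel_C", "hotel_D"},
    }
    sim = item_similarity(history)
    print(recall("u1", history, sim))
```

A user with no history gets an empty list here, which is exactly the cold-start gap that side-information models such as aSDAE are meant to fill.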
In the ranking stage, Ctrip combines pointwise ranking with multi-objective controls (e.g., distance and product quality). Its models range from logistic regression and tree-based models (GBDT, Random Forest) to factorization machines and Wide & Deep architectures, and the article stresses that meticulous feature engineering remains important regardless of the model.
Feature engineering covers explicit combinations (discretization, Cartesian products, inner products) and semi-explicit combinations, in which ensemble trees (GBDT, Random Forest) generate high-order feature interactions. Numerical, ordinal, and categorical features are each handled carefully, using techniques such as unsupervised and supervised discretization, one-hot encoding, dummy encoding, and the hash trick.
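Several of the explicit techniques named above fit in a few lines. The following is a minimal sketch, assuming a numeric feature such as price and categorical features such as city and hotel star rating (all hypothetical examples, not fields from Ctrip's system). It shows equal-frequency (unsupervised) discretization, one-hot and dummy encoding, a Cartesian-product cross, and the hash trick. MD5 is used for hashing because Python's built-in `hash()` for strings is salted per process, so bucket ids would not be stable between offline training and online serving.

```python
import hashlib
from bisect import bisect_right
from typing import List


def equal_frequency_edges(values: List[float], n_bins: int) -> List[float]:
    """Unsupervised discretization: place cut points at empirical quantiles."""
    ordered = sorted(values)
    return [ordered[len(ordered) * k // n_bins] for k in range(1, n_bins)]


def bucketize(x: float, edges: List[float]) -> int:
    """Map a numeric value to its bin index (a value equal to an edge goes right)."""
    return bisect_right(edges, x)


def one_hot(index: int, size: int) -> List[int]:
    vec = [0] * size
    vec[index] = 1
    return vec


def dummy(index: int, size: int) -> List[int]:
    """Dummy encoding: one-hot with category 0 as the all-zeros reference level."""
    return one_hot(index, size)[1:]


def cross(a: str, b: str) -> str:
    """Cartesian-product crossing of two categorical values into a single token."""
    return f"{a}_x_{b}"


def hash_bucket(token: str, n_buckets: int) -> int:
    """Hash trick: map an unbounded vocabulary into a fixed number of slots."""
    digest = hashlib.md5(token.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


if __name__ == "__main__":
    prices = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0, 700.0, 800.0]
    edges = equal_frequency_edges(prices, 4)
    print(edges, one_hot(bucketize(650.0, edges), 4))
    print(hash_bucket(cross("Shanghai", "5star"), 1000))
```

Crossing city with star rating before hashing lets a linear model learn, for example, a separate weight for five-star hotels in one city, which neither raw feature can express alone.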
The article also outlines a range of recall methods (Real-time Intention, Business Rules, Context-Based, LBS, Collaborative Filtering, and Sequential Models) and shares practical observations: discretizing continuous features brings little benefit for GBDT; leaf node depth must be considered when tree leaves are used as combined features; and offline training uses Spark-based distributed XGBoost, while online inference parses the trained model directly.
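The leaf-node point above relates to the common GBDT+LR pattern, in which each sample is routed through every tree and the reached leaves become sparse features for a downstream linear model. The sketch below uses tiny hand-written trees in a hypothetical dict format instead of a trained GBDT or XGBoost model, purely to show the encoding; it is not the model-parsing code Ctrip uses online. A leaf at depth d encodes a conjunction of up to d split conditions, so tree depth sets the order of the generated interactions.

```python
from typing import Any, Dict, List

# Hypothetical hand-written tree format: an internal node splits on one feature
# (x[feature] <= threshold goes left), and a leaf stores an id unique within its tree.
Tree = Dict[str, Any]


def leaf_id(tree: Tree, x: Dict[str, float]) -> int:
    """Walk from the root to a leaf and return that leaf's id."""
    node = tree
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]


def leaf_features(trees: List[Tree], n_leaves: List[int], x: Dict[str, float]) -> List[int]:
    """Concatenate one one-hot block per tree, marking the leaf the sample reaches."""
    encoded: List[int] = []
    for tree, size in zip(trees, n_leaves):
        block = [0] * size
        block[leaf_id(tree, x)] = 1
        encoded.extend(block)
    return encoded


if __name__ == "__main__":
    # Depth-2 tree: reaching leaf 2 means "price > 500 AND star > 3.5".
    tree_0 = {
        "feature": "price", "threshold": 500.0,
        "left": {"leaf": 0},
        "right": {"feature": "star", "threshold": 3.5,
                  "left": {"leaf": 1}, "right": {"leaf": 2}},
    }
    # Depth-1 tree: a single split on distance.
    tree_1 = {"feature": "distance_km", "threshold": 2.0,
              "left": {"leaf": 0}, "right": {"leaf": 1}}
    sample = {"price": 800.0, "star": 5.0, "distance_km": 1.2}
    print(leaf_features([tree_0, tree_1], [3, 2], sample))
```

Because the tree splits already perform their own binning, this also hints at why pre-discretizing continuous inputs adds little for GBDT.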
In conclusion, Ctrip's recommendation system integrates data preparation, engineering architecture, and front‑end presentation across more than ten modules and sixty scenarios, with future plans to incorporate deeper models, online learning, reinforcement learning, and transfer learning to further enhance recommendation quality.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.