Ctrip's Personalized Recommendation System: Data, Recall, and Ranking Practices
This article details Ctrip's large‑scale personalized recommendation system, covering the data sources, recall strategies, ranking models, feature engineering techniques, and future directions for improving recommendation quality in the travel domain.
Author Introduction : The author belongs to Ctrip Basic Business R&D – Data Products and Services Group, which applies AI technologies such as personalized recommendation, natural language processing, and image recognition to the travel industry, and has delivered a suite of mature products including a universal recommendation engine, an intelligent customer service system, and an AI platform.
Overview : As a leading OTA serving tens of millions of users daily, Ctrip faces severe information overload; personalized recommendation is essential. The recommendation pipeline is divided into three stages—recall, ranking, and result generation—as illustrated in the architecture diagram.
Recall Stage : The recall phase filters millions of items into a manageable candidate set, heavily influencing downstream efficiency and quality. Ctrip employs a mix of traditional and deep‑learning methods, including Real‑time Intention (Markov prediction based on recent user actions), Business Rules (domain‑specific constraints), Context‑Based filtering (seasonal or event‑driven), LBS (GeoHash‑based proximity), Collaborative Filtering (including the aSDAE hybrid model), and Sequential Models (Matrix Factorization combined with Markov chains or RNN/LSTM‑based approaches).
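The Real-time Intention strategy above can be illustrated with a first-order Markov chain over recent user actions. This is only a minimal sketch: the session data, the action labels, and the function names `fit_transitions` and `predict_next` are hypothetical, and the article does not say how Ctrip's production model is actually built.

```python
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Count first-order transitions between consecutive actions in each session."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    return counts


def predict_next(counts, last_action, k=2):
    """Return up to k most likely next actions with their estimated probabilities."""
    followers = counts.get(last_action)
    if not followers:
        return []  # unseen action: fall back to other recall strategies
    total = sum(followers.values())
    return [(action, n / total) for action, n in followers.most_common(k)]


# Hypothetical browsing sessions (illustrative labels, not real Ctrip data).
sessions = [
    ["flight:SHA-BJS", "hotel:BJS", "ticket:GreatWall"],
    ["flight:SHA-BJS", "hotel:BJS", "car:BJS"],
    ["flight:SHA-BJS", "ticket:GreatWall"],
    ["hotel:BJS", "ticket:GreatWall"],
]
transitions = fit_transitions(sessions)
top = predict_next(transitions, "hotel:BJS")
```

In this toy data, a user whose last action was viewing a Beijing hotel is most likely to look at the attraction ticket next (two of three observed transitions), so that item would be recalled first.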
Ranking Stage : Ranking models have evolved from linear models with extensive feature engineering to complex non‑linear and deep models. Ctrip primarily uses Logistic Regression (LR) with L1 regularization, Factorization Machines for feature crossing, and Wide‑and‑Deep architectures in which the wide part is enhanced by GBDT‑generated features. Optimization algorithms include OWL‑QN for batch learning and FTRL for online learning.
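To make the online-learning side more concrete, the following is a minimal sketch of per-coordinate FTRL-Proximal for logistic regression with L1 and L2 regularization, following the commonly published form of the algorithm. The class name, the hyper-parameter values, and the toy feature names are assumptions chosen for illustration; the article does not describe Ctrip's actual implementation.

```python
import math
from collections import defaultdict


class FTRLProximal:
    """Per-coordinate FTRL-Proximal for sparse logistic regression (illustrative)."""

    def __init__(self, alpha=0.1, beta=1.0, l1=0.1, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)  # accumulated adjusted gradients
        self.n = defaultdict(float)  # accumulated squared gradients

    def _weight(self, i):
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0  # L1 keeps weakly supported weights at exactly zero
        sign = 1.0 if z > 0 else -1.0
        rate = (self.beta + math.sqrt(self.n.get(i, 0.0))) / self.alpha + self.l2
        return -(z - sign * self.l1) / rate

    def predict(self, x):
        score = sum(self._weight(i) * v for i, v in x.items())
        score = max(min(score, 35.0), -35.0)  # clip to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, x, y):
        """Process one example: x is {feature: value}, y is 0 or 1."""
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v  # gradient of log loss w.r.t. weight i
            w = self._weight(i)
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w
            self.n[i] += g * g
        return p


# Hypothetical click stream: one feature always clicked, one never clicked.
model = FTRLProximal()
stream = [({"city=beijing": 1.0}, 1), ({"city=sanya": 1.0}, 0)] * 200
for x, y in stream:
    model.update(x, y)
p_pos = model.predict({"city=beijing": 1.0})
p_neg = model.predict({"city=sanya": 1.0})
```

Because weights are derived lazily from `z` and `n`, features that never accumulate enough gradient stay at zero, which is what makes this family of optimizers attractive for very sparse, high-dimensional models.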
Feature Engineering : Despite advances in deep learning, carefully crafted features remain crucial. Explicit feature combinations are built by discretizing numerical, ordinal, and categorical attributes and then applying Cartesian or inner products, using techniques such as One‑Hot Encoding, Dummy Encoding, and the Hash Trick. Semi‑explicit combinations are generated by feeding continuous features into ensemble trees (GBDT or Random Forest), extracting leaf indices as binary vectors, and concatenating them with manually engineered cross features.
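The explicit combinations described above (discretize, cross, then encode) can be sketched with the hash trick as follows. The field names, bucket boundaries, and bucket count are hypothetical, and `hashlib.md5` is used only to get a hash that is stable across processes; a production system would likely pick a faster non-cryptographic hash.

```python
import hashlib
from itertools import combinations


def hash_index(token, num_buckets):
    """Map a feature token to a stable bucket index."""
    digest = hashlib.md5(token.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def bucketize(value, boundaries):
    """Discretize a number: index of the first boundary it falls below."""
    for i, boundary in enumerate(boundaries):
        if value < boundary:
            return i
    return len(boundaries)


def hashed_features(record, num_buckets=2 ** 18):
    """Return sorted active indices for single features and pairwise crosses."""
    tokens = [f"{name}={value}" for name, value in sorted(record.items())]
    # Cartesian-product style crosses: every pair of discretized fields.
    crosses = [f"{a}&{b}" for a, b in combinations(tokens, 2)]
    return sorted({hash_index(t, num_buckets) for t in tokens + crosses})


# Hypothetical record: a numeric price is discretized before crossing.
record = {
    "city": "Sanya",
    "season": "winter",
    "price_bucket": bucketize(680.0, [200, 500, 1000]),
}
active = hashed_features(record)
```

The result is a sparse binary vector represented by its active indices. Hashing bounds the dimensionality regardless of how many distinct crosses appear, at the cost of occasional collisions.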
Practical Insights : Experiments show that discretizing continuous features before feeding them to GBDT degrades performance, since tree models already handle non‑linearity through their own split points. High‑dimensional sparse IDs are better transformed into dense embeddings. Monte‑Carlo Search is used for XGBoost hyper‑parameter tuning. Offline models are trained in a distributed fashion on Spark clusters, and online inference relies on lightweight model parsing.
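One plausible reading of Monte-Carlo Search for hyper-parameter tuning is random sampling of the parameter space, keeping the best-scoring configuration. The sketch below assumes that reading. `max_depth`, `eta`, and `subsample` are real XGBoost parameter names, but the sampling ranges are arbitrary and `toy_objective` is a stand-in for a cross-validated metric, so that the example runs without training any model.

```python
import math
import random

# Each entry draws one value from a hypothetical range for that parameter.
SEARCH_SPACE = {
    "max_depth": lambda rng: rng.randint(3, 10),
    "eta": lambda rng: 10 ** rng.uniform(-3, -0.5),  # log-uniform learning rate
    "subsample": lambda rng: rng.uniform(0.5, 1.0),
}


def toy_objective(params):
    """Stand-in for a validation score; peaks at depth 6, eta 0.1, subsample 0.8."""
    return -(
        (params["max_depth"] - 6) ** 2 / 10
        + (math.log10(params["eta"]) + 1) ** 2
        + (params["subsample"] - 0.8) ** 2
    )


def monte_carlo_search(objective, space, n_trials=50, seed=0):
    """Sample n_trials random configurations and keep the highest-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    history = []
    for _ in range(n_trials):
        params = {name: sample(rng) for name, sample in space.items()}
        score = objective(params)
        history.append((score, params))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, history


best_params, best_score, history = monte_carlo_search(toy_objective, SEARCH_SPACE)
```

In practice `toy_objective` would be replaced by a function that trains a model with the sampled parameters and returns a validation metric. Because the trials are independent, they are straightforward to run in parallel on a cluster.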
Conclusion : The end‑to‑end recommendation system integrates data preparation, recall, ranking, list generation, and front‑end presentation, serving over ten business lines and sixty scenarios. Future work will incorporate more deep models, online learning, reinforcement learning, and transfer learning to further boost recommendation quality.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.
