How Ctrip Scales Personalized Travel Recommendations: From Recall to Ranking
This article details Ctrip's end‑to‑end personalized recommendation system for travel, covering data collection, candidate recall methods, ranking models, feature engineering practices, and future directions, illustrating how millions of users receive tailored travel suggestions.
Ctrip, a leading online travel agency (OTA) in China, serves tens of millions of users daily and relies on personalized recommendation to reduce information overload and match users with suitable travel products.
1. Data
Machine learning is built on data, features, and models. Ctrip leverages product attributes (e.g., location, star rating), product statistics (orders, views, clicks), user profiles (age, gender, preferences), and user behavior (reviews, ratings, browsing, searches, bookings). Statistical metrics such as click-through rate (CTR) are often smoothed with Bayesian methods, so that ratios computed from only a handful of impressions are not taken at face value.
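As a rough illustration of Bayesian smoothing, the sketch below computes the posterior-mean CTR under a Beta prior. The article does not say which prior or pseudo-counts Ctrip uses, so the function name and the default values here are assumptions chosen only for the example.

```python
def smoothed_ctr(clicks, impressions, prior_clicks=2.0, prior_impressions=100.0):
    """Posterior-mean CTR under a Beta prior.

    prior_clicks and prior_impressions are illustrative pseudo-counts
    (a Beta(alpha, beta) prior with alpha = prior_clicks and
    beta = prior_impressions - prior_clicks). They are assumptions,
    not values taken from the article.
    """
    if clicks < 0 or impressions < 0 or clicks > impressions:
        raise ValueError("need 0 <= clicks <= impressions")
    return (clicks + prior_clicks) / (impressions + prior_impressions)


# A product with 1 click out of 2 views has a raw CTR of 50%, but so little
# evidence that the smoothed value stays close to the 2% prior mean.
new_item = smoothed_ctr(clicks=1, impressions=2)

# A product with plenty of traffic is dominated by its own data.
popular_item = smoothed_ctr(clicks=500, impressions=10_000)
```

In practice the pseudo-counts would typically be estimated from historical data rather than fixed by hand.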
2. Recall
The recall stage generates a limited candidate set from millions of items, heavily influencing downstream ranking efficiency and quality. Sparse user‑item interactions in travel pose challenges, so Ctrip combines several effective approaches:
Real-time Intention: Uses a Markov-based model on recent user actions to predict immediate intent.
Business Rules: Applies domain-specific constraints (e.g., recommend hotels only after a flight search).
Context-Based: Considers seasonal contexts such as winter skiing or New Year travel.
LBS: Leverages GeoHash to filter nearby hotels, attractions, and restaurants based on the user's current location.
Collaborative Filtering: Implements a deep hybrid model, aSDAE, that incorporates side information to address data sparsity and cold-start problems.
Sequential Model: Combines Matrix Factorization with Markov chains, and explores RNN/LSTM-based session recommendations.
Other deep models (DNN, AE, CNN) are also applied where appropriate.
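The real-time intention and sequential channels listed above both build on Markov-style transitions between user actions. The following is a minimal sketch of a first-order Markov chain used for recall, assuming sessions are simple lists of item IDs. The function names and the sample sessions are hypothetical, and Ctrip's actual models (for example Matrix Factorization combined with Markov chains) are more elaborate than this.

```python
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Count item -> next-item transitions over a list of sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    return counts


def recall_next(counts, last_item, k=2):
    """Return up to k (item, probability) pairs most likely to follow last_item."""
    followers = counts.get(last_item)
    if not followers:
        return []  # unseen item: fall back to another recall channel
    total = sum(followers.values())
    return [(item, n / total) for item, n in followers.most_common(k)]


# Hypothetical browsing sessions; the item IDs are made up for illustration.
sessions = [
    ["flight_SHA_SYX", "hotel_sanya_bay", "ticket_nanshan"],
    ["flight_SHA_SYX", "hotel_sanya_bay", "car_rental_syx"],
    ["flight_SHA_SYX", "hotel_yalong_bay"],
    ["hotel_sanya_bay", "ticket_nanshan"],
]
transitions = fit_transitions(sessions)
candidates = recall_next(transitions, "flight_SHA_SYX", k=2)
```

Returning an empty list for unseen items reflects why several recall channels are combined: a pure transition model has nothing to say about items or users with no history.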
3. Ranking
Personalized ranking can be framed as multi-task learning in which each user is a separate task, and it often relies on conjunction features (user-product cross features). Commonly used models include Logistic Regression with L1 regularization, Factorization Machines for feature crossing, and Wide & Deep architectures, in which the wide part may be replaced by GBDT-generated features.
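To make the feature-crossing role of Factorization Machines concrete, here is a small sketch of the standard second-order FM score, written in pure Python. It computes the pairwise term with the usual linear-time identity and includes a naive pairwise version for comparison. The feature meanings and all weights are invented for illustration and are not taken from Ctrip's models.

```python
def fm_score(x, w0, w, v):
    """Second-order Factorization Machine score for one feature vector.

    x:  list of n feature values
    w0: global bias
    w:  list of n linear weights
    v:  n x k list of latent factor rows, one row per feature
    """
    n = len(x)
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(v[0])):
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_score_naive(x, w0, w, v):
    """Same model written as an explicit sum over feature pairs."""
    n = len(x)
    score = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            score += dot * x[i] * x[j]
    return score


# Toy input: [user_is_family, item_is_resort, item_star_rating / 5].
x = [1.0, 1.0, 0.8]
w0 = 0.1
w = [0.2, -0.1, 0.3]
v = [[0.5, 0.1], [0.4, -0.2], [0.3, 0.6]]
score = fm_score(x, w0, w, v)
```

Because every pair of features interacts through the dot product of two latent vectors, the model can score user-product combinations that never co-occurred in training, which is useful when interactions are sparse.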
Feature engineering remains indispensable and is divided into explicit and semi‑explicit combinations.
Explicit Feature Combination: Discretize the features, then combine them with a Cartesian product or inner product. Features fall into three types:
Numerical – discretized via equal-frequency, equal-width, or supervised methods (1R, entropy-based).
Ordinal – encoded to reflect order (e.g., three levels of hygiene quality).
Categorical – transformed using one-hot encoding (OHE), dummy encoding, or the hashing trick.
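The explicit combination steps above can be sketched as follows: bin a numerical feature with equal-width bins, map an ordinal feature to ordered integers, build a Cartesian-product token from two discretized features, and hash that token to a column index. All helper names, value ranges, and sample values are hypothetical. `zlib.crc32` is used only because Python's built-in `hash` for strings is not stable across processes.

```python
import zlib


def equal_width_bin(value, low, high, n_bins):
    """Map a numeric value to a bin index in [0, n_bins - 1]."""
    if value <= low:
        return 0
    if value >= high:
        return n_bins - 1
    return int((value - low) / (high - low) * n_bins)


def cross(*tokens):
    """Cartesian-product feature: one token per combination of inputs."""
    return "&".join(tokens)


def hash_index(token, n_buckets):
    """Hashing trick: map a token to a column index with a stable hash."""
    return zlib.crc32(token.encode("utf-8")) % n_buckets


# Hypothetical raw features for one (user, hotel) pair.
price = 620.0     # numerical
hygiene = "good"  # ordinal: poor < fair < good
city = "Sanya"    # categorical

price_bin = equal_width_bin(price, low=0.0, high=2000.0, n_bins=10)
hygiene_level = {"poor": 0, "fair": 1, "good": 2}[hygiene]

token = cross(f"price_bin={price_bin}", f"city={city}")
column = hash_index(token, n_buckets=2 ** 20)
```

The crossed token lets a linear model learn a separate weight for each price band within each city, at the cost of a much larger and sparser feature space, which is where the hashing trick helps.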
Semi-Explicit Feature Combination: Tree-based models (GBDT, Random Forest) route each sample to one leaf per tree. Each root-to-leaf path acts as a high-order feature interaction, and the resulting leaf indices are then one-hot encoded.
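The leaf-encoding step can be illustrated without training a model. In the sketch below, two hand-written functions stand in for trained GBDT trees (an assumption made purely for illustration; real trees would be learned from data), and each sample is turned into a concatenation of one-hot blocks, one block per tree.

```python
def tree_a(sample):
    """Hand-written stand-in for one trained tree; returns a leaf index."""
    if sample["price"] < 500:
        return 0 if sample["star"] < 4 else 1
    return 2


def tree_b(sample):
    """Second stand-in tree with two leaves."""
    return 0 if sample["distance_km"] < 3.0 else 1


# (tree function, number of leaves) for each tree in the toy ensemble.
TREES = [(tree_a, 3), (tree_b, 2)]


def leaf_one_hot(sample):
    """Concatenate one one-hot block per tree, marking the leaf reached."""
    encoded = []
    for tree, n_leaves in TREES:
        block = [0] * n_leaves
        block[tree(sample)] = 1
        encoded.extend(block)
    return encoded


# Hypothetical hotel candidate.
sample = {"price": 420.0, "star": 5, "distance_km": 4.2}
features = leaf_one_hot(sample)  # [0, 1, 0, 0, 1]
```

Leaf 1 of `tree_a` is reached only when the price is below 500 and the star rating is at least 4, so the corresponding one-hot column behaves like a learned conjunction of two conditions that a downstream linear model can weight directly.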
4. Summary
The complete recommendation system integrates recall, ranking, list generation, data processing, infrastructure, and front‑end display. Ctrip’s platform serves over ten business lines and sixty scenarios. Future work aims to incorporate more deep models, online learning, reinforcement learning, and transfer learning to further improve recommendation quality.
21CTO
