Algorithmic Strategies and Insights from Ctrip Hotel Ranking Team’s Participation in the 2018 ACM WSDM and RecSys Challenges
This article details the Ctrip Hotel ranking team's feature‑engineering and model‑innovation approaches, including session features, cold‑start mitigation, discriminative re‑weighting, and ensemble methods, that secured Top‑5 placements in the 2018 ACM WSDM and RecSys recommendation system competitions.
Author: Zhu Lin, a senior algorithm engineer in the ranking algorithm group of Ctrip Hotel R&D, holds a Ph.D. from the University of Science and Technology of China and focuses on recommendation system algorithms.
Abstract: With the rapid rise of artificial intelligence and big‑data technologies, recommendation systems have become ubiquitous across domains such as movies, music, news, and books. This article presents the algorithmic strategies and lessons learned by the Ctrip Hotel ranking team during their participation in the 2018 ACM WSDM and ACM RecSys challenges, where they achieved Top‑5 results.
1. Competition Overview
2018 ACM WSDM Challenge (WSDM Cup) – organized by ACM and KKBOX – required building a system to predict which songs a user would replay within a certain time frame, using a dataset containing user and song metadata, listening activities, and app information.
2018 ACM RecSys Challenge – organized by ACM and Spotify – asked participants to automatically continue user playlists. The dataset contained one million user‑created playlists with associated metadata, plus ten thousand incomplete playlists for evaluation.
2. Methodological Innovations
2.1 Feature‑Engineering Innovations
Beyond conventional categorical and statistical features, the team extracted temporal information from the sequentially ordered data, constructing item‑age features and session‑based features that capture the recency and co‑occurrence patterns of users, songs, artists, and composers.
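As a rough illustration of an item-age feature on sequentially ordered data, the sketch below measures how many records have passed since a song first appeared. The function name `item_age_features` and the use of a record position as a stand-in for time are assumptions made for this example, not the team's actual definition.

```python
def item_age_features(song_ids):
    """For each record, return how many positions have passed since
    the song first appeared in the chronologically ordered log.

    Assumption: the position in the log is used as a proxy for time.
    """
    first_seen = {}
    ages = []
    for position, song in enumerate(song_ids):
        # Remember the first position at which each song shows up.
        first_seen.setdefault(song, position)
        ages.append(position - first_seen[song])
    return ages


if __name__ == "__main__":
    log = ["a", "b", "a", "c", "b"]
    print(item_age_features(log))
```

An age of 0 marks the first occurrence of a song; larger values indicate older items. The same pattern could be applied to artists or composers by swapping the key.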
Session features were derived by greedily grouping consecutive listening records of the same user into sessions, then computing counts of sessions per user, average songs per session, and session length.
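The greedy grouping described above can be sketched as follows. This is a minimal example under stated assumptions: each record is a hypothetical `(user_id, timestamp)` pair (an ordering index could stand in for the timestamp), and a new session starts whenever the gap to the previous record exceeds `max_gap`, a threshold chosen here purely for illustration.

```python
from collections import defaultdict


def build_sessions(records, max_gap=1800):
    """Greedily group each user's records into sessions.

    records: iterable of (user_id, timestamp) pairs (hypothetical schema).
    max_gap: largest gap between consecutive records within one session
             (an assumed threshold, not taken from the article).
    """
    by_user = defaultdict(list)
    for user, ts in records:
        by_user[user].append(ts)

    sessions = {}
    for user, stamps in by_user.items():
        stamps.sort()
        user_sessions = [[stamps[0]]]
        for ts in stamps[1:]:
            if ts - user_sessions[-1][-1] <= max_gap:
                user_sessions[-1].append(ts)   # extend the current session
            else:
                user_sessions.append([ts])     # start a new session
        sessions[user] = user_sessions
    return sessions


def session_features(sessions):
    """Compute per-user session count, average songs per session,
    and average session length (last timestamp minus first)."""
    feats = {}
    for user, user_sessions in sessions.items():
        n = len(user_sessions)
        total_songs = sum(len(s) for s in user_sessions)
        total_length = sum(s[-1] - s[0] for s in user_sessions)
        feats[user] = {
            "session_count": n,
            "avg_songs_per_session": total_songs / n,
            "avg_session_length": total_length / n,
        }
    return feats
```

The resulting per-user dictionary can be joined back onto the training rows as ordinary numeric features.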
For the RecSys challenge, playlist‑based co‑occurrence was modeled using a word2vec‑style embedding where playlists are sentences and songs are words, providing similarity features between songs.
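To make the "playlists as sentences, songs as words" idea concrete, the sketch below generates skip-gram style (center, context) pairs from playlists and computes cosine similarity between two embedding vectors. It only prepares training pairs and does not train an embedding; the window size and function names are assumptions for illustration.

```python
import math


def skipgram_pairs(playlists, window=2):
    """Treat each playlist as a sentence and each song as a word,
    returning (center_song, context_song) pairs within the window."""
    pairs = []
    for tracks in playlists:
        for i, center in enumerate(tracks):
            lo = max(0, i - window)
            hi = min(len(tracks), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, tracks[j]))
    return pairs


def cosine(u, v):
    """Cosine similarity between two embedding vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

In practice the pairs would be fed to an embedding trainer, and `cosine` applied to the learned song vectors would yield the song-to-song similarity feature.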
2.2 Model Innovations
High‑cardinality categorical features such as user and song IDs cause cold‑start issues, because an ID that never appears in the training data has no learned representation. To address this, the team applied denoising auto‑encoders and dropout, trained models without user‑id or song‑id features, and later fused them with the original models.
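One possible reading of the dropout idea is to corrupt ID inputs at training time so the model learns to predict without them, then blend the predictions of ID-aware and ID-free models. The sketch below illustrates that reading only; the `<unk>` token, the field names, and the fixed blending weight are all assumptions, not details from the team's pipeline.

```python
import random

UNKNOWN = "<unk>"  # hypothetical placeholder for a masked ID


def mask_ids(rows, id_fields, drop_prob, seed=0):
    """Return copies of the rows in which each listed ID field is
    replaced by UNKNOWN with probability drop_prob."""
    rng = random.Random(seed)
    masked = []
    for row in rows:
        new_row = dict(row)  # copy so the original rows stay untouched
        for field in id_fields:
            if rng.random() < drop_prob:
                new_row[field] = UNKNOWN
        masked.append(new_row)
    return masked


def fuse(pred_with_ids, pred_without_ids, weight=0.5):
    """Linearly blend predictions from a model that uses IDs and one that does not."""
    return [weight * a + (1.0 - weight) * b
            for a, b in zip(pred_with_ids, pred_without_ids)]
```

A weighted average is only the simplest form of fusion; the weight would normally be tuned on a validation set.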
The team also improved item‑based collaborative filtering. One improvement is discriminative re‑weighting, inspired by the SLIM algorithm: instead of relying on fixed similarity weights, it learns sparse linear weights via an L2‑regularized SVM formulation to better capture feature importance.
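The core of an L2-regularized linear SVM can be sketched with plain subgradient descent on the hinge loss. This is a minimal illustration, not the team's formulation: the feature vectors, labels, learning rate, and regularization strength are all hypothetical, and no bias term is used.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Minimize lam/2 * ||w||^2 + hinge loss with per-sample subgradient steps.

    X: list of feature vectors (lists of floats).
    y: list of labels in {+1, -1}.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            for j in range(n_features):
                grad = lam * w[j]          # gradient of the L2 penalty
                if margin < 1:
                    grad -= yi * xi[j]     # hinge-loss subgradient
                w[j] -= lr * grad
    return w


def score(w, x):
    """Linear score of a candidate under the learned weights."""
    return sum(wj * xj for wj, xj in zip(w, x))


if __name__ == "__main__":
    # Toy data: feature 0 marks positives, feature 1 marks negatives.
    X = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    y = [1, 1, -1, -1]
    w = train_linear_svm(X, y)
    print(w, score(w, [1.0, 0.0]), score(w, [0.0, 1.0]))
```

In a playlist-continuation setting, one could imagine the positives being songs already in a playlist and the negatives being sampled songs outside it, with the learned weights replacing uniform neighbor weights when scoring candidates.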
Ensemble techniques combined collaborative‑filtering coarse‑ranking with Gradient Boosted Decision Trees (GBDT) for fine‑ranking, incorporating metadata‑derived features and the re‑weighted similarity scores.
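The coarse-then-fine structure described above can be outlined as a two-stage pipeline. In this sketch the fine-ranking model is a hypothetical callable `model_score` standing in for a trained GBDT, and the feature vectors are made up for illustration.

```python
import heapq


def coarse_candidates(cf_scores, k):
    """Stage 1: keep the k songs with the highest collaborative-filtering score."""
    return heapq.nlargest(k, cf_scores, key=cf_scores.get)


def fine_rank(candidates, features, model_score):
    """Stage 2: re-order the candidates by a learned model's score.

    model_score is a stand-in for a trained GBDT's prediction function.
    """
    return sorted(candidates, key=lambda s: model_score(features[s]), reverse=True)


if __name__ == "__main__":
    cf_scores = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.7}
    # Hypothetical features: [cf_score, metadata_match]
    features = {"a": [0.9, 0.0], "b": [0.8, 1.0], "c": [0.1, 1.0], "d": [0.7, 0.5]}
    shortlist = coarse_candidates(cf_scores, k=3)
    print(fine_rank(shortlist, features, lambda f: f[0] + f[1]))
```

The split keeps the expensive model off the full catalog: only the shortlist produced by collaborative filtering is scored with the richer feature set.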
Additional models such as Factorization Machines and deep neural networks were employed in the WSDM Cup to embed high‑cardinality IDs into low‑dimensional spaces, enhancing generalization to unseen categories.
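For reference, a second-order Factorization Machine scores an input as a bias, a linear term, and pairwise interactions between latent vectors. The sketch below computes that prediction with the standard linear-time reformulation of the pairwise term; the parameter values in the usage example are arbitrary and chosen only for illustration.

```python
def fm_predict(x, w0, w, V):
    """Second-order Factorization Machine prediction.

    x:  feature vector (for one-hot encoded IDs, mostly zeros).
    w0: global bias.
    w:  linear weights, one per feature.
    V:  latent factors, V[i] is the k-dimensional embedding of feature i.
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        # sum_{i<j} v_if v_jf x_i x_j = 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


if __name__ == "__main__":
    x = [1.0, 1.0, 0.0]            # e.g. one active user ID and one active song ID
    V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(fm_predict(x, 0.5, [0.1, 0.2, 0.3], V))
```

Because each ID interacts through its low-dimensional embedding rather than through a separate weight per ID pair, the model can score pairs that never co-occurred in training.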
3. Summary
In both competitions, continuous feature exploration, problem‑specific algorithmic innovations, and effective model ensembles substantially improved recommendation quality. The results demonstrate the value of these techniques for delivering personalized travel product recommendations to Ctrip users.
Contributors: Chen Yihong and He Bowen, Ctrip Hotel R&D.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.