Evolution of Qunar Hotel Search Ranking: From LambdaMart to LambdaDNN and Multi‑Objective Optimization
The article details Qunar’s hotel search ranking system evolution, covering the shift from rule‑based sorting to LambdaMart, the adoption of LambdaDNN deep models, multi‑objective MMOE architectures, multi‑scenario integration, extensive feature engineering, and experimental results demonstrating significant offline and online performance gains.
1. Background
When users browse hotels online, the travel platform must surface suitable hotels and reduce decision fatigue. Personalized ranking is applied in the default "welcome sort" scenario (see Fig1).
Fig1 Qunar hotel search product form
2. Qunar Hotel Search Ranking Architecture
The overall recommendation pipeline consists of four stages: recall, coarse ranking, fine ranking, and re‑ranking. Fine ranking selects hotels most relevant to the user query, aiming to maximize the business metric s2o (order users / search users).
Fig2 Qunar search architecture
3. Evolution of Fine‑Ranking Algorithms
Stage 1 – Machine‑learning upgrade: replace rule‑based ranking with the tree‑based LambdaMart model.
Stage 2 – Deep model upgrade: LambdaMart reached its capacity; a deeper LambdaDNN model was introduced.
Stage 3 – Multi‑objective model: incorporate both CTR and CTCVR tasks using a Multi‑Gate Mixture‑of‑Experts (MMOE) architecture.
Stage 4 – Multi‑scenario multi‑objective model: handle same‑city and cross‑city search scenarios within a unified model.
4. Iteration of Fine‑Ranking Algorithms
Learning‑to‑Rank (LTR) is a supervised ranking problem. Three learning paradigms are used:
Pointwise – predicts relevance for each item independently.
Pairwise – learns a preference order between item pairs.
Listwise – optimizes the entire ranked list (e.g., LambdaMart, LambdaLoss).
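Since later sections report results as NDCG@10, a minimal plain‑Python sketch of the metric may help; it assumes the common exponential gain (2^rel − 1) and log2 position discount, which may differ from Qunar's exact convention:

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k positions."""
    return sum((2**rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """NDCG@k: DCG of the list in its current order, normalized by
    the DCG of the ideal (relevance-sorted) order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# A list with the ordered hotel (rel=1) ranked first is ideal; burying
# it at the bottom lowers the score.
print(ndcg_at_k([1, 0, 0, 0]))  # 1.0
print(ndcg_at_k([0, 0, 0, 1]))  # < 1.0
```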
4.1 From LambdaMart to LambdaDNN
Reasons for the upgrade:
Hotel search exhibits strong holiday seasonality; large‑scale data requires models with higher capacity.
Tree models struggle with high‑dimensional sparse features, while deep models can capture complex patterns.
4.2 LambdaDNN Practice
The LambdaDNN model combines a DNN with the LambdaRank gradient, using a listwise loss to predict exposure‑conversion rate (ctcvr). Training details: batch size 1024, dropout and batch‑normalization applied, Adam optimizer with learning rate 0.01, input dimension 77, hidden layers [128, 86].
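The LambdaRank gradient that LambdaDNN plugs into the DNN can be sketched as follows. This is an illustrative, non‑authoritative version assuming σ = 1, the exponential NDCG gain, and that items are passed in their current ranked order; the relevant item of each pair receives a negative lambda (its cost gradient), so a gradient‑descent step pushes its score up:

```python
import math

def delta_ndcg(rels, i, j, ideal_dcg):
    """|NDCG change| from swapping positions i and j in the list."""
    gain = lambda r, pos: (2**r - 1) / math.log2(pos + 2)
    before = gain(rels[i], i) + gain(rels[j], j)
    after = gain(rels[i], j) + gain(rels[j], i)
    return abs(after - before) / ideal_dcg

def lambda_gradients(scores, rels, sigma=1.0):
    """Per-item lambdas for one list: RankNet pairwise gradients
    weighted by the NDCG delta of swapping the pair."""
    ideal = sum((2**r - 1) / math.log2(p + 2)
                for p, r in enumerate(sorted(rels, reverse=True)))
    lambdas = [0.0] * len(scores)
    for i in range(len(scores)):
        for j in range(len(scores)):
            if rels[i] > rels[j]:  # item i should rank above item j
                lam = -sigma / (1 + math.exp(sigma * (scores[i] - scores[j])))
                lam *= delta_ndcg(rels, i, j, ideal)
                lambdas[i] += lam
                lambdas[j] -= lam
    return lambdas
```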
Fig5 Model structure
4.3 Sample Construction
Listwise samples are built from user queries and their candidate items. Only lists that contain a successful order are kept; sessions from earlier days without an order are discarded. Positive samples: an ordered hotel is labeled 1, a clicked hotel 0.01 (clicks receive a small relevance weight). Negative samples:
Hard negatives – exposed items that were neither clicked nor ordered.
Easy negatives – bottom‑n items in the list that were never exposed.
Fig6 User search behavior example
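The labeling scheme above can be sketched as a small helper; the item fields (`hotel_id`, `clicked`, `ordered`) are hypothetical names for illustration, not Qunar's actual schema:

```python
def label_list(exposed_items, bottom_n_unexposed):
    """Assign listwise relevance labels: order -> 1.0, click -> 0.01,
    exposed but ignored -> 0 (hard negative), unexposed bottom-n
    hotels -> 0 (easy negative)."""
    samples = []
    for item in exposed_items:
        if item.get("ordered"):
            label = 1.0
        elif item.get("clicked"):
            label = 0.01
        else:
            label = 0.0          # hard negative: exposed, no interaction
        samples.append((item["hotel_id"], label))
    for hotel_id in bottom_n_unexposed:
        samples.append((hotel_id, 0.0))  # easy negative: never exposed
    return samples
```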
4.4 Feature Engineering
Features are grouped into:
User features (historical orders, clicks, profile).
Hotel features (price, star level, CTR, conversion).
Cross features (user‑hotel distance, POI‑hotel distance).
Context features (time, search scenario).
Feature selection follows three principles: avoid leakage, stay business‑relevant, and ensure discriminative power. Using LambdaMart importance, the top 49 dimensions (out of 200) were kept; 28 additional dimensions were manually added, resulting in 77 features.
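The selection procedure (top‑k dimensions by model importance, plus a manually curated set) can be sketched as below; the feature names and importance values are made up for illustration, with counts chosen to match the 49 + 28 = 77 described above:

```python
def select_features(importances, top_k, manual_extra=None):
    """Keep the top_k features by importance score, then union in a
    manually curated list of business-relevant features."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    kept = ranked[:top_k]
    for feat in (manual_extra or []):
        if feat not in kept:
            kept.append(feat)
    return kept

# 200 candidate features with synthetic, monotonically decreasing scores.
imps = {f"f_{i}": 1.0 / (i + 1) for i in range(200)}
feats = select_features(imps, top_k=49,
                        manual_extra=[f"f_{i}" for i in range(150, 178)])
print(len(feats))  # 77
```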
Fig7 Effect of top‑feature count on model performance
4.5 Experimental Evaluation
Baseline LambdaMart NDCG@10 = 0.7310; LambdaDNN NDCG@10 = 0.7463. Online conversion rate increased by 0.5%.
Model        NDCG@10
LambdaMart   0.7310
LambdaDNN    0.7463
5. Multi‑Objective Optimization
Single‑objective models suffer from sample bias, poor interpretability, and data sparsity. A multi‑objective approach jointly learns CTR and CTCVR, mitigating bias, providing probability‑based outputs, and improving generalization.
5.1 Why Multi‑Objective?
Reduces sample selection bias.
Outputs are probabilistic, easier to combine with downstream re‑ranking.
Shared representation learns from abundant CTR data, helping the sparse CTCVR task.
5.2 Model Choice
Google’s 2018 MMOE architecture is adopted: multiple shared experts with task‑specific gating networks, allowing each task to weigh experts differently.
Fig8 MMOE network structure
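A minimal NumPy sketch of one MMOE forward pass may clarify how each task weighs the shared experts differently; the dimensions and weights below are toy values, not the production configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def mmoe_forward(x, expert_ws, gate_ws, tower_ws):
    """Shared experts, one softmax gate per task mixing the expert
    outputs, then a per-task tower head (sigmoid for CTR/CTCVR)."""
    experts = np.stack([np.tanh(W @ x) for W in expert_ws])   # (E, H)
    outputs = []
    for gate_w, tower_w in zip(gate_ws, tower_ws):
        gate = softmax(gate_w @ x)          # (E,) per-task expert weights
        mixed = gate @ experts              # (H,) task-specific mixture
        outputs.append(1 / (1 + np.exp(-(tower_w @ mixed))))
    return outputs                          # e.g. [p_ctr, p_ctcvr]

# Toy sizes: 8 input features, 4 experts, hidden size 16, 2 tasks.
d, E, H, T = 8, 4, 16, 2
x = rng.normal(size=d)
expert_ws = [rng.normal(size=(H, d)) * 0.1 for _ in range(E)]
gate_ws = [rng.normal(size=(E, d)) * 0.1 for _ in range(T)]
tower_ws = [rng.normal(size=H) * 0.1 for _ in range(T)]
p_ctr, p_ctcvr = mmoe_forward(x, expert_ws, gate_ws, tower_ws)
```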
5.3 MMOE Sample Selection
Only lists containing clicks or orders are kept. For each list, all exposed items are used, plus a set of bottom‑n unexposed hotels as easy negatives. Duplicate clicks within the same list are de‑duplicated.
5.4 Training Issues
Loss imbalance (CTCVR loss much smaller than CTR) is addressed by up‑sampling order samples to achieve a 1:1 positive‑negative ratio. Gate saturation (softmax outputs near 0/1) is mitigated by applying dropout and temperature scaling (λ = feature dimension) to the pre‑softmax scores.
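The effect of the temperature scaling on gate saturation can be shown numerically. The logit values below are invented for illustration; λ = 77 follows the "temperature equals feature dimension" choice in the text:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

scores = np.array([4.0, 0.5, -1.0, -2.0])  # pre-softmax gate logits
lam = 77.0                                 # temperature = feature dim

saturated = softmax(scores)        # near one-hot: one expert dominates
scaled = softmax(scores / lam)     # much flatter expert weights
```

With the raw logits one expert takes almost all of the gate mass; after dividing by λ the weights spread across experts, so every expert keeps receiving gradient signal.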
5.5 Results
LambdaDNN NDCG@10 = 0.7477. MMOE variants show similar offline NDCG (0.7405–0.7474) but online CTR improves by 2% while conversion remains stable.
MMOE      No Sampling   Order Upsampling   Gate Scaling
NDCG@10   0.74048       0.74407            0.74738
6. Multi‑Scenario Multi‑Objective Optimization
Search scenarios (same‑city, cross‑city, etc.) previously used separate models, leading to data islands and high maintenance cost. A unified MMOE model with scenario features learns both shared and scenario‑specific patterns.
Fig9 Expert weight distribution across scenarios (expert 2 dominates, indicating low scenario variance)
Online order conversion increased by 0.56%.
Feature alignment categories:
Scenario‑independent features (price, click count).
Shared scenario features (POI‑hotel conversion).
Scenario‑specific features (distance in same‑city vs POI distance).
Scenario ID feature.
Fig10 Multi‑scenario multi‑objective network structure
Evaluation of model versions:
Model                                    Feature Handling         Experts   NDCG@10
v1 – No sample fusion                    -                        4         0.78416
v2 – Multi‑scenario sample merge         -                        4         0.78449
v3 – Increase experts to 8               -                        8         0.78566
v4 – Full‑scene feature classification   Feature‑category input   8         0.78745
7. Feature Selection
Motivations for feature selection include feature redundancy, the curse of dimensionality, longer training time, higher deployment cost, and reduced interpretability.
Traditional methods (unsupervised selection, filter, wrapper, and Boruta) suit small‑scale models. For deep models, a Dropout Feature Ranking (DFR) approach based on variational dropout is employed.
Fig11 Dropout before/after training
Variational dropout uses Concrete relaxation to make the Bernoulli mask differentiable, allowing gradient‑based optimization of per‑feature dropout probabilities (p). Adding an L1 regularizer on (1‑p) encourages sparsity.
Fig12 Dropout layer illustration
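A sketch of the Concrete relaxation of the per‑feature dropout mask, assuming the standard sigmoid‑of‑logits form with a fixed temperature; an actual DFR implementation would learn the per‑feature drop rates p by gradient descent rather than fix them as here:

```python
import numpy as np

rng = np.random.default_rng(1)

def concrete_mask(p_drop, temperature=0.1):
    """Differentiable (Concrete) relaxation of a Bernoulli drop mask.
    As temperature -> 0 this approaches a hard 0/1 mask with
    P(drop) = p_drop, but the sample stays differentiable in p_drop."""
    u = rng.uniform(1e-6, 1 - 1e-6, size=p_drop.shape)
    logits = (np.log(p_drop) - np.log(1 - p_drop)
              + np.log(u) - np.log(1 - u))
    drop = 1 / (1 + np.exp(-logits / temperature))
    return 1.0 - drop            # keep-mask applied to the feature vector

p = np.array([0.99, 0.5, 0.01])      # learned per-feature drop rates
keep = concrete_mask(p)
importance = 1.0 - p                 # reported as feature importance
l1_penalty = np.abs(1.0 - p).sum()   # sparsity regularizer on (1 - p)
```

A feature with a high learned drop rate (here p = 0.99) is almost always masked out, so its importance score 1 − p is near zero, matching the ranking reported below.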
Experiments:
Adding 10 random noise features reduced NDCG@10 from 0.74956 to 0.74270 (‑0.915%).
Applying DFR and removing 14 low‑importance features increased NDCG@10 to 0.75038 and online conversion by 0.2%.
Model                                     NDCG@10   Relative Change
v1 – Baseline (310 features)              0.74956   -
v2 – Baseline + 10 noise (320 features)   0.74270   ‑0.915%
v3 – Baseline + DFR (296 features)        0.75038   +0.109%
Feature importance ranking after DFR shows real‑time features (e.g., f_10, f_16, f_1) with scores above 0.99, user‑hotel cross features (f_20, f_31, f_40) ranging from 0.99 down to 0.77, and noise/redundant features (f_noise, f_0) with scores below 0.3.
Feature   Importance (1‑p)   Description
f_10      0.9977             Real‑time features
f_16      0.9973
f_1       0.9956
f_20      0.9915             User‑hotel cross features
f_31      0.8450
f_40      0.7709
f_noise   0.2733             Random Gaussian noise feature
f_150     0.1862             Hotel statistical (redundant) features
...
f_0       0.0001             All‑zero control feature
8. Summary and Future Work
The article presents Qunar’s hotel search fine‑ranking evolution, demonstrating that deep models (LambdaDNN) outperform tree models, and that multi‑objective and multi‑scenario MMOE architectures further boost both offline ranking metrics and online conversion rates. Future directions include combining pointwise and listwise losses to jointly learn ranking order and classification confidence.
References:
[1] C. J. C. Burges, “From RankNet to LambdaRank to LambdaMART: An Overview,” Microsoft Research, 2010.
[2] R. K. Pasumarthi et al., “TF‑Ranking: Scalable TensorFlow Library for Learning‑to‑Rank,” KDD 2019.
[3] M. Haldar et al., “Applying Deep Learning to Airbnb Search,” KDD 2019.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.