
Evolution of Qunar Hotel Search Ranking: From LambdaMart to LambdaDNN and Multi‑Objective Optimization

The article details Qunar’s hotel search ranking system evolution, covering the shift from rule‑based sorting to LambdaMart, the adoption of LambdaDNN deep models, multi‑objective MMOE architectures, multi‑scenario integration, extensive feature engineering, and experimental results demonstrating significant offline and online performance gains.

Qunar Tech Salon

1. Background

When users browse hotels online, the travel platform must surface suitable options and reduce decision fatigue. Personalized ranking is triggered in the welcome-ranking (default search) scenario (Fig1).

Fig1 Qunar hotel search product form

2. Qunar Hotel Search Ranking Architecture

The overall recommendation pipeline consists of four stages: recall, coarse ranking, fine ranking, and re‑ranking. Fine ranking selects hotels most relevant to the user query, aiming to maximize the business metric s2o (order users / search users).

Fig2 Qunar search architecture
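As a rough illustration, the s2o business metric above can be computed as follows (a minimal sketch; the `s2o` helper and set-based user counting are assumptions for illustration, not Qunar's implementation):

```python
def s2o(search_users: set, order_users: set) -> float:
    """Fraction of searching users who went on to place an order."""
    if not search_users:
        return 0.0
    # Only orders from users who actually searched count toward the metric.
    return len(order_users & search_users) / len(search_users)
```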

3. Evolution of Fine‑Ranking Algorithms

Stage 1 – Machine-learning upgrade: replace rule-based ranking with the tree-based LambdaMart model.

Stage 2 – Deep-model upgrade: LambdaMart reached its capacity; a deeper LambdaDNN model was introduced.

Stage 3 – Multi-objective model: incorporate both CTR and CTCVR tasks using a Multi-gate Mixture-of-Experts (MMOE) architecture.

Stage 4 – Multi-scenario multi-objective model: handle same-city and cross-city search scenarios within a unified model.

4. Iteration of Fine‑Ranking Algorithms

Learning‑to‑Rank (LTR) is a supervised ranking problem. Three learning paradigms are used:

Pointwise – predicts relevance for each item independently.

Pairwise – learns a preference order between item pairs.

Listwise – optimizes the entire ranked list (e.g., LambdaMart, LambdaLoss).
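The three paradigms can be contrasted with toy loss functions (a hedged NumPy sketch; the function names and the ListNet-style top-one listwise loss are illustrative choices, not the article's exact losses):

```python
import numpy as np

def pointwise_loss(scores, labels):
    # Squared error per item, ignoring the rest of the list.
    return float(np.mean((scores - labels) ** 2))

def pairwise_loss(scores, labels):
    # Logistic loss over ordered pairs (i preferred over j when labels[i] > labels[j]).
    loss, pairs = 0.0, 0
    for i in range(len(labels)):
        for j in range(len(labels)):
            if labels[i] > labels[j]:
                loss += np.log1p(np.exp(-(scores[i] - scores[j])))
                pairs += 1
    return loss / max(pairs, 1)

def listwise_loss(scores, labels):
    # ListNet-style cross-entropy between top-one probability distributions.
    p = np.exp(labels) / np.exp(labels).sum()
    q = np.exp(scores) / np.exp(scores).sum()
    return float(-(p * np.log(q)).sum())
```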

4.1 From LambdaMart to LambdaDNN

Reasons for the upgrade:

Hotel search exhibits strong holiday seasonality; large‑scale data requires models with higher capacity.

Tree models struggle with high‑dimensional sparse features, while deep models can capture complex patterns.

4.2 LambdaDNN Practice

The LambdaDNN model combines a DNN with the LambdaRank gradient, using a listwise loss to predict the exposure-to-conversion rate (CTCVR). Training details: batch size 1024; dropout and batch normalization applied; Adam optimizer with learning rate 0.01; input dimension 77; hidden layers [128, 86].

Fig5 Model structure
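The LambdaRank signal that LambdaDNN backpropagates through the network can be sketched as follows (an illustrative NumPy version of the standard |ΔNDCG|-weighted pairwise lambdas; the function name and details are assumptions, not Qunar's production code):

```python
import numpy as np

def lambda_gradients(scores, labels):
    """Per-item lambdas: RankNet pairwise gradients weighted by the
    |ΔNDCG| obtained from swapping each pair in the current ranking."""
    n = len(scores)
    order = np.argsort(-scores)                 # current ranking by model score
    rank = np.empty(n, dtype=int)
    rank[order] = np.arange(n)
    gains = 2.0 ** np.asarray(labels, dtype=float) - 1.0
    disc = 1.0 / np.log2(rank + 2.0)            # position discount at each item's rank
    idcg = (np.sort(gains)[::-1] / np.log2(np.arange(2, n + 2))).sum()
    if idcg == 0:
        idcg = 1.0
    lambdas = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if labels[i] > labels[j]:
                delta = abs((gains[i] - gains[j]) * (disc[i] - disc[j])) / idcg
                rho = 1.0 / (1.0 + np.exp(scores[i] - scores[j]))  # RankNet sigmoid
                lambdas[i] += delta * rho       # push the better item up
                lambdas[j] -= delta * rho       # and the worse item down
    return lambdas
```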

4.3 Sample Construction

Listwise samples are built from user queries and their candidate items. Only lists that contain a completed order are kept; a user's earlier search days without orders are discarded. Positive samples: orders receive label 1, clicks label 0.01 (a small relevance weight). Negative samples:

Hard negatives – exposed items that were neither clicked nor ordered.

Easy negatives – bottom‑n items in the list that were never exposed.

Fig6 User search behavior example
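The sample rules above can be sketched as follows (the `build_listwise_samples` helper and the tuple-based log format are hypothetical; only the label scheme comes from the article):

```python
def build_listwise_samples(search_lists):
    """Each list is [(hotel_id, exposed, clicked, ordered), ...].
    Keeps only lists that contain an order, then assigns relevance labels."""
    samples = []
    for items in search_lists:
        if not any(ordered for _, _, _, ordered in items):
            continue                      # discard lists without a completed order
        labeled = []
        for hotel, exposed, clicked, ordered in items:
            if ordered:
                label = 1.0               # positive: order
            elif clicked:
                label = 0.01              # weak positive: click
            else:
                label = 0.0               # hard negative (exposed) or easy negative (unexposed)
            labeled.append((hotel, label))
        samples.append(labeled)
    return samples
```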

4.4 Feature Engineering

Features are grouped into:

User features (historical orders, clicks, profile).

Hotel features (price, star level, CTR, conversion).

Cross features (user‑hotel distance, POI‑hotel distance).

Context features (time, search scenario).

Feature selection follows three principles: avoid leakage, stay business‑relevant, and ensure discriminative power. Using LambdaMart importance, the top 49 dimensions (out of 200) were kept; 28 additional dimensions were manually added, resulting in 77 features.

Fig7 Effect of top‑feature count on model performance
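The top-k-plus-manual selection recipe above can be sketched as follows (the `select_features` helper and its signature are assumptions for illustration):

```python
def select_features(importances, top_k=49, manual=()):
    """Keep the top_k features by LambdaMart importance, then union in
    manually curated dimensions (mirroring the 49 + 28 = 77 recipe)."""
    ranked = sorted(importances, key=importances.get, reverse=True)[:top_k]
    # dict.fromkeys preserves order while dropping duplicates.
    return list(dict.fromkeys(ranked + list(manual)))
```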

4.5 Experimental Evaluation

Baseline LambdaMart NDCG@10 = 0.7310; LambdaDNN NDCG@10 = 0.7463. Online conversion rate increased by 0.5%.

Model        NDCG@10
LambdaMart   0.7310
LambdaDNN    0.7463
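NDCG@10, the offline metric quoted above, can be computed as follows (standard exponential-gain formulation; the exact gain/discount variant used at Qunar is not stated):

```python
import numpy as np

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k positions as listed."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))
    return float(((2 ** rel - 1) / discounts).sum())

def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```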

5. Multi‑Objective Optimization

Single‑objective models suffer from sample bias, poor interpretability, and data sparsity. A multi‑objective approach jointly learns CTR and CTCVR, mitigating bias, providing probability‑based outputs, and improving generalization.

5.1 Why Multi‑Objective?

Reduces sample selection bias.

Outputs are probabilistic, easier to combine with downstream re‑ranking.

Shared representation learns from abundant CTR data, helping the sparse CTCVR task.

5.2 Model Choice

Google’s 2018 MMOE architecture is adopted: multiple shared experts with task‑specific gating networks, allowing each task to weigh experts differently.

Fig8 MMOE network structure
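A minimal MMOE forward pass, assuming ReLU experts and linear softmax gates (the weight layout and single-sample shape are illustrative, not the production network):

```python
import numpy as np

def mmoe_forward(x, expert_ws, gate_ws):
    """Shared experts with one softmax gate per task (e.g., CTR, CTCVR).
    expert_ws: list of (d_in, d_h) matrices; gate_ws: list of (d_in, n_experts)."""
    # Every task sees the same expert outputs: shape (n_experts, d_h).
    experts = np.stack([np.maximum(x @ w, 0.0) for w in expert_ws])
    outputs = []
    for gw in gate_ws:
        logits = x @ gw
        gate = np.exp(logits - logits.max())
        gate /= gate.sum()                # softmax weights over experts
        outputs.append(gate @ experts)    # task-specific mixture of experts
    return outputs
```

Each task's gate learns its own weighting over the shared experts, which is how MMOE lets the CTR and CTCVR objectives share capacity without forcing identical representations.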

5.3 MMOE Sample Selection

Only lists containing clicks or orders are kept. For each list, all exposed items are used, plus a set of bottom‑n unexposed hotels as easy negatives. Duplicate clicks within the same list are de‑duplicated.

5.4 Training Issues

Loss imbalance (CTCVR loss much smaller than CTR) is addressed by up‑sampling order samples to achieve a 1:1 positive‑negative ratio. Gate saturation (softmax outputs near 0/1) is mitigated by applying dropout and temperature scaling (λ = feature dimension) to the pre‑softmax scores.
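Both fixes can be sketched as follows (assuming λ divides the gate logits by the feature dimension as stated, and upsampling by replicating order samples; the helper names are hypothetical):

```python
import numpy as np

def scaled_gate(logits, dim):
    """Temperature-scaled gate softmax: dividing pre-softmax scores by
    λ = feature dimension keeps the gate away from saturated 0/1 outputs."""
    z = logits / dim
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

def upsample_orders(samples, labels):
    """Replicate order (positive) samples until positives ≈ negatives (1:1)."""
    pos = [s for s, l in zip(samples, labels) if l == 1]
    neg = [s for s, l in zip(samples, labels) if l == 0]
    reps = max(len(neg) // max(len(pos), 1), 1)
    return pos * reps + neg, [1] * (len(pos) * reps) + [0] * len(neg)
```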

5.5 Results

LambdaDNN NDCG@10 = 0.7477. MMOE variants show similar offline NDCG (0.7405–0.7474) but online CTR improves by 2% while conversion remains stable.

MMOE variant       NDCG@10
No sampling        0.74048
Order upsampling   0.74407
Gate scaling       0.74738

6. Multi‑Scenario Multi‑Objective Optimization

Search scenarios (same‑city, cross‑city, etc.) previously used separate models, leading to data islands and high maintenance cost. A unified MMOE model with scenario features learns both shared and scenario‑specific patterns.

Fig9 Expert weight distribution across scenarios (expert 2 dominates, indicating low scenario variance)

Online order conversion increased by 0.56%.

Feature alignment categories:

Scenario‑independent features (price, click count).

Shared scenario features (POI‑hotel conversion).

Scenario‑specific features (distance in same‑city vs POI distance).

Scenario ID feature.

Fig10 Multi‑scenario multi‑objective network structure

Evaluation of model versions:

Model                                    Feature handling         Experts   NDCG@10
v1 – No sample fusion                    (none)                   4         0.78416
v2 – Multi-scenario sample merge         (none)                   4         0.78449
v3 – Increase experts to 8               (none)                   8         0.78566
v4 – Full-scene feature classification   Feature-category input   8         0.78745

7. Feature Selection

Motivations include redundancy, dimensionality curse, longer training time, higher deployment cost, and reduced interpretability.

Traditional methods (unsupervised, wrapper, filter, Boruta) suit small‑scale models. For deep models, a Dropout Feature Ranking (DFR) approach based on variational dropout is employed.

Fig11 Dropout before/after training

Variational dropout uses Concrete relaxation to make the Bernoulli mask differentiable, allowing gradient‑based optimization of per‑feature dropout probabilities (p). Adding an L1 regularizer on (1‑p) encourages sparsity.

Fig12 Dropout layer illustration
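The Concrete relaxation and the L1 penalty on keep-probability can be sketched as follows (a NumPy approximation of the variational-dropout sampling step; the temperature value and helper names are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def concrete_mask(logit_p, temperature=0.1, size=1):
    """Concrete relaxation of a Bernoulli drop mask, so the per-feature
    drop probability p = sigmoid(logit_p) stays differentiable."""
    p = 1.0 / (1.0 + np.exp(-logit_p))
    u = rng.uniform(1e-7, 1 - 1e-7, size)
    # Relaxed keep-probability; approaches a hard 0/1 mask as temperature -> 0.
    z = (np.log(1 - p) - np.log(p) + np.log(u) - np.log(1 - u)) / temperature
    return 1.0 / (1.0 + np.exp(-z))

def dfr_penalty(drop_probs, lam=1e-3):
    """L1 regularizer on keep-probability (1 - p): pushes unimportant
    features toward p -> 1 (always dropped); 1 - p is the importance score."""
    return lam * np.sum(1.0 - np.asarray(drop_probs))
```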

Experiments:

Adding 10 random noise features reduced NDCG@10 from 0.74956 to 0.74270 (‑0.915%).

Applying DFR and removing 14 low‑importance features increased NDCG@10 to 0.75038 and online conversion by 0.2%.

Model                                     NDCG@10   Relative change
v1 – Baseline (310 features)              0.74956   (baseline)
v2 – Baseline + 10 noise (320 features)   0.74270   -0.915%
v3 – Baseline + DFR (296 features)        0.75038   +0.109%

Feature importance ranking after DFR shows real-time features (e.g., f_10, f_16, f_1) with scores above 0.99, cross features (f_20, f_31, f_40) ranging from 0.99 down to 0.77, and noise or redundant features (f_noise, f_0) with scores below 0.3.

Feature   Importance (1-p)   Description
f_10      0.9977             Real-time feature
f_16      0.9973             Real-time feature
f_1       0.9956             Real-time feature
f_20      0.9915             User-hotel cross feature
f_31      0.8450             User-hotel cross feature
f_40      0.7709             User-hotel cross feature
f_noise   0.2733             Random Gaussian noise feature
f_150     0.1862             Hotel statistical (redundant) feature
...
f_0       0.0001             All-zero control feature

8. Summary and Future Work

This article presents Qunar's hotel search fine-ranking evolution, demonstrating that deep models (LambdaDNN) outperform tree models, and that multi-objective and multi-scenario MMOE architectures further boost both offline ranking metrics and online conversion rates. Future directions include combining pointwise and listwise losses to jointly learn ranking order and classification confidence.

