Demystifying Learning to Rank: From Core Algorithms to Scalable Online Sorting Architecture

This article provides a comprehensive, system‑engineer‑focused guide to Learning to Rank, covering fundamental machine‑learning concepts, evaluation metrics such as Precision, nDCG and ERR, training‑testing‑inference stages, pointwise/pairwise/listwise methods, and a detailed multi‑layer online ranking architecture with feature, model and recall governance.

Meituan Technology Team

Dec 20, 2018

Introduction

Learning to Rank (LTR) is the core technology behind personalized search, recommendation and advertising. It learns a ranking model from labeled data so that, for a new query, the model orders items consistently with the relevance judgments it was trained on.

Machine Learning Basics

In supervised learning a model learns a function f(X) → Y from labeled samples (X, Y). The process consists of:

Training: minimise a loss on the training set.

Testing: evaluate the loss on a held‑out set to check that the model generalises.

Inference: given only features X, predict the target Y.
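
The three stages can be sketched on a toy problem. This is a minimal illustration, not the article's code: the data (drawn from y = 2x + 1), the linear model, the learning rate and every function name are assumptions chosen to keep the example small.

```python
# Hypothetical toy data generated from y = 2x + 1.
train = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
test = [(4.0, 9.0), (5.0, 11.0)]


def mse(w, b, samples):
    """Mean squared error of the linear model f(x) = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in samples) / len(samples)


def fit(samples, lr=0.05, epochs=2000):
    """Training: minimise the MSE on the training set by gradient descent."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = fit(train)              # training
test_loss = mse(w, b, test)    # testing: loss on samples never seen in training
prediction = w * 10.0 + b      # inference: only the feature X is known
```

A low test loss here indicates that the fitted function carries over to unseen inputs, which is the property the testing stage is meant to check.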

Learning to Rank

LTR applies supervised, semi‑supervised or reinforcement learning to construct ranking models for information‑retrieval systems. Typical use cases are web search, recommendation lists and ad ranking.

List Evaluation Metrics

Three generations of metrics are widely used:

Precision and Recall (P‑R) – binary relevance evaluation.

Discounted Cumulative Gain (DCG) and Normalised DCG (nDCG) – graded relevance with logarithmic position decay.

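Expected Reciprocal Rank (ERR) – goes beyond position‑only discounting by modelling a user who scans the list top‑down and stops at the first satisfying document, so an item's contribution also depends on the relevance of the items ranked above it.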

Key formulas (shown as images):
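
The formula images are not available in this text. As a stand‑in, the sketch below implements the commonly used textbook definitions; the article's own formulas may differ in detail. It assumes the exponential gain 2^rel − 1 for DCG and ERR and a log2(position + 1) discount. All function names are illustrative.

```python
import math


def precision_at_k(relevant_flags, k):
    """Fraction of the top-k items that are relevant (flags are 0 or 1)."""
    return sum(relevant_flags[:k]) / k


def dcg_at_k(grades, k):
    """DCG@k = sum over positions i = 1..k of (2^rel_i - 1) / log2(i + 1)."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades[:k]))


def ndcg_at_k(grades, k):
    """DCG@k divided by the DCG@k of the ideal (descending-grade) ordering."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0


def err(grades, max_grade):
    """ERR = sum over ranks r of (1 / r) * P(user stops exactly at rank r).

    The per-item stop probability is R = (2^grade - 1) / 2^max_grade.
    """
    score, p_continue = 0.0, 1.0
    for rank, g in enumerate(grades, start=1):
        p_stop = (2 ** g - 1) / 2 ** max_grade
        score += p_continue * p_stop / rank
        p_continue *= 1 - p_stop
    return score
```

For example, `ndcg_at_k([3, 2, 0], 3)` is 1.0 because the list is already in ideal order, while `ndcg_at_k([0, 2, 3], 3)` is lower because the best item sits in the last position.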

List Training Algorithms

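Pointwise – predict a real‑valued score for each item (e.g., click‑through‑rate estimation). Labels are easy to collect, but each item is scored in isolation, so the loss ignores the relative order of items within a list.

Pairwise – predict the relative order of a pair of items; the loss penalises inversions. LambdaRank and LambdaMART additionally weight each pair's gradient by how much swapping the pair would change a list metric such as nDCG.

Listwise – optimise a loss defined over the whole list, targeting list‑level metrics such as nDCG or ERR. It often gives the best empirical performance but requires relevance labels for the full list.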
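
To make the three families concrete, the sketch below shows one representative loss for each. These are illustrative choices, not the article's: squared error for pointwise, a RankNet‑style logistic loss for pairwise, and a ListNet‑style top‑one cross entropy for listwise.

```python
import math


def pointwise_loss(score, label):
    """Squared error between one item's score and its own label."""
    return (score - label) ** 2


def pairwise_loss(score_better, score_worse):
    """RankNet-style logistic loss; small when the better item scores higher."""
    return math.log(1 + math.exp(-(score_better - score_worse)))


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_loss(scores, labels):
    """ListNet-style cross entropy between softmax(labels) and softmax(scores)."""
    target, predicted = softmax(labels), softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))
```

Note how the input changes shape across the three functions: a single item, a pair of items, and a whole list. That is the practical difference in how training samples must be constructed.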

Online Ranking Architecture

The classic two‑phase scheme (a recall phase that may include rough ranking, followed by fine ranking) is extended with domain‑driven layers to address three major challenges: features, models and recall.

Feature Challenges

Feature addition, operator definition, normalisation, discretisation, acquisition and governance.

Model Challenges

Completeness of base models, cascade‑model composition, composite objectives, A/B‑test support, hot‑loading.

Recall Challenges

Keyword recall, location‑based (LBS) recall, recommendation recall and rough‑ranking recall.

Recall Governance

Recall is divided into four categories:

Keyword recall – implemented with Elasticsearch.

Distance recall – implemented with a K‑D tree.

Rough‑ranking recall – coarse‑ranking models that filter large candidate sets.

Recommendation recall – collaborative‑filtering based retrieval.

Traffic is allocated to each recall bucket to guarantee stable A/B testing.
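
One way to combine several recall channels into a single candidate set is sketched below. The article does not describe how results are merged, so the dedup‑and‑truncate strategy, the function name and the stub channels are all assumptions for illustration.

```python
def merge_recall(channels, query, limit):
    """Call each recall channel in turn and merge the results without duplicates.

    `channels` maps a channel name to a function that returns item ids.
    Returns (item_id, [channel names that recalled it]) pairs in first-seen order.
    """
    merged = {}
    for name, recall_fn in channels.items():
        for item_id in recall_fn(query):
            merged.setdefault(item_id, []).append(name)
    return list(merged.items())[:limit]


# Hypothetical stub channels standing in for the real recall services.
channels = {
    "keyword": lambda query: ["a", "b"],
    "distance": lambda query: ["b", "c"],
    "recommendation": lambda query: ["d"],
}
candidates = merge_recall(channels, "hotpot", limit=3)
```

Recording which channels produced each candidate keeps the source available as a feature and makes per‑channel experiments easier to analyse.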

Feature Service Governance

Features are classified into four groups to simplify service design:

List features – returned as in‑memory lists to avoid request explosion.

Entity features – served via Redis/Tair key‑value stores.

Context features – embedded directly in the request (e.g., scene, city).

Similarity features – computed by dedicated in‑memory services to offload heavy similarity calculations.

Layered Online Ranking Model

The end‑to‑end pipeline consists of six logical steps:

Scene Dispatch – route the request according to business type (platform, list, usage scenario).

Traffic Distribution – bucket‑based allocation of traffic to experimental models.

Recall – retrieve a candidate set using the four recall mechanisms.

Feature Retrieval – fetch all required features according to the four feature groups.

Prediction – run the candidate features through a cascade of models.

Ranking – order the candidates by the predicted scores.
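
The six steps can be read as a single request handler. The following is a simplified sketch under stated assumptions: the scene configuration layout, the CRC32 hash, the bucket‑to‑model mapping and all names are hypothetical, and a real service would call remote recall and feature services instead of local functions.

```python
import zlib


def handle_request(request, scenes, num_buckets=32):
    """Run one request through the six logical steps and return ranked item ids."""
    # 1. Scene dispatch: pick the configuration for this business scenario.
    scene = scenes[request["scene"]]
    # 2. Traffic distribution: a stable hash of the user id selects a bucket,
    #    and the bucket selects one of the scene's candidate models.
    bucket = zlib.crc32(request["user_id"].encode("utf-8")) % num_buckets
    model = scene["models"][bucket % len(scene["models"])]
    # 3. Recall: retrieve the candidate set.
    candidates = scene["recall"](request)
    # 4. Feature retrieval: fetch the features of every candidate.
    features = {item: scene["features"](request, item) for item in candidates}
    # 5. Prediction: score every candidate with the selected model.
    scores = {item: model(features[item]) for item in candidates}
    # 6. Ranking: order the candidates by predicted score, highest first.
    return sorted(candidates, key=scores.get, reverse=True)


# Hypothetical single-scene configuration with one stub model.
ctr_table = {"a": 0.1, "b": 0.9, "c": 0.5}
scenes = {
    "home": {
        "models": [lambda feats: feats["ctr"]],
        "recall": lambda request: ["a", "b", "c"],
        "features": lambda request, item: {"ctr": ctr_table[item]},
    }
}
ranked = handle_request({"scene": "home", "user_id": "u1"}, scenes)
```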

Feature Pipeline

The feature pipeline transforms raw feature names into values ready for the model.

Feature Model – metadata that lists all atomic feature names required by a ranking model.

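Expression – a Polish‑notation (prefix) string that defines composite features. Example: $O+_O+_Vv1_C14.2_O*_C2_O+_Vv2_Vv3, where $ marks a composite feature and each underscore‑separated token starts with V (variable), C (constant) or O (operator). Read in prefix order, the example evaluates to (v1 + 14.2) + 2 × (v2 + v3).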

Atomic Feature – the basic (name, value) pair, stored in POJOs such as DealInfo.

Feature Proxy – a thin wrapper that forwards a request for a specific feature group to the appropriate remote feature service.

Feature Service – the actual storage layer (in‑memory list service, Redis/Tair KV store, or dedicated similarity service).
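
The Expression component above can be illustrated with a small recursive evaluator. This is a sketch that assumes tokens are separated by underscores, that every operator is binary, and that variable names contain no underscores; the function name and the operator table are illustrative, not the article's implementation.

```python
import operator

OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(expression, variables):
    """Evaluate a prefix expression against a dict of atomic feature values."""
    tokens = iter(expression.lstrip("$").split("_"))

    def parse():
        token = next(tokens)
        kind, body = token[0], token[1:]
        if kind == "O":  # operator: consume and evaluate two operands
            left = parse()
            right = parse()
            return OPERATORS[body](left, right)
        if kind == "C":  # constant literal
            return float(body)
        if kind == "V":  # variable: look up the atomic feature value
            return variables[body]
        raise ValueError(f"unknown token: {token}")

    result = parse()
    if next(tokens, None) is not None:
        raise ValueError("trailing tokens after a complete expression")
    return result


value = evaluate(
    "$O+_O+_Vv1_C14.2_O*_C2_O+_Vv2_Vv3",
    {"v1": 1.0, "v2": 2.0, "v3": 3.0},
)
```

With v1 = 1, v2 = 2 and v3 = 3 the example expression works out to (1 + 14.2) + 2 × (2 + 3), roughly 25.2. Prefix notation needs no parentheses or precedence rules, which keeps such a parser short.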

Prediction Pipeline

The prediction pipeline combines several model types and transformations.

Prediction – wraps a model and converts a candidate’s feature vector into the model’s input format.

Cascade Model – stacks heterogeneous models (e.g., GBDT for ID features, NN for dense embeddings, LR for linear terms). The final score is computed by an expression such as Score = α·NNScore + β·LRScore / (1+γ), where α, β, γ are pre‑tuned constants.

Expression – same Polish‑notation language used in the feature pipeline to combine sub‑model outputs.

Transform – feature‑level transformations (normalisation, discretisation, embedding lookup).

Scoring – the actual inference step of each atomic model (tree, linear, neural network).

Atomic Model – the smallest executable model unit (Logistic Regression, GBDT, MLP, etc.).
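
The sketch below strings together a transform, an atomic model and the cascade expression from the list above. It reads the score formula with ordinary operator precedence, that is α·NNScore + (β·LRScore) / (1 + γ), which is an assumption about the intended grouping. The feature names, weights and the fixed stand‑in for the neural‑network score are hypothetical.

```python
import math


def min_max(value, low, high):
    """Transform: scale a raw feature into [0, 1], clamping out-of-range values."""
    if high <= low:
        return 0.0
    return min(1.0, max(0.0, (value - low) / (high - low)))


def lr_model(features, weights, bias=0.0):
    """Atomic model: logistic regression over a dict of named features."""
    z = bias + sum(weights.get(name, 0.0) * v for name, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def cascade_score(nn_score, lr_score, alpha, beta, gamma):
    """Expression: alpha * NNScore + beta * LRScore / (1 + gamma)."""
    return alpha * nn_score + beta * lr_score / (1 + gamma)


# Hypothetical raw features for one candidate.
raw = {"distance_m": 500.0, "price": 80.0}
features = {
    "distance_m": min_max(raw["distance_m"], 0.0, 5000.0),
    "price": min_max(raw["price"], 0.0, 400.0),
}
lr = lr_model(features, {"distance_m": -2.0, "price": -0.5}, bias=1.0)
nn = 0.8  # stand-in for the output of a neural-network atomic model
score = cascade_score(nn, lr, alpha=0.6, beta=0.4, gamma=1.0)
```

Keeping the combination step as a separate expression means the weights can be retuned without retraining or redeploying the atomic models.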

Model Distribution and Traffic Bucketing

Online traffic is split into N buckets (e.g., 32). Each request is hashed to a bucket; each experimental model is assigned a quota of buckets proportional to its desired traffic share. Buckets are allocated orthogonally for different traffic dimensions (user, device, etc.) to guarantee independent A/B tests.
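
A hash‑based version of this bucketing might look like the sketch below. The SHA‑256 hash, the use of a layer name as a salt to keep dimensions independent, and the contiguous bucket assignment are assumptions made for illustration; the article does not specify these details.

```python
import hashlib


def bucket_of(key, layer, num_buckets=32):
    """Hash a key into a bucket; the layer name acts as a salt so that
    different traffic dimensions are bucketed independently of each other."""
    digest = hashlib.sha256(f"{layer}:{key}".encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def assign_buckets(shares, num_buckets=32):
    """Give each model a contiguous block of buckets proportional to its share.

    Assumes the shares sum to 1 and are multiples of 1 / num_buckets.
    """
    assignment, start = {}, 0
    for model, share in shares.items():
        count = round(share * num_buckets)
        for bucket in range(start, start + count):
            assignment[bucket] = model
        start += count
    return assignment


def model_for(key, layer, assignment, num_buckets=32):
    """Look up the model that serves the bucket this key falls into."""
    return assignment[bucket_of(key, layer, num_buckets)]


assignment = assign_buckets({"baseline": 0.75, "experiment": 0.25})
chosen = model_for("user-42", "user", assignment)
```

Because the hash is deterministic, a given user always lands in the same bucket within a layer and therefore sees a consistent model for the duration of an experiment.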

Conclusion

The article presents a concrete, domain‑driven online ranking architecture used in Meituan‑Dianping’s on‑site dining personalization. It extends the classic two‑phase scheme with fine‑grained layers for scene dispatch, traffic distribution, recall governance, feature service governance and cascade prediction, while supporting robust A/B testing, hot‑loading and composite objectives.

