
Demystifying Learning to Rank: From Core Concepts to Scalable Online Architecture

This article offers a comprehensive, system‑engineer‑focused guide to Learning to Rank, covering fundamental machine‑learning concepts, evaluation metrics, training approaches, and a detailed online ranking architecture with feature, recall, and model governance, illustrated by real‑world examples from Meituan‑Dianping.

21CTO

Dec 25, 2018

Search, recommendation, and advertising rely on Learning to Rank, a core AI technology. This article provides a systematic, easy‑to‑understand explanation from a system development engineer’s perspective.

Introduction

We live in an era of knowledge explosion. As the volume of information grows and AI develops rapidly, internet companies increasingly need to display information in a personalized, intelligent way. Typical personalized applications include search lists, recommendation lists, and ad displays.

Few people realize that a seemingly simple personalized display rests on a large amount of data, algorithms, and engineering architecture, enough to deter most internet companies. The fundamental technology behind it is Learning to Rank. Existing articles are either algorithm-focused or engineering-focused: algorithm articles are mathematically heavy, while engineering articles often stop at Google's Two-Phase Scheme and lack concrete implementation details.

This article uses simple examples and analogies to explain the algorithmic part for system engineers, while the architecture part describes Meituan-Dianping's online ranking system for in-store dining, which can serve as a reference prototype. The architecture covers service governance and layered design, and offers practical solutions such as traffic bucketing, traffic grading, feature models, and cascade models.

The goal is to help developers grasp the core ranking algorithms and to provide a fine-grained reference architecture for online implementation.

Algorithm Section

Machine learning involves optimization theory, statistics, and numerical computation, posing barriers for system engineers. This section uses simple analogies to reveal core concepts of machine learning and Learning to Rank.

Machine Learning

What is Machine Learning?

Typical machine‑learning problem:

A machine‑learning model (Model/Algorithm) predicts a target (Prediction/Target) based on observed features (Feature). It is essentially a function mapping X (features) to Y (prediction). The core problem is to obtain a predictive function from data.

Machine learning is a subset of artificial intelligence in the field of computer science that often uses statistical techniques to give computers the ability to learn with data, without being explicitly programmed.

The essence of machine learning is learning from data to obtain a predictive function, analogous to how humans learn from observations such as shadows or river levels.

Two key questions arise:

What exactly is “intelligence”?

How can machines possess intelligence?

What is Intelligence?

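Traditional programming moves through a fixed chain: data → knowledge → rule → program. Humans distill knowledge from data, turn it into rules, and encode the rules in a program. If a problem is covered by the rules, the program can handle it; otherwise humans must write new rules. “Intelligence” therefore requires the ability to generalize from known cases to new ones (inferring many cases from one).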

How to Enable Machine Intelligence?

Inspired by human learning, machine learning adopts three stages: Training, Testing, and Inference.

Training Phase

Humans provide training samples (X, Y), where X is the feature vector and Y is the target, much as a teacher gives students problems together with their answers. The model adjusts itself to minimize a loss, the gap between its predictions and Y, much as students try to reduce the gap between their answers and the correct ones.

Testing Phase

The trained model is then given test samples (X, Y) it has not seen before, analogous to an exam paper. It predicts from X alone, the predictions are compared with Y, and the total loss must stay below a predefined threshold.

Inference Phase

During inference the model receives only features X and must output predictions.
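The three phases can be sketched with a toy one-feature linear model. This is only an illustration, not the article's method: the data, the function names (`fit`, `predict`, `mean_squared_error`), the learning rate, and the loss threshold are all assumptions invented for the example.

```python
def fit(samples, lr=0.05, epochs=5000):
    """Training: fit y = w * x + b by gradient descent on mean squared loss."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(model, x):
    """Inference: only the feature x is available."""
    w, b = model
    return w * x + b


def mean_squared_error(model, samples):
    """Testing: compare predictions with the known targets."""
    return sum((predict(model, x) - y) ** 2 for x, y in samples) / len(samples)


# Toy data generated from y = 2x + 1 (an assumption for the example).
train_samples = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
test_samples = [(5.0, 11.0), (6.0, 13.0)]  # unseen during training

model = fit(train_samples)                              # training phase
test_loss = mean_squared_error(model, test_samples)     # testing phase
LOSS_THRESHOLD = 0.01                                   # hypothetical acceptance bar
passed_test = test_loss < LOSS_THRESHOLD
prediction = predict(model, 10.0)                       # inference phase
```

The test samples play the role of the exam paper: the model never sees them during training, so a low test loss suggests it has generalized rather than memorized.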

Learning to Rank

What is Learning to Rank?

Learning to rank is the application of machine learning, typically supervised, semi‑supervised or reinforcement learning, in the construction of ranking models for information retrieval systems. Training data consists of lists of items with some partial order specified between items in each list. This order is typically induced by giving a numerical or ordinal score or a binary judgment (e.g., “relevant” or “not relevant”) for each item. The ranking model's purpose is to rank, i.e., produce a permutation of items in new, unseen lists in a way which is “similar” to rankings in the training data in some sense.

Learning to Rank is the application of machine learning to information‑retrieval systems, aiming to build models that order items such as search results, recommendation lists, or ads.

The target of ranking is a structured object (an ordered list) rather than a single value, so two additional concepts are needed: list evaluation metrics and list training algorithms.

List Evaluation Metrics

Using keyword search as an example, challenges include defining relevance between articles and keywords, and scoring an entire list when some items are mis‑ordered.

Metrics have evolved through three stages: Precision & Recall, Discounted Cumulative Gain (DCG), and Expected Reciprocal Rank (ERR).

Precision and Recall (P‑R)

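Precision and Recall jointly assess ranking quality. Precision is the fraction of predicted items that are truly relevant; Recall is the fraction of truly relevant items that were predicted. Example: there are 200 relevant articles, and the algorithm predicts 100 as relevant, of which 80 are truly relevant. Then Precision = 80/100 = 0.8 and Recall = 80/200 = 0.4.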
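The arithmetic in this example can be reproduced with a few lines of Python. The document IDs below are made up purely so that the counts match the example (200 relevant, 100 predicted, 80 in common); `precision_recall` is a hypothetical helper name.

```python
def precision_recall(predicted, relevant):
    """Return (precision, recall) for a set of predicted and truly relevant IDs."""
    predicted, relevant = set(predicted), set(relevant)
    hits = len(predicted & relevant)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant_ids = set(range(200))                            # 200 truly relevant articles
predicted_ids = set(range(80)) | set(range(1000, 1020))   # 80 correct + 20 wrong
precision, recall = precision_recall(predicted_ids, relevant_ids)
```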

Discounted Cumulative Gain (DCG)

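DCG addresses two drawbacks of Precision and Recall: relevance is binary, and position is ignored. DCG grades each document with a multi-level relevance score (rel_1, rel_2, …) and discounts it logarithmically by position. The DCG of the top p documents is $\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i + 1)}$, where rel_i is the relevance grade of the document at position i. Normalized DCG (nDCG) divides DCG by the ideal DCG (IDCG), the DCG of the same documents in the best possible order, yielding a value between 0 and 1.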
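A direct transcription of the DCG formula and its normalization might look like the sketch below. The function names and the sample relevance grades are assumptions for illustration; here the ideal ordering is taken to be the same grades sorted in descending order.

```python
import math


def dcg(relevances, p=None):
    """DCG of the first p grades: sum of (2^rel_i - 1) / log2(i + 1), i from 1."""
    top = relevances if p is None else relevances[:p]
    return sum((2 ** rel - 1) / math.log2(i + 1) for i, rel in enumerate(top, start=1))


def ndcg(relevances, p=None):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), p)
    return dcg(relevances, p) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades in the order the ranker returned the documents.
ranked_grades = [3, 2, 3, 0, 1, 2]
score = ndcg(ranked_grades)
perfect = ndcg(sorted(ranked_grades, reverse=True))
```

Because the discount grows with position, swapping a highly relevant document from the top to the bottom of the list lowers the score much more than swapping two documents near the bottom.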

Expected Reciprocal Rank (ERR)

ERR further considers the relevance of preceding documents. If a highly relevant document appears later but earlier documents already satisfy the user, its contribution diminishes.
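The article does not give a formula for ERR. The sketch below uses the commonly cited cascade-model definition, in which a document with grade g satisfies the user with probability (2^g − 1) / 2^g_max and the user stops at the first satisfying document. The function name `err` and the sample grades are assumptions for illustration.

```python
def err(grades, max_grade):
    """Expected Reciprocal Rank under the cascade model of user browsing."""
    p_reach = 1.0   # probability that the user reaches the current position
    score = 0.0
    for rank, grade in enumerate(grades, start=1):
        p_satisfy = (2 ** grade - 1) / 2 ** max_grade
        score += p_reach * p_satisfy / rank
        p_reach *= 1.0 - p_satisfy   # the user continues only if unsatisfied
    return score


# The same highly relevant document at rank 2, after a useless or a perfect first hit.
after_irrelevant = err([0, 3], max_grade=3)
after_relevant = err([3, 3], max_grade=3)
gain_from_second = after_relevant - err([3, 0], max_grade=3)
```

In this example the second document contributes 0.4375 when the first one is irrelevant, but only about 0.055 when the first one already satisfies most users, which is the diminishing effect the paragraph describes.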

List Training Algorithms

Common approaches are Pointwise, Pairwise, and Listwise.

Pointwise treats each document independently and predicts its score or class, so it reduces to ordinary regression or classification (e.g., CTR prediction).

Pairwise compares pairs of documents and minimizes the number of inversions, that is, pairs ranked in the wrong relative order; this reduces to binary classification on document pairs.

Listwise directly optimizes list-level metrics such as nDCG or ERR.
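To make the pairwise view concrete, the sketch below counts inversions: pairs of documents whose predicted scores order them differently from their relevance labels. It illustrates the quantity a pairwise method tries to reduce, not a training algorithm; `count_inversions` and the sample data are assumptions.

```python
from itertools import combinations


def count_inversions(scores, labels):
    """Count document pairs that the scores order opposite to the labels.

    Pairs with equal labels or equal scores are not counted.
    """
    inversions = 0
    for i, j in combinations(range(len(scores)), 2):
        if (labels[i] - labels[j]) * (scores[i] - scores[j]) < 0:
            inversions += 1
    return inversions


labels = [2, 1, 0]          # hypothetical relevance grades
scores = [0.9, 0.2, 0.5]    # hypothetical model scores
inversions = count_inversions(scores, labels)
```

Here only the pair formed by the second and third documents is inverted: the second document is more relevant but receives the lower score.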

Online Ranking Architecture

Information retrieval consists of an indexing phase and a query phase.

The indexing phase builds an index from documents. The query phase performs recall, coarse ranking (Top‑n Retriever), and fine ranking (Reranker), following Google’s Two‑Phase Scheme.

Three Major Challenges

Feature challenges: addition, operators, normalization, discretization, acquisition, governance.

Model challenges: completeness, cascade models, composite objectives, A/B testing support, hot‑loading.

Recall challenges: keyword recall, LBS recall, recommendation recall, coarse‑ranking recall.

These challenges are interrelated; a domain‑driven design (DDD) with clear boundaries and continuous integration is recommended.

Recall Governance

The classic Two‑Phase Scheme includes recall, coarse ranking, and fine ranking. For Meituan‑Dianping, recall also involves location‑based (K‑D tree) and recommendation‑based methods.

Recall types:

Keyword recall using Elasticsearch.

Distance recall using K‑D tree.

Coarse‑ranking recall.

Recommendation recall.

Feature Service Governance

Features are categorized into list‑type, entity‑type, context‑type, and similarity‑type, each with dedicated service solutions (in‑memory list service, Redis/Tair KV service, in‑request context, or separate computation service).

Online Ranking Layered Model

The pipeline consists of Scene Dispatch, Model Distribution (traffic allocation), Recall, Feature Retrieval, Prediction, and Ranking.

Scene Dispatch

Routes requests based on business type (platform, list, usage scenario).

Model Distribution

Allocates online traffic to experimental models. It supports A/B testing and keeps its traffic split orthogonal to the splits made in other layers, so that experiments in different layers do not bias one another.

Traffic Bucketing Principle

Divide traffic into N buckets.

Hash each traffic unit into a bucket.

Assign each model a quota of buckets.

Total quotas sum to 100%.

A request is served by the model that owns the bucket the request hashes into.

Example: with 32 buckets, models A (37.5%), B (25%), and C (37.5%) receive 12, 8, and 12 buckets respectively.
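The bucketing steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the hash function (SHA-256), the use of a string ID as the traffic unit, and the names `bucket_of`, `assign_buckets`, and `route` are not from the article.

```python
import hashlib

NUM_BUCKETS = 32


def bucket_of(traffic_id, num_buckets=NUM_BUCKETS):
    """Hash a traffic unit (here a string ID) into a stable bucket index."""
    digest = hashlib.sha256(traffic_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def assign_buckets(quotas, num_buckets=NUM_BUCKETS):
    """Turn {model: share} into a list mapping each bucket index to its model."""
    owners = []
    for model, share in quotas.items():
        count = round(share * num_buckets)
        if abs(count - share * num_buckets) > 1e-9:
            raise ValueError(f"quota of {model} is not a whole number of buckets")
        owners.extend([model] * count)
    if len(owners) != num_buckets:
        raise ValueError("quotas must sum to 100%")
    return owners


def route(traffic_id, owners):
    """Return the model that owns the bucket this traffic unit falls into."""
    return owners[bucket_of(traffic_id, len(owners))]


owners = assign_buckets({"A": 0.375, "B": 0.25, "C": 0.375})
chosen_model = route("user-42", owners)
```

Because the hash is deterministic, the same traffic unit always reaches the same model, which keeps a user's experience stable for the duration of an experiment. One common way to keep layers orthogonal is to mix a per-layer salt into the hash input; the sketch omits this.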

Traffic Grading

Baseline traffic for comparison.

Experimental traffic for new models.

Potential traffic for promising experiments.

Main traffic for the best‑performing model.

Ranking Module

Fetch the features of every entity in the list.

Pass features to the prediction module.

Sort entities by predicted scores.
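The three steps of the ranking module can be sketched as one small function. The in-memory feature store, the linear scoring rule, and all names (`rank_entities`, `FEATURE_STORE`, `rating`, `distance_km`) are hypothetical stand-ins for the real feature and prediction pipelines.

```python
def rank_entities(entity_ids, fetch_features, predict):
    """Fetch features, score each entity, and sort by score (highest first)."""
    features = fetch_features(entity_ids)                          # step 1
    scores = {eid: predict(features[eid]) for eid in entity_ids}   # step 2
    return sorted(entity_ids, key=scores.get, reverse=True)        # step 3


# Hypothetical stand-in for the feature services.
FEATURE_STORE = {
    "shop_1": {"rating": 4.8, "distance_km": 3.0},
    "shop_2": {"rating": 4.2, "distance_km": 0.5},
    "shop_3": {"rating": 3.9, "distance_km": 1.0},
}


def fetch_features(entity_ids):
    return {eid: FEATURE_STORE[eid] for eid in entity_ids}


def predict(features):
    # Toy scoring rule standing in for the prediction module.
    return features["rating"] - 0.5 * features["distance_km"]


ranked = rank_entities(["shop_1", "shop_2", "shop_3"], fetch_features, predict)
```

Passing `fetch_features` and `predict` as parameters keeps the ranking step independent of how features are stored and which model is in use.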

Feature Pipeline

Handles feature models, expressions, atomic features, and feature proxies.

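Expressions are written in Polish (prefix) notation, with the operator placed before its operands. Each token carries a type prefix ($ for a composite feature, O for an operator, C for a constant, V for a variable), and the underscore serves as the delimiter.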
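The article does not show the exact expression syntax, so the evaluator below assumes one possible encoding: the string is split on underscores into (prefix, name) pairs, names contain no underscores, and each operator has a fixed arity. The operator table, the sample expressions, and the feature names `ctr` and `cvr` are all hypothetical.

```python
import operator

# Hypothetical operator table: name -> (arity, implementation).
OPERATORS = {
    "add": (2, operator.add),
    "mul": (2, operator.mul),
    "neg": (1, operator.neg),
}


def evaluate(expression, variables, composites=None):
    """Evaluate a prefix expression such as 'O_add_V_ctr_C_1'."""
    composites = composites or {}
    tokens = expression.split("_")
    pos = 0

    def parse():
        nonlocal pos
        if pos + 1 >= len(tokens):
            raise ValueError("unexpected end of expression")
        prefix, name = tokens[pos], tokens[pos + 1]
        pos += 2
        if prefix == "C":                       # constant
            return float(name)
        if prefix == "V":                       # variable (atomic feature)
            return variables[name]
        if prefix == "$":                       # composite feature: another expression
            return evaluate(composites[name], variables, composites)
        if prefix == "O":                       # operator: consume its operands
            arity, func = OPERATORS[name]
            return func(*[parse() for _ in range(arity)])
        raise ValueError(f"unknown prefix {prefix!r}")

    result = parse()
    if pos != len(tokens):
        raise ValueError("trailing tokens in expression")
    return result


variables = {"ctr": 0.1, "cvr": 0.05}
composites = {"score": "O_add_V_ctr_O_mul_C_2_V_cvr"}   # ctr + 2 * cvr
value = evaluate("O_mul_$_score_C_10", variables, composites)
```

Prefix notation needs no parentheses or precedence rules, so a recursive reader with a fixed arity per operator is enough to evaluate it.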

Feature Model

Encapsulates all information needed for feature acquisition and operators.

Feature Proxy

Routes feature requests to appropriate remote services, reducing network cost by fetching only needed feature values.
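A feature proxy of this kind might be sketched as below: it groups the requested feature names by backing service and issues one batched call per service, asking only for the features that are needed. The class, the routing table, and the fake in-memory clients are assumptions for illustration; real clients would be remote calls to services such as a KV store.

```python
from collections import defaultdict


class FeatureProxy:
    """Route each requested feature to the service that owns it."""

    def __init__(self, routes, clients):
        self.routes = routes      # feature name -> service name
        self.clients = clients    # service name -> callable(entity_ids, names)

    def fetch(self, entity_ids, feature_names):
        by_service = defaultdict(list)
        for name in feature_names:
            by_service[self.routes[name]].append(name)
        result = {eid: {} for eid in entity_ids}
        for service, names in by_service.items():
            # One batched call per service, restricted to the needed features.
            for eid, values in self.clients[service](entity_ids, names).items():
                result[eid].update(values)
        return result


calls = []  # records which service was asked for which features


def make_fake_client(service, table):
    def client(entity_ids, names):
        calls.append((service, tuple(names)))
        return {eid: {n: table[eid][n] for n in names} for eid in entity_ids}
    return client


proxy = FeatureProxy(
    routes={"rating": "kv", "monthly_sales": "kv", "text_similarity": "similarity"},
    clients={
        "kv": make_fake_client("kv", {"shop_1": {"rating": 4.8, "monthly_sales": 120}}),
        "similarity": make_fake_client("similarity", {"shop_1": {"text_similarity": 0.7}}),
    },
)
features = proxy.fetch(["shop_1"], ["rating", "text_similarity"])
```

In this run `monthly_sales` is never requested from the KV client, which is the network saving the proxy is meant to provide.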

Prediction Pipeline

Comprises Prediction, Cascade Model, Expression, Transform, Scoring, and Atomic Model.

Prediction

Wraps the model, converting entity features into the model’s input format.

Cascade Model

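Combines several models into one predictor, for example a tree-based model (e.g., XGBoost) feeding a linear model (e.g., logistic regression, LR), or a Wide&Deep structure. It also supports multi-objective predictions by combining model outputs through expressions.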
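One common cascade pattern is to feed the leaf reached in a tree model into a linear model as an extra feature. The sketch below hand-codes a tiny tree and a logistic regression to show the data flow only; the split thresholds, weights, and feature names are invented, and a real system would load trained models instead.

```python
import math


def tree_leaf(features):
    """Toy decision tree: return the index of the leaf the sample falls into."""
    if features["distance_km"] < 2.0:
        return 0 if features["rating"] >= 4.5 else 1
    return 2


# Hypothetical logistic-regression parameters.
LEAF_WEIGHTS = [1.2, 0.3, -0.8]   # one weight per tree leaf (one-hot encoded)
RATING_WEIGHT = 0.5
BIAS = -2.0


def cascade_predict(features):
    """Stage 1: tree picks a leaf. Stage 2: linear model scores leaf + raw feature."""
    leaf = tree_leaf(features)
    logit = BIAS + LEAF_WEIGHTS[leaf] + RATING_WEIGHT * features["rating"]
    return 1.0 / (1.0 + math.exp(-logit))


near_shop = cascade_predict({"distance_km": 1.0, "rating": 4.8})
far_shop = cascade_predict({"distance_km": 5.0, "rating": 4.8})
```

The tree captures a non-linear interaction (distance combined with rating) that the linear stage alone could not express, while the linear stage stays cheap to evaluate and easy to update.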

Atomic Model

Fundamental model types such as linear, tree, or neural networks, implemented as independent prediction units.

Conclusion

The article summarizes Meituan‑Dianping’s experience in personalized in‑store dining information display, covering both algorithmic concepts and a detailed ranking architecture. The proposed layered model, with clear domain boundaries and continuous integration, offers a practical reference for building robust online ranking systems.


