
Advances in Pre‑Ranking for Large‑Scale Advertising: The COLD Framework and Its Technical Evolution

This article reviews the development history, technical routes, and recent breakthroughs of pre‑ranking (coarse ranking) in large‑scale advertising systems, focusing on Alibaba's COLD (Computing‑power‑cost‑aware Online and Lightweight Deep) framework, its model design, engineering optimizations, experimental results, and future research directions.

DataFunTalk

In large‑scale ranking scenarios such as search, recommendation, and advertising, a cascade ranking architecture—recall → coarse ranking → fine ranking → re‑ranking—is widely used. Coarse ranking sits between recall and fine ranking, selecting a few hundred candidate ads from millions while meeting strict latency (10‑20 ms) and computational constraints.

The article first outlines the background of coarse ranking, defining it as the intermediate stage that must balance compute, latency, and the need to provide a candidate set that satisfies downstream objectives.

The article identifies two major technical routes for coarse ranking:

Set‑selection methods, which model the candidate set as a whole (e.g., multi‑channel, listwise approaches such as LambdaMART, and sequence generation methods).

Precise value estimation, a pointwise approach that directly predicts the final system objective (e.g., eCPM), expressed as eCPM = pCTR × bid.
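The pointwise route above can be sketched in a few lines. This is a minimal illustration, not the production scoring code; the candidate ids, pCTR values, and bids are invented for the example.

```python
# Hypothetical sketch of pointwise eCPM scoring for pre-ranking:
# score every candidate independently by eCPM = pCTR * bid, keep the top-k.

def ecpm_rank(candidates, top_k):
    """Score each (ad_id, pCTR, bid) triple and return the top_k by eCPM."""
    scored = [(ad_id, pctr * bid) for ad_id, pctr, bid in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Illustrative candidates: (ad id, predicted CTR, bid in currency units).
candidates = [
    ("ad_a", 0.031, 1.20),
    ("ad_b", 0.052, 0.60),
    ("ad_c", 0.012, 4.00),
]
print(ecpm_rank(candidates, top_k=2))
```

In a real pre-ranking stage this scoring loop would run over hundreds of thousands of candidates within the 10–20 ms budget, which is why model and serving efficiency dominate the design.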

The article then describes the evolution of coarse‑ranking technologies in industry, from static quality scores and early LR models to deep vector‑inner‑product models (dual‑tower) and finally to the fourth‑generation COLD system introduced in 2019.

COLD framework: COLD treats computing power as a tunable variable and jointly optimizes model performance and resource consumption. It uses a Group‑wise Embedding Network (GwEN) with SE‑blocks for feature selection, structured pruning via learnable scaling factors, and mixed‑precision training (Float16 activations, with Float32 retained for batch normalization). For pruning, each neuron output is multiplied by a learnable scaling factor γ; neurons whose γ is driven to zero during training are removed, sparsifying the model.
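The two model-side ideas can be sketched together. The snippet below is an untrained toy in numpy, not the COLD implementation: the SE-style block squeezes each feature group to a scalar, passes it through a small MLP with a sigmoid to produce per-group weights, and reweights the groups; the γ vector, MLP weights, and shapes are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy group-wise embeddings: 4 samples, 5 feature groups, embedding dim 8.
emb = rng.normal(size=(4, 5, 8)).astype(np.float32)

# --- SE-style feature-group reweighting (for feature selection) ---
# Squeeze: summarize each group into one scalar (mean over the embedding dim).
z = emb.mean(axis=2)                                 # (4, 5)

# Excitation: a tiny two-layer MLP yielding one weight per group.
w1 = rng.normal(size=(5, 3)).astype(np.float32)      # illustrative, untrained
w2 = rng.normal(size=(3, 5)).astype(np.float32)
s = 1.0 / (1.0 + np.exp(-(np.maximum(z @ w1, 0.0) @ w2)))  # sigmoid, (4, 5)

# Reweight groups; groups with consistently low weights are pruning candidates.
reweighted = emb * s[:, :, None]

# --- Structured pruning via scaling factors ---
# Each output is multiplied by a learnable gamma; a sparsity penalty during
# training drives some gammas to zero, and those units can then be dropped.
gamma = np.array([0.9, 0.0, 0.4, 0.0, 1.1], dtype=np.float32)
kept = np.nonzero(gamma)[0]                          # surviving groups
pruned = reweighted[:, kept, :] * gamma[kept][None, :, None]
print(pruned.shape)                                  # (4, 3, 8)
```

The point of the sketch is the mechanism, not the numbers: at serving time only the surviving groups are computed, which is how the pruning converts a learned sparsity pattern into real compute savings.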

Engineering optimizations include parallel request handling, column‑wise sparse matrix computation, SIMD‑accelerated feature operators, Float16 acceleration, and Multi‑Process Service (MPS) to reduce kernel launch overhead. These techniques together achieve up to a 2× increase in QPS while maintaining or improving model quality.
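The Float16 point deserves a small illustration. The sketch below, with invented shapes and values, shows why batch-norm statistics are kept in Float32: accumulating many half-precision values loses precision, so the normalization is computed in Float32 and only the result is cast back down.

```python
import numpy as np

# Toy activations stored in half precision to cut memory and bandwidth.
x = np.linspace(-3, 3, 1024, dtype=np.float16).reshape(64, 16)

# Batch-norm statistics in float32: summing many float16 values accumulates
# rounding error, so the reduction is done at higher precision.
x32 = x.astype(np.float32)
mean = x32.mean(axis=0)
var = x32.var(axis=0)

# Normalize in float32, then cast the result back to float16 for the next layer.
normed = ((x32 - mean) / np.sqrt(var + 1e-5)).astype(np.float16)
print(normed.dtype)
```

This mirrors the mixed-precision pattern described for COLD: Float16 for the bulk of the dense compute, Float32 for the numerically sensitive reductions.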

Experimental results show that COLD outperforms traditional vector‑inner‑product models in GAUC and recall, and delivers significant online gains (CTR +6.1 % / RPM +6.5 % in daily traffic, and larger lifts during peak events). System‑level benchmarks indicate that COLD balances latency and throughput between the fast but less expressive vector models and the slower but more accurate fine‑ranking models.

Future directions discussed include tighter integration of coarse and fine ranking through joint training and knowledge distillation, as well as a possible shift back to set‑selection‑oriented modeling that directly optimizes the candidate set rather than a fully ordered list.
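One concrete form of the distillation idea is a temperature-softened cross-entropy between the fine-ranking model's scores (teacher) and the pre-ranking model's scores (student) over a shared candidate list. The loss below is a standard knowledge-distillation sketch with invented logits, not a method taken from the article.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Illustrative scores over the same three candidates.
teacher_logits = np.array([2.0, 0.5, -1.0])   # fine-ranking model (teacher)
student_logits = np.array([1.5, 0.7, -0.5])   # pre-ranking model (student)

T = 2.0  # temperature: softens the teacher distribution before matching
p_teacher = softmax(teacher_logits / T)
p_student = softmax(student_logits / T)

# Cross-entropy distillation loss; minimized when student matches teacher.
kd_loss = -(p_teacher * np.log(p_student + 1e-12)).sum()
print(float(kd_loss))
```

Training the pre-ranking model against such a loss (alongside its own click objective) is one way to tighten the coarse/fine integration the article anticipates.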

References: Covington et al., "Deep Neural Networks for YouTube Recommendations" (RecSys 2016); Wang et al., "COLD: Towards the Next Generation of Pre‑Ranking System".

Tags: advertising, machine learning, system optimization, online learning, pre-ranking, COLD, ranking systems
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
