COLD: A Next‑Generation Pre‑Ranking System for Online Advertising
This article introduces COLD, a computing‑power‑aware, online, and lightweight deep pre‑ranking system for Alibaba's targeted ads. It traces the evolution of pre‑ranking models from static CTR scores to vector‑inner‑product models, describes COLD's flexible network architecture with SE‑block‑based feature selection, covers engineering optimizations such as parallelism, column‑wise computation, Float16, and MPS, and demonstrates superior offline and online performance through extensive experiments.
In large‑scale ranking scenarios such as search, recommendation, and advertising, cascade ranking architectures are widely used. The coarse‑ranking stage sits between recall and fine‑ranking, selecting a few hundred candidates from tens of thousands of ads within a strict 10‑20 ms latency budget.
The development of coarse‑ranking models has progressed through three generations: (1) static quality‑score based on historical CTR, (2) early machine‑learning models such as logistic regression, and (3) the current dominant vector‑inner‑product deep models with a dual‑tower architecture that compute user and ad embeddings separately.
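The dual‑tower design owes its efficiency to the fact that ad embeddings can be precomputed offline, leaving only one inner product per candidate at serving time. A minimal NumPy sketch of this idea (all dimensions, weights, and candidate counts here are hypothetical, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def tower(x, weights):
    # A minimal MLP tower: each layer is a ReLU-activated linear map.
    for w in weights:
        x = np.maximum(x @ w, 0.0)
    return x

# Hypothetical dimensions: user/ad features map into a shared 8-d space.
user_w = [rng.standard_normal((16, 32)), rng.standard_normal((32, 8))]
ad_w = [rng.standard_normal((12, 32)), rng.standard_normal((32, 8))]

user_emb = tower(rng.standard_normal(16), user_w)       # computed per request
ad_embs = tower(rng.standard_normal((1000, 12)), ad_w)  # precomputed offline

scores = ad_embs @ user_emb            # one inner product per candidate ad
top_ads = np.argsort(scores)[::-1][:400]  # keep a few hundred candidates
```

Because the two towers never interact until the final inner product, cross features between user and ad cannot be expressed, which is exactly the limitation COLD removes.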
Although vector‑inner‑product models improve expressive power, they still suffer from limited feature interaction, delayed model updates, and poor real‑time responsiveness. To address these issues, the authors propose COLD (Computing‑power‑cost‑aware Online and Lightweight Deep pre‑ranking system), which treats computing power as a variable and allows any deep model architecture. The initial implementation uses a Group‑wise Embedding Network (GwEN) built on concatenated feature embeddings followed by several fully‑connected layers.
Network Structure: Feature importance is estimated with a Squeeze‑and‑Excitation (SE) block. Each feature embedding \(e_i\) is compressed to a scalar \(s_i\) via a fully‑connected layer with sigmoid activation, and the scalars form a vector \(s\); each \(s_i\) then re‑weights its embedding \(e_i\). Features are ranked by importance, and the top \(K\) are selected by trading off offline metrics (GAUC, QPS, RT) to balance effectiveness against latency.
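Putting the pieces together, here is a toy NumPy sketch of SE‑based feature re‑weighting feeding a GwEN‑style concat‑plus‑FC trunk. The dimensions, random weights, and the choice K = 4 are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: M feature groups, each with a k-dim embedding.
M, k = 6, 8
embeddings = rng.standard_normal((M, k))        # e_1 ... e_M

# Squeeze each embedding to a scalar via a linear map, then a sigmoid gate.
w = rng.standard_normal((k, 1))
s = 1.0 / (1.0 + np.exp(-(embeddings @ w)))     # s_i in (0, 1), shape (M, 1)

# Re-weight each embedding by its importance scalar.
reweighted = s * embeddings

# Rank feature groups by importance; in COLD the subset size is chosen
# offline by trading GAUC against QPS and RT.
importance_rank = np.argsort(s.ravel())[::-1]
K = 4
selected = reweighted[importance_rank[:K]]

# GwEN-style trunk: concatenate the selected embeddings, then FC layers.
x = selected.ravel()
w1 = rng.standard_normal((K * k, 16))
w2 = rng.standard_normal((16, 1))
logit = np.maximum(x @ w1, 0.0) @ w2
ctr = 1.0 / (1.0 + np.exp(-logit))              # predicted CTR
```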
Engineering Optimizations:
Parallelism: independent ad computations are split into multiple parallel requests; feature computation uses multithreading, network computation runs on GPU.
Column‑wise computation: feature matrices are reorganized from row‑wise to column‑wise to enable contiguous memory access and SIMD‑accelerated operators.
Float16 acceleration: most matrix multiplications run in Float16, while batch‑norm layers stay in Float32; a custom linear‑log activation mitigates precision loss.
Multi‑Process Service (MPS): reduces kernel launch overhead, yielding nearly a 2× QPS boost.
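The column‑wise idea can be illustrated with NumPy's memory layouts: storing the feature matrix in column‑major order makes each feature's values contiguous in memory, which is what lets per‑feature operators stream through data and benefit from SIMD. This is a toy sketch of the layout change, not Alibaba's actual kernels:

```python
import numpy as np

# Row-wise layout: one row per ad, so a single feature's values are
# strided across memory.
batch = np.arange(12.0).reshape(4, 3)   # 4 ads x 3 features, C (row-major) order

# Column-wise layout: each feature column is now one contiguous block,
# so a per-feature operator reads sequential memory (SIMD-friendly).
col_major = np.asfortranarray(batch)
assert col_major.flags["F_CONTIGUOUS"]

# Example per-feature operator: standardize each feature column.
normalized = (col_major - col_major.mean(axis=0)) / (col_major.std(axis=0) + 1e-6)
```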
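The precision issue arises because Float16 overflows to infinity above roughly 65504, while raw statistical features can be far larger. The piecewise transform below (identity on [-1, 1], sign(x)·(log|x| + 1) outside) follows the linear‑log formulation described in the COLD paper; the surrounding code is an illustrative sketch:

```python
import numpy as np

def linear_log(x):
    # Identity on [-1, 1]; logarithmic compression outside, so activations
    # stay inside Float16's representable range (max ~65504).
    x = np.asarray(x, dtype=np.float32)
    out = x.copy()
    big = np.abs(x) > 1.0
    out[big] = np.sign(x[big]) * (np.log(np.abs(x[big])) + 1.0)
    return out

# A raw value of 1e5 overflows Float16 to inf; after linear_log it is ~12.5.
raw = np.float32([0.5, -3000.0, 1e5])
safe = linear_log(raw).astype(np.float16)
# Batch-norm layers, by contrast, stay in Float32 because their
# mean/variance statistics are too rounding-sensitive for half precision.
```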
Online Service Architecture: COLD supports real‑time training and inference, enabling rapid adaptation to data‑distribution shifts and better cold‑start handling for new ads. Online learning further improves responsiveness compared to static vector‑inner‑product models.
Experimental Results:
Model effectiveness: COLD outperforms vector‑inner‑product baselines in GAUC and recall; online CTR improves by 6.1% (daily) and 9.1% (Double 11), with RPM gains of 6.5% and 10.8% respectively.
System performance: COLD's throughput and latency fall between those of the lightweight vector‑inner‑product model and the heavyweight fine‑ranking deep model, yielding a balanced trade‑off between accuracy and computing cost.
Feature selection tables demonstrate the impact of different feature groups on both accuracy and latency.
Float16 and MPS optimizations double QPS without sacrificing model quality.
Since 2019, COLD has been deployed across major Alibaba targeted‑advertising services, delivering significant online performance improvements.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.