
Model Compression and Feature Optimization for Large-Scale CTR Prediction in Advertising

Alimama's advertising team shrank multi‑terabyte CTR models to just tens of gigabytes by applying row‑dimension embedding compression, multi‑hash embeddings, graph‑based relationship networks, PCF‑GNN pre‑training, and Droprank feature selection, preserving accuracy while halving training time, doubling online QPS, and retiring hundreds of servers.

Alimama Tech

With the emergence of billion‑parameter language models such as GPT‑3, the “brute‑force” approach of scaling model size has also become the dominant paradigm for click‑through‑rate (CTR) prediction in search, recommendation, and advertising. However, such massive models impose huge storage and compute demands, which are increasingly unsustainable as available compute resources plateau.

The Alimama advertising team systematically reduced the size of their CTR models while preserving prediction accuracy. The original models, which occupied several terabytes, were compressed to a few dozen gigabytes, achieving a "small‑but‑powerful" solution.

Two major optimization paths are identified: feature optimization and model‑structure optimization. Feature optimization includes enriching multimodal features, upgrading high‑order features, and introducing dynamic features. Model‑structure optimization involves Transformer‑based sequence modeling and graph‑neural‑network (GNN)‑based architectures. Over years of hardware growth, the CTR models grew both wider and deeper, eventually reaching multi‑terabyte scales.

To make further progress under limited resources, the team focused on compressing the embedding layer, which stores the majority of parameters. Three compression directions are explored:

Row‑dimension (feature‑space) reduction

Column‑dimension (embedding‑vector) reduction

Value‑precision quantization (e.g., FP16/Int8)

The article concentrates on row‑dimension compression, which can be performed online during training and yields orders‑of‑magnitude reduction without loss of accuracy.
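A back‑of‑the‑envelope calculation makes clear why the row dimension is the highest‑leverage target. The row count, embedding width, and compression ratio below are illustrative assumptions, not Alimama's actual figures:

```python
# Rough embedding-table footprint under each compression axis.
# All sizes here are illustrative assumptions, not production figures.
rows = 10_000_000_000     # distinct feature IDs (row dimension)
dim = 32                  # embedding width (column dimension)

baseline_gb = rows * dim * 4 / 1e9           # FP32 values
hashed_gb = (rows // 1000) * dim * 4 / 1e9   # 1000x row reduction via hashing
quant_gb = rows * dim * 1 / 1e9              # Int8 value-precision quantization

print(f"baseline:       {baseline_gb:,.0f} GB")   # 1,280 GB
print(f"row-compressed: {hashed_gb:,.2f} GB")     # 1.28 GB
print(f"int8-quantized: {quant_gb:,.0f} GB")      # 320 GB
```

Because the three axes multiply, a large row reduction dominates anything achievable on the column or precision axis alone.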

Key practical techniques applied in the production environment include:

Relationship Network: Replaces implicit ID‑type cross features with a graph‑based network that models feature interactions using a self‑attention‑like mechanism.
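The interaction step of such a network can be sketched as scaled‑dot‑product attention over per‑field embeddings — learned explicit mixing instead of enumerated ID cross features. This is a generic NumPy sketch; the shapes and the absence of learned projection matrices are simplifying assumptions, not Alimama's production design:

```python
import numpy as np

def feature_interaction(E: np.ndarray) -> np.ndarray:
    """Self-attention-style mixing over per-field embeddings.

    E has shape (num_fields, dim); the output has the same shape, with
    each field's vector replaced by an affinity-weighted mix of all fields.
    """
    d = E.shape[1]
    scores = E @ E.T / np.sqrt(d)                  # pairwise field affinities
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ E

E = np.random.default_rng(0).normal(size=(8, 16))  # 8 fields, dim 16
print(feature_interaction(E).shape)                # (8, 16)
```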

Graph‑based Pre‑training (PCF‑GNN): Represents features as nodes and interaction statistics as edges, learning explicit cross‑semantic embeddings via edge‑weight prediction.
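The pre‑training objective can be sketched as regressing embedding inner products onto observed edge weights. The toy graph, dimensions, and learning rate below are assumptions for illustration; a real PCF‑GNN additionally aggregates over graph neighborhoods:

```python
import numpy as np

# Toy edge-weight-prediction pre-training: features are nodes, and each
# edge carries an interaction statistic (e.g. a co-occurrence CTR).
# Embeddings are fitted so that an inner product predicts the edge weight.
rng = np.random.default_rng(0)
num_nodes, dim, lr = 6, 4, 0.05
edges = [(0, 1, 0.9), (1, 2, 0.2), (2, 3, 0.7), (4, 5, 0.5)]  # (u, v, weight)
Z = rng.normal(scale=0.1, size=(num_nodes, dim))

for _ in range(500):
    for u, v, w in edges:
        err = Z[u] @ Z[v] - w                      # prediction error on edge
        gu, gv = 2 * err * Z[v], 2 * err * Z[u]    # gradients of squared error
        Z[u] -= lr * gu
        Z[v] -= lr * gv

loss = sum((Z[u] @ Z[v] - w) ** 2 for u, v, w in edges)
print(f"final squared error across edges: {loss:.4f}")
```

After fitting, the node vectors serve as explicit cross‑semantic embeddings that a downstream CTR model can consume directly.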

Multi‑Hash Embedding: Uses multiple hash functions to compress core ID features, achieving near‑collision‑free embeddings while keeping model size in the tens of gigabytes.
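The mechanism can be sketched as follows: each ID is mapped through several independent hash functions into one small table, and its embedding is the sum of the selected rows, so two IDs collide fully only if they agree under every hash. The table size, number of hashes, and mixing constants below are illustrative assumptions:

```python
import numpy as np

TABLE, K, DIM = 100_003, 2, 8             # small prime-sized table, 2 hashes
SEEDS = (0x9E3779B1, 0x85EBCA77)          # arbitrary odd mixing constants
table = np.random.default_rng(0).normal(scale=0.01, size=(TABLE, DIM))

def bucket(feature_id: int, seed: int) -> int:
    # Cheap integer hash; any decent universal hash family works here.
    return ((feature_id * seed) ^ (feature_id >> 16)) % TABLE

def embed(feature_id: int) -> np.ndarray:
    # Sum of K rows: a full collision needs agreement under all K hashes.
    return sum(table[bucket(feature_id, s)] for s in SEEDS[:K])

print(embed(123_456_789).shape)           # (8,)
```

The memory cost is `TABLE * DIM` floats regardless of how many distinct IDs exist, which is what keeps the table at gigabyte rather than terabyte scale.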

Droprank Feature Selection: Integrates dropout‑based feature ranking into model training, enabling simultaneous optimization of feature selection and model performance. An enhanced version (FSCD) incorporates system‑resource priors for balanced efficiency and effectiveness.
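As a simplified stand‑in for the Droprank idea, the sketch below ranks features by how much a fitted model's loss degrades when each feature is dropped (zeroed out). The synthetic data and the least‑squares "model" are assumptions for illustration; the production method folds this ranking into training itself rather than running it post hoc:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w_true = np.array([2.0, 0.0, 1.0, 0.0, 0.5])   # features 1 and 3 are noise
y = (X @ w_true + rng.normal(scale=0.1, size=1000) > 0).astype(float)

w = np.linalg.lstsq(X, y, rcond=None)[0]       # stand-in for a trained model

def loss(Xm: np.ndarray) -> float:
    return float(np.mean((Xm @ w - y) ** 2))

base = loss(X)
scores = []
for j in range(X.shape[1]):
    Xd = X.copy()
    Xd[:, j] = 0.0                              # "drop" feature j
    scores.append(loss(Xd) - base)              # importance = loss increase

ranking = np.argsort(scores)[::-1]
print("features by importance:", ranking.tolist())  # noise features rank last
```

Low‑ranked features can then be pruned, shrinking the embedding table's row count directly — which is how feature selection connects back to row‑dimension compression.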

Combined with column‑dimension upgrades, heterogeneous‑compute optimizations, and incremental feature pipelines, the CTR model size was reduced from terabyte‑level to dozens of gigabytes. This resulted in a 50% reduction in training time, a 100% increase in online QPS, and the decommissioning of hundreds of machines.

In summary, the systematic engineering practice demonstrates that “small‑and‑beautiful” models are feasible for large‑scale advertising CTR prediction, provided that resource‑aware feature and model compression strategies are employed.

Tags: advertising, CTR prediction, model compression, Graph Neural Networks, embedding reduction, feature selection, large-scale ML
Written by

Alimama Tech

Official Alimama tech channel, showcasing all of Alimama's technical innovations.
