How Alibaba’s Mixed Logistic Regression Revolutionizes CTR Prediction

This article explains the technical background of click‑through‑rate (CTR) prediction, critiques traditional linear models, introduces Alibaba’s Mixed Logistic Regression (MLR) algorithm with its advanced features and large‑scale distributed implementation, and reviews its successful deployment and remaining challenges in advertising systems.

Alibaba Cloud Developer

Jun 15, 2017

1. Technical Background

CTR (Click‑Through‑Rate) is a key metric in online advertising, representing the ratio of clicks to impressions. Accurate CTR prediction is critical for revenue and is a core algorithmic problem for major platforms such as Google and Facebook.

2. Current CTR Prediction Algorithms and Progress

2.1 Traditional Algorithms and Limitations

The industry‑standard approach is logistic regression (LR) combined with extensive manual feature engineering. LR scales well, but because it is linear its learning capacity is limited: non‑linear patterns must be supplied as hand‑designed feature crosses, which requires costly domain knowledge and transfers poorly from one business scenario to another.

2.2 Alibaba’s MLR Algorithm

In 2011‑2012, Alibaba’s advertising team introduced Mixed Logistic Regression (MLR), a natural extension of LR that learns non‑linear relationships directly in the original feature space using a divide‑and‑conquer piecewise linear approach. The model partitions the space into m segments, each fitted by a linear model, balancing fitting power and generalization.

The hyper‑parameter m controls this trade‑off. With m=1 the model reduces to ordinary LR; a larger m increases fitting ability but also increases model size and the amount of training data required. In practice, m is set to 12. As an illustration, an MLR with only 4 segments can perfectly fit a diamond‑shaped decision boundary, which a single linear model cannot represent.
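A minimal NumPy sketch of the piecewise linear idea, assuming the commonly cited form of MLR in which a softmax gate softly assigns each sample to one of m regions and a sigmoid (ordinary LR) scores it inside each region: p(y=1|x) = sum_i softmax_i(U^T x) * sigmoid(w_i^T x). The names `mlr_predict`, `U`, and `W` are illustrative and are not taken from Alibaba's implementation. The final lines check the m=1 case, where the gate is constant at 1 and the model collapses to plain LR.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlr_predict(X, U, W):
    """Predict click probability with an m-region mixture.

    X: (n, d) feature matrix
    U: (d, m) gating weights, decide which region a sample falls into
    W: (d, m) fitting weights, one linear (LR) model per region
    """
    gate = softmax(X @ U)   # (n, m) soft region membership, rows sum to 1
    fit = sigmoid(X @ W)    # (n, m) per-region LR prediction
    return (gate * fit).sum(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

# m = 1: the softmax over a single region is always 1, so MLR equals LR.
U1 = rng.normal(size=(3, 1))
W1 = rng.normal(size=(3, 1))
p_mlr = mlr_predict(X, U1, W1)
p_lr = sigmoid(X @ W1[:, 0])

# m = 4: still a valid probability, now a gated blend of four linear models.
p4 = mlr_predict(X, rng.normal(size=(3, 4)), rng.normal(size=(3, 4)))
```

Because each prediction is a convex combination of sigmoid outputs, it always stays strictly between 0 and 1 regardless of m.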

MLR offers two main advantages for industrial‑scale CTR prediction:

End‑to‑end non‑linear learning: automatically discovers non‑linear patterns without manual feature crosses, simplifying training and improving transferability.

Sparsity: L1 and L2,1‑norm regularization yield highly sparse models, which speeds up both training and online inference. The trade‑off is that the regularized objective is harder to optimize, which poses new optimization challenges.
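To make the two penalties concrete, here is a small sketch that assumes the parameters are arranged as a matrix with one row per feature and one column per gating or fitting weight. Under that assumed layout, the L1 norm sums absolute values of all entries, while the L2,1 norm sums the Euclidean norm of each row, so it tends to zero out whole features at once. The function names and the example matrix are hypothetical.

```python
import numpy as np

def l1_penalty(theta):
    # Sum of absolute values: pushes individual weights to zero.
    return np.abs(theta).sum()

def l21_penalty(theta):
    # Sum over rows of the row-wise L2 norm: pushes entire rows
    # (all weights of one feature) to zero together.
    return np.sqrt((theta ** 2).sum(axis=1)).sum()

# Toy parameter matrix: 3 features, 2 weights per feature (assumed layout).
theta = np.array([
    [3.0, 4.0],   # feature 0: fully active
    [0.0, 0.0],   # feature 1: pruned, contributes nothing to either norm
    [1.0, 0.0],   # feature 2: partially active
])

l1 = l1_penalty(theta)     # 3 + 4 + 0 + 0 + 1 + 0 = 8.0
l21 = l21_penalty(theta)   # 5 + 0 + 1 = 6.0

# Features whose row is all zero can be dropped from the served model.
active_features = int(np.count_nonzero(np.abs(theta).sum(axis=1)))  # 2
```

A feature with an all-zero row never needs to be stored or looked up at serving time, which is where the inference savings come from.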

2.3 Advanced Features

Structural priors – leverage domain knowledge to assign different feature groups to specific partitions, e.g., user‑side features for partitioning and ad‑side features for linear fitting.

Linear bias – explicitly models bias features such as position and slot, delivering up to a 4% lift in RPM (revenue per thousand impressions).

Model cascading – combines MLR with a traditional LR in a wide‑and‑deep style, allowing strong feature sets to be cascaded for better convergence.

Incremental training – pre‑train with structural priors then fine‑tune on the full space, reducing training steps and achieving an additional 3% RPM gain.
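The structural-prior and linear-bias ideas above can be sketched as follows. This is only an illustration under stated assumptions: user-side features feed the gate, ad-side features feed the per-region linear models, and bias features (for example position) enter as one shared linear term added to every region's logit. The source does not specify how the bias term is combined, so that choice, along with every name here (`mlr_structured`, `X_bias`, `w_bias`), is hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlr_structured(X_user, X_ad, X_bias, U, W, w_bias):
    """MLR with a structural prior and a shared linear bias (assumed form).

    X_user: (n, du) user-side features, used only for partitioning
    X_ad:   (n, da) ad-side features, used only for linear fitting
    X_bias: (n, db) bias features such as position or slot
    """
    gate = softmax(X_user @ U)                # (n, m) region membership
    bias_logit = (X_bias @ w_bias)[:, None]   # (n, 1) shared across regions
    fit = sigmoid(X_ad @ W + bias_logit)      # (n, m) per-region prediction
    return (gate * fit).sum(axis=1)

rng = np.random.default_rng(0)
n, m = 4, 3
X_user = rng.normal(size=(n, 5))
X_ad = rng.normal(size=(n, 6))
U = rng.normal(size=(5, m))
W = rng.normal(size=(6, m))
w_bias = np.array([0.5, -0.2])

# Same impressions scored with no bias signal and with a "top slot" flag on.
X_bias_none = np.zeros((n, 2))
X_bias_top = np.tile([1.0, 0.0], (n, 1))
p_none = mlr_structured(X_user, X_ad, X_bias_none, U, W, w_bias)
p_top = mlr_structured(X_user, X_ad, X_bias_top, U, W, w_bias)
```

In this sketch the positive weight on the first bias feature raises every prediction when the flag is on, while region membership, which depends only on the user, stays the same.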

2.4 Large‑Scale Distributed Implementation

MLR targets industrial data sizes (billions of features, hundreds of billions of parameters, trillions of samples). A custom distributed architecture places both worker and server roles on each node, fully utilizing CPU and memory to maximize resource efficiency.

The "common feature" trick compresses static user attributes (e.g., age, gender) that repeat across many impressions: they are stored once, and subsequent samples refer to them by index. This cuts resource consumption to one‑third and delivers a 12× speedup.
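A minimal sketch of the idea, assuming each sample's features split into a user part that repeats across impressions and an ad part that does not. The user part is stored once per user, its contribution to the linear score is computed once, and each sample only carries an index into it. The function and variable names are illustrative and do not describe Alibaba's actual data layout.

```python
import numpy as np

def scores_naive(X_user_per_sample, X_ad, w_user, w_ad):
    # User features are duplicated on every impression and re-multiplied.
    return X_user_per_sample @ w_user + X_ad @ w_ad

def scores_common(X_user_unique, user_index, X_ad, w_user, w_ad):
    # User features are stored once per user; the dot product is computed
    # once per user and then gathered by index for each impression.
    user_part = X_user_unique @ w_user
    return user_part[user_index] + X_ad @ w_ad

rng = np.random.default_rng(0)
X_user_unique = rng.normal(size=(2, 4))    # 2 users, 4 static attributes
user_index = np.array([0, 0, 0, 1, 1])     # 5 impressions from those users
X_ad = rng.normal(size=(5, 3))             # ad-side features per impression
w_user = rng.normal(size=4)
w_ad = rng.normal(size=3)

# Expanded copy that the naive layout would have to store and process.
X_user_per_sample = X_user_unique[user_index]

naive = scores_naive(X_user_per_sample, X_ad, w_user, w_ad)
shared = scores_common(X_user_unique, user_index, X_ad, w_user, w_ad)
```

Both paths give the same scores; the shared version stores 2 user rows instead of 5 and performs the user-side dot product twice instead of five times, and the gap grows with the number of impressions per user.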

3. Applications in Alibaba’s Advertising Business

Since 2013, MLR has been deployed across multiple Alibaba business units (precision‑targeted ads, Taobao Affiliate, Shenma ads, Taobao search, etc.), delivering over 20% improvements in CTR and RPM.

3.1 CTR Prediction for Targeted Ads

Features include user IDs, profile, historical behavior, ad IDs, campaign IDs, and contextual signals (time, position). The feature space reaches ~200 million dimensions. MLR, combined with structural priors, pre‑training, incremental training, and linear bias, outperforms traditional LR + feature engineering in accuracy and scalability.

3.2 Learning‑to‑Match for Targeted Ads

The MLR‑based matching framework learns personalized user interests from behavior history without extensive feature crossing, improving candidate ad recall and simplifying system design.

4. Summary and Challenges

MLR provides automatic feature learning, high sparsity, and a scalable distributed training pipeline. Several challenges remain: parameter initialization, local optima arising from the non‑convex objective, convergence speed, and further strengthening the model’s abstraction capability for even larger‑scale data.

