
A Practical Survey of Common CTR Prediction Models

This article reviews several widely used click-through-rate (CTR) prediction models, including Logistic Regression, XGBoost, Factorization Machines, Wide & Deep, DeepFM, DCN, xDeepFM, and AFM. It covers their principles, advantages, and disadvantages, and links to TensorFlow implementations for quick reuse and deeper understanding.

DataFunSummit

May 10, 2022

This article collects practical, commonly used CTR prediction models so that they can be quickly reused to validate ideas at work, and to help readers understand the principles and implementations of these fundamental models. The models covered are:

Logistic Regression (LR)

XGBoost

Factorization Machines (FM)

Wide & Deep (WDL)

DeepFM

Deep & Cross Network (DCN)

xDeepFM

Attentional Factorization Machines (AFM)

All TensorFlow code implementations, including sample data and runnable scripts, are available on GitHub: https://github.com/fly-adser/tensorflow-CTR

Logistic Regression (LR)

LR transforms the output of a linear function into class probabilities using the logistic (sigmoid) function. Its loss function is the cross‑entropy loss, and parameters are learned via gradient descent.
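The description above can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's TensorFlow code: the function names, the toy data, the learning rate, and the epoch count are all assumptions chosen for the example.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, b, x):
    # Probability of a click: sigmoid of the linear function w.x + b.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def log_loss(w, b, X, y):
    # Mean cross-entropy loss over the data set.
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict(w, b, xi), eps), 1.0 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(X)

def train(X, y, lr=0.5, epochs=200):
    # Full-batch gradient descent; the gradient of the cross-entropy
    # loss with respect to the logit is simply (p - y).
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Toy data (hypothetical): the label equals the first feature.
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
y = [0, 1, 1, 0]
w, b = train(X, y)
```

After training, `predict(w, b, [1.0, 0.0])` should be above 0.5 and `predict(w, b, [0.0, 1.0])` below it, since only the first feature carries signal in this toy set.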

XGBoost

The derivation is omitted here; see references [2] and [3]. The algorithm's advantages include explicit regularization in the objective, use of second-order gradients for faster convergence, and a sparsity-aware split algorithm that handles missing values efficiently.

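Algorithmic optimization points

1) Regularization terms penalize the number of leaves and the magnitude of the leaf weights, which controls model complexity.

2) The objective is approximated to second order, so any twice-differentiable loss can be plugged in through its gradient and Hessian, and convergence is faster and more accurate.

3) The sparsity-aware split algorithm tries sending missing values to each side of a candidate split and keeps the direction with the higher gain as the default.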
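The regularized second-order objective and the default-direction choice can be made concrete with the split-gain formula from the XGBoost paper [2]. The sketch below evaluates one candidate split only; the real algorithm repeats this over all candidate thresholds. The function names and the toy samples are assumptions for illustration, with `lam` and `gamma` standing for the L2 and per-leaf penalties.

```python
def structure_score(G, H, lam):
    # Quality of a leaf holding gradient sum G and Hessian sum H.
    return G * G / (H + lam)

def leaf_weight(G, H, lam):
    # Optimal leaf output under the second-order approximation.
    return -G / (H + lam)

def split_gain(GL, HL, GR, HR, lam=1.0, gamma=0.0):
    # Loss reduction from splitting a node into left and right children;
    # gamma is the penalty for adding one more leaf.
    parent = structure_score(GL + GR, HL + HR, lam)
    children = structure_score(GL, HL, lam) + structure_score(GR, HR, lam)
    return 0.5 * (children - parent) - gamma

def sums(stats):
    # stats is a list of (gradient, hessian) pairs.
    return sum(g for g, _ in stats), sum(h for _, h in stats)

def best_default_direction(left, right, missing, lam=1.0, gamma=0.0):
    # Try routing the missing-value samples to each side, keep the better one.
    GL, HL = sums(left)
    GR, HR = sums(right)
    GM, HM = sums(missing)
    gain_left = split_gain(GL + GM, HL + HM, GR, HR, lam, gamma)
    gain_right = split_gain(GL, HL, GR + GM, HR + HM, lam, gamma)
    if gain_left >= gain_right:
        return "left", gain_left
    return "right", gain_right

def logistic_stats(p, y):
    # Gradient and Hessian of the logistic loss at prediction p.
    return (p - y, p * (1.0 - p))

# Toy node (hypothetical): all samples start at p = 0.5.
left = [logistic_stats(0.5, 1), logistic_stats(0.5, 1)]
right = [logistic_stats(0.5, 0), logistic_stats(0.5, 0)]
missing = [logistic_stats(0.5, 1)]
direction, gain = best_default_direction(left, right, missing)
```

In this toy node the sample with the missing value is a positive, like the samples on the left, so routing it left gives the larger gain.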

Engineering optimization points

1) A weighted quantile sketch proposes candidate split points, using the second-order gradients (Hessians) as sample weights.

2) A parallel block structure stores sorted feature columns with pointers to the gradient statistics, allowing parallel split search across features and distributed storage.

3) Other optimizations include cache-aware access and out-of-core computation.

Disadvantages

1) Difficult to adapt to online learning, because trees are built by non-differentiable split search and cannot be updated incrementally with gradient steps.

2) Limited handling of extremely sparse, high-cardinality categorical features.

3) Primarily suited to structured (tabular) data, not to unstructured inputs such as images.

Factorization Machines (FM)

FM addresses XGBoost's weakness on sparse data. A second-order FM model is defined as:

\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j

The parameters to estimate are the global bias w_0, a linear weight w_i for each feature, and a k-dimensional interaction vector v_i for each feature. The interaction weight of a feature pair is the inner product of the two interaction (embedding) vectors, which lets the model estimate every pairwise interaction even for pairs that never co-occur in the training data.
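A minimal sketch of the second-order FM prediction follows, once as the literal double sum over feature pairs and once with the well-known reformulation that costs O(kn) instead of O(kn^2). The function names and the toy parameters are assumptions for illustration.

```python
def fm_predict_naive(w0, w, V, x):
    # Bias + linear terms + inner-product-weighted pairwise terms.
    n = len(x)
    out = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * c for a, c in zip(V[i], V[j]))
            out += dot * x[i] * x[j]
    return out

def fm_predict_fast(w0, w, V, x):
    # Same value, using for each latent dimension f:
    # sum_{i<j} v_if v_jf x_i x_j = 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
    n = len(x)
    k = len(V[0])
    out = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        out += 0.5 * (s * s - s_sq)
    return out

# Toy parameters (hypothetical): 3 features, 2 latent dimensions.
w0 = 0.1
w = [0.2, -0.1, 0.3]
V = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
x = [1.0, 1.0, 1.0]
y_naive = fm_predict_naive(w0, w, V, x)
y_fast = fm_predict_fast(w0, w, V, x)
```

Both functions return the same score; the fast form is the one normally used in practice because sparse inputs make the per-dimension sums cheap.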

Wide & Deep (WDL)

WDL combines a linear model (wide) with a deep neural network (deep) to balance memorization and generalization. The wide part is a generalized linear model over raw and crossed features; the deep part embeds high-dimensional sparse categorical features and feeds them into a feed-forward network with ReLU activations. The two parts are trained jointly, with FTRL and L1 regularization for the wide part and AdaGrad for the deep part. Continuous features are normalized to [0, 1] by mapping each value through its cumulative distribution function.
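The continuous-feature transform can be sketched as follows. As I read the Wide & Deep paper [6], the empirical CDF is split into n_q quantiles and a value in the i-th quantile is mapped to (i - 1) / (n_q - 1); the function names and the way boundaries are picked here are assumptions for illustration.

```python
import bisect

def fit_quantile_boundaries(values, n_q):
    # Boundaries between n_q roughly equal-frequency buckets (n_q >= 2).
    s = sorted(values)
    return [s[(len(s) * i) // n_q] for i in range(1, n_q)]

def normalize(x, boundaries):
    # Map x to (bucket index) / (n_q - 1), which lies in [0, 1].
    n_q = len(boundaries) + 1
    bucket = bisect.bisect_right(boundaries, x)  # 0-based bucket index
    return bucket / (n_q - 1)

# Toy feature (hypothetical): values 1..100 split into 5 quantiles.
boundaries = fit_quantile_boundaries(list(range(1, 101)), 5)
low = normalize(1, boundaries)
mid = normalize(50, boundaries)
high = normalize(100, boundaries)
```

The boundaries are fitted once on training data and reused at serving time, so the same raw value always maps to the same normalized input.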

DeepFM

DeepFM improves WDL by (1) replacing the wide component with an FM layer and (2) sharing the same input embeddings between the FM (wide) and deep components. This allows the model to learn both low‑order and high‑order feature interactions directly from raw features without extensive manual feature engineering.
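The shared-embedding idea can be sketched as a forward pass in which the FM term and the deep network read the same embedding table. This is an illustration under simplifying assumptions: one-hot categorical inputs (so each active feature has value 1), a single hidden layer, and hypothetical names and toy weights throughout.

```python
import math

def dense(x, W, b):
    # W holds one weight row per output unit.
    return [sum(wi * xi for wi, xi in zip(row, x)) + bj for row, bj in zip(W, b)]

def deepfm_forward(active_ids, emb, w_lin, bias, W1, b1, w_out):
    # Look up the shared embeddings once, one vector per active feature.
    E = [emb[i] for i in active_ids]
    k = len(E[0])
    # FM first-order term.
    first = bias + sum(w_lin[i] for i in active_ids)
    # FM second-order term via the sum-square trick.
    second = 0.0
    for f in range(k):
        s = sum(e[f] for e in E)
        s_sq = sum(e[f] ** 2 for e in E)
        second += 0.5 * (s * s - s_sq)
    # Deep part: the same embeddings, concatenated, through a ReLU layer.
    concat = [a for e in E for a in e]
    hidden = [max(0.0, v) for v in dense(concat, W1, b1)]
    deep = sum(wo * hv for wo, hv in zip(w_out, hidden))
    logit = first + second + deep
    return 1.0 / (1.0 + math.exp(-logit))

# Toy parameters (hypothetical): 3 feature ids, 2 fields, embedding size 2.
emb = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
w_lin = [0.2, -0.1, 0.3]
W1 = [[0.1, 0.1, 0.1, 0.1] for _ in range(3)]
b1 = [0.0, 0.0, 0.0]
w_out = [0.5, -0.5, 0.25]
p = deepfm_forward([0, 1], emb, w_lin, 0.1, W1, b1, w_out)
```

Because `emb` feeds both branches, gradients from the FM term and from the deep network would update the same vectors during training, which is the point of the sharing.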

Deep & Cross Network (DCN)

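DCN replaces the wide part of WDL with a cross network that explicitly models feature crosses without manual engineering. Each cross layer computes: x_{l+1} = x_0 * (w_l^T x_l) + b_l + x_l, where x_0 is the original input, w_l and b_l are the layer's weight and bias vectors, and w_l^T x_l is a scalar. The highest interaction degree grows by one with each layer, so the degree of the crosses is bounded by the network depth, while each layer adds only one weight vector and one bias vector.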
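The cross-layer formula translates almost directly into code. The sketch below uses plain lists; the function names and the toy weights are assumptions for illustration.

```python
def cross_layer(x0, xl, w, b):
    # x_{l+1} = x_0 * (w^T x_l) + b + x_l, where w^T x_l is a scalar.
    s = sum(wi * xi for wi, xi in zip(w, xl))
    return [x0i * s + bi + xli for x0i, bi, xli in zip(x0, b, xl)]

def cross_network(x0, layers):
    # layers is a list of (w, b) pairs, applied in order starting from x_0.
    x = x0
    for w, b in layers:
        x = cross_layer(x0, x, w, b)
    return x

# Toy input and weights (hypothetical).
x0 = [1.0, 2.0]
layer = ([1.0, 1.0], [0.0, 0.0])
x1 = cross_network(x0, [layer])
x2 = cross_network(x0, [layer, layer])
```

Each layer multiplies the running vector by x_0 once more, so `x1` contains degree-2 terms in the inputs and `x2` contains degree-3 terms, with only `2 * len(x0)` parameters per layer.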

xDeepFM

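xDeepFM adds a Compressed Interaction Network (CIN) alongside the linear and deep parts to capture explicit high-order feature interactions at the vector level, so every layer keeps the original embedding dimension. The CIN takes the matrix of field embeddings as input. Each layer forms products between every row of the previous layer's output and every row of the input matrix (an outer product along each embedding dimension) and compresses them into feature maps via weighted sums. Each layer's feature maps are sum-pooled over the embedding dimension, and the pooled vector is concatenated with the outputs of the deep and linear parts for the final prediction.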
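A minimal sketch of the CIN computation follows, written with explicit loops for readability rather than speed. The function names, the weight layout `W[h][i][j]`, and the toy values are assumptions for illustration.

```python
def cin_layer(X_prev, X0, W):
    # X_prev: H_prev x D matrix, X0: m x D matrix of field embeddings.
    # W[h][i][j] weights the product of row i of X_prev and row j of X0
    # when building feature map h. The embedding dimension D is preserved.
    D = len(X0[0])
    out = []
    for Wh in W:
        row = [0.0] * D
        for i, xi in enumerate(X_prev):
            for j, xj in enumerate(X0):
                for d in range(D):
                    row[d] += Wh[i][j] * xi[d] * xj[d]
        out.append(row)
    return out

def cin_forward(X0, weights):
    # Stack layers; sum-pool every feature map over the embedding dimension
    # and concatenate the pooled values from all layers.
    pooled = []
    X = X0
    for W in weights:
        X = cin_layer(X, X0, W)
        pooled.extend(sum(row) for row in X)
    return pooled

# Toy input (hypothetical): 2 fields, embedding size 2, one map per layer.
X0 = [[1.0, 2.0], [3.0, 4.0]]
W1 = [[[1.0, 1.0], [1.0, 1.0]]]   # shape: 1 map x 2 rows x 2 fields
W2 = [[[1.0, 0.0]]]               # shape: 1 map x 1 row  x 2 fields
cin_out = cin_forward(X0, [W1, W2])
```

The first layer yields second-order interactions and the second layer third-order ones; `cin_out` is the vector that would be concatenated with the deep and linear outputs.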

Attentional Factorization Machines (AFM)

AFM extends FM by assigning a different importance weight to each pairwise feature interaction via an attention network. Each second-order interaction is represented as the element-wise product of the two features' embedding vectors. The attention network produces a scalar weight for each interaction vector; the weighted interaction vectors are then summed and passed through a fully connected layer to obtain the final prediction score.
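The attention-weighted pooling can be sketched as below. It follows the structure in the AFM paper [11] as I understand it (a one-hidden-layer ReLU attention network with softmax normalization) and omits the bias and first-order terms for brevity; the function name and toy parameters are assumptions.

```python
import math

def afm_pairwise_score(E, W, b, h, p):
    # E: embeddings of the active features (each already scaled by its value).
    k = len(E[0])
    # Element-wise product for every feature pair i < j.
    pairs = [[a * c for a, c in zip(E[i], E[j])]
             for i in range(len(E)) for j in range(i + 1, len(E))]
    # Attention network: score = h^T ReLU(W v + b) for each interaction v.
    scores = []
    for v in pairs:
        hidden = [max(0.0, sum(wr * vd for wr, vd in zip(row, v)) + bt)
                  for row, bt in zip(W, b)]
        scores.append(sum(ht * hd for ht, hd in zip(h, hidden)))
    # Softmax over interactions (shifted by the max for stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    att = [e / total for e in exps]
    # Attention-weighted sum of interaction vectors, then projection by p.
    pooled = [sum(a * v[d] for a, v in zip(att, pairs)) for d in range(k)]
    return sum(pd * xd for pd, xd in zip(p, pooled)), att

# Toy parameters (hypothetical): 3 active features, embedding size 2.
E = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
score, att = afm_pairwise_score(E, W, b, [1.0, 1.0], [1.0, 1.0])
```

Setting `h` to zeros makes every attention weight equal, which reduces the pooled term to a plain average of the pairwise interaction vectors; a learned `h` is what lets AFM emphasize some interactions over others.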

References

[1] Detailed explanation of Logistic Regression.

[2] XGBoost: A Scalable Tree Boosting System.

[3] Advertising mechanism – Model chapter: XGBoost.

[4] Factorization Machines.

[5] Advertising mechanism – Model chapter: CTR prediction with Wide & Deep.

[6] Wide & Deep Learning for Recommender Systems.

[7] Understanding Google's Wide & Deep model.

[8] DeepFM: A Factorization-Machine based Neural Network for CTR Prediction.

[9] xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems.

[10] xDeepFM – intuitive explanation and code practice.

[11] Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks.

[12] Deep & Cross Network for Ad Click Predictions.
