CTR Estimation in Recommendation Systems: From Logistic Regression to Deep & Cross Networks
This article reviews the evolution of click‑through‑rate (CTR) estimation models for recommendation ranking, covering logistic regression, feature‑engineering tricks, factorization machines, deep neural networks, wide‑and‑deep architectures, and the Deep & Cross Network, while discussing their strengths, limitations, and future research directions.
In the ranking stage of recommendation systems, click‑through‑rate (CTR) estimation is a core technique for ordering candidate items. Because real‑world data are extremely sparse, extracting effective, generalized features from massive sparse inputs remains a major challenge.
The article first outlines the historical evolution of CTR models, starting with Logistic Regression (LR). LR can incorporate three types of features: user features (e.g., interests, age, gender), context features (e.g., device, network), and item features (e.g., category, tags). However, LR lacks the ability to generalize across unseen feature combinations, so practitioners manually create interaction features such as:
"the user likes Kobe AND the candidate news item is sports," which should receive a higher weight than "the user likes Kobe AND the candidate news item is entertainment" in the LR model.
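The effect of such a hand-crafted cross feature can be sketched in a few lines of Python. This is a toy illustration, not production code: the feature names and weight values below are hypothetical, standing in for whatever the training procedure would learn.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lr_ctr(active_features, weights, bias=0.0):
    """LR score over binary one-hot features: only active weights contribute."""
    return sigmoid(sum(weights[f] for f in active_features) + bias)

# Hypothetical learned weights: the cross feature "likes Kobe AND sports"
# carries a higher weight than "likes Kobe AND entertainment".
weights = {
    "user_likes_kobe": 0.5,
    "item_is_sports": 0.2,
    "item_is_entertainment": 0.2,
    "likes_kobe_x_sports": 1.0,
    "likes_kobe_x_entertainment": -0.5,
}

p_sports = lr_ctr(
    ["user_likes_kobe", "item_is_sports", "likes_kobe_x_sports"], weights)
p_ent = lr_ctr(
    ["user_likes_kobe", "item_is_entertainment", "likes_kobe_x_entertainment"],
    weights)
```

With these weights the sports article scores higher than the entertainment article for this user, which is exactly the behavior the manual cross feature is meant to encode.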
To improve generalization, Factorization Machines (FM) introduce a latent vector v for each feature and compute pairwise interactions via inner products, enabling the model to infer relationships between features that never co‑occurred in training data.
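To make the latent-vector idea concrete, here is a minimal NumPy sketch of FM's second-order interaction term, using the standard algebraic reformulation that reduces the cost from O(n²k) to O(nk). Function and variable names are illustrative.

```python
import numpy as np

def fm_pairwise(x, V):
    """Second-order FM term: sum over i<j of <v_i, v_j> * x_i * x_j.

    x : (n,) feature vector (sparse in practice, dense here for clarity)
    V : (n, k) latent matrix, one k-dimensional vector v_i per feature

    Uses the identity
        sum_{i<j} <v_i, v_j> x_i x_j
          = 0.5 * sum_f [ (sum_i v_{if} x_i)^2 - sum_i v_{if}^2 x_i^2 ],
    which costs O(n*k) instead of O(n^2 * k).
    """
    s = V.T @ x                  # (k,) per-factor weighted sums
    s_sq = (V ** 2).T @ (x ** 2) # (k,) per-factor sums of squares
    return 0.5 * float(np.sum(s * s - s_sq))
```

Because each feature owns a latent vector, two features that never co-occurred in training can still interact through the inner product of their (separately learned) vectors.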
With the rise of deep learning, Deep Neural Networks (DNN) were applied to CTR prediction. Because raw sparse features are high‑dimensional, an embedding layer first maps them to low‑dimensional dense vectors before feeding them to fully‑connected layers. The Wide & Deep model combines an LR‑style wide component (hand‑crafted cross features) with a deep component, capturing both low‑order and high‑order feature interactions.
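A toy forward pass can show how the two components are joined. This is a sketch under assumed shapes (small hypothetical embedding table, one hidden layer, random parameters), not the paper's exact architecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def wide_deep_forward(wide_x, sparse_ids, emb_table, W1, b1, w_deep, w_wide, bias):
    """Toy Wide & Deep forward pass.

    wide_x     -- binary vector of hand-crafted cross features (wide part)
    sparse_ids -- active feature IDs, one per field (deep part)
    emb_table  -- (num_features, emb_dim) embedding matrix
    """
    # Deep part: embedding lookup -> concatenate -> one ReLU hidden layer.
    h = np.concatenate([emb_table[i] for i in sparse_ids])
    h = np.maximum(0.0, W1 @ h + b1)
    # Joint logit: low-order wide term plus high-order deep term.
    logit = w_wide @ wide_x + w_deep @ h + bias
    return sigmoid(logit)
```

The wide part memorizes specific hand-crafted crosses, while the deep part generalizes through the shared embeddings; both feed one logit trained jointly.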
To further reduce manual feature engineering, the Deep & Cross Network (DCN) adds a Cross Network module alongside the deep part. The Cross Network repeatedly applies the operation x_{l+1} = x_0 (x_l^T w_l) + b_l + x_l, where x_0 is the original input vector and w_l and b_l are learnable parameters of the same dimension as x_0. Since x_l^T w_l is a scalar, each layer rescales x_0, adds a bias, and keeps a residual connection to x_l. This design yields high‑order cross features at a cost linear in the input dimension.
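The cross layer recursion above can be written directly in NumPy. This is a minimal sketch of the cross operation only (function names and toy parameters are my own); a full DCN would run it in parallel with the deep network and concatenate the two outputs.

```python
import numpy as np

def cross_layer(x0, xl, w, b):
    """One DCN cross layer: x_{l+1} = x0 * (xl^T w) + b + xl.

    xl @ w is a scalar, so the layer rescales the original input x0,
    adds a bias, and keeps a residual connection to xl -- O(d) work
    and only 2d parameters per layer.
    """
    return x0 * float(xl @ w) + b + xl

def cross_network(x0, weights, biases):
    """Stack cross layers; depth l produces crosses up to degree l + 1."""
    x = x0
    for w, b in zip(weights, biases):
        x = cross_layer(x0, x, w, b)
    return x
```

Note that every layer multiplies the current state by the *original* input x_0, which is how the polynomial degree of the crosses grows by one per layer while the dimension stays fixed.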
The article compares DCN with FM, noting that both achieve feature crossing but via different mechanisms: FM crosses features vector‑wise, through inner products of latent vectors, while DCN crosses bit‑wise, combining individual embedding dimensions while the residual structure keeps the output at the original dimension. DCN can generate multi‑layer, high‑order cross features without the parameter blow‑up of higher‑order FM.
Finally, the article discusses current limitations of DCN, such as indiscriminate bit‑wise crossing of embedding dimensions belonging to the same field, which can produce ineffective interactions. It suggests that future work should focus on field‑aware (vector‑wise) crossing, citing recent advances like xDeepFM.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.