How DNN Breaks Feature Scaling Limits in Search Ranking

This article examines the challenges of high‑dimensional sparse features in search ranking, explains why traditional linear models struggle, and describes how deep neural networks with novel encoding schemes and online updates can dramatically improve CTR prediction and real‑time performance.

Alibaba Cloud Developer

Background

Search ranking has long relied on models such as LR, GBDT, and SVM, which hit severe scalability limits once the feature space reaches billions of item features. Even Alibaba's internal PS-LR, which supports 50 billion features, is insufficient at the current scale.

Related Work

Prior work such as "Deep Learning over Multi-field Categorical Data", which proposed FNN, applied ID-based categorical features to CTR estimation, while Google's Wide & Deep model combined handcrafted cross features with a deep network to capture useful patterns.

Our Deep Learning Model

We built a large-scale DNN for conversion-rate prediction in search that handles tens of billions of sparse ID features across the user, item, and query domains. The network combines sparse and dense representations and feeds item-ID features and real-time statistics into the final layer, which gives it both generalization and real-time adaptability.

Wide Model

Features include raw ID features (item_id, seller_id) and cross features (user_id × item_id, user_id × seller_id). Continuous statistical features are processed in a separate two‑layer network with tanh/sigmoid activations to preserve their impact.

Deep Model

Discrete IDs are embedded into dense vectors and fed into a shallow DNN (1-2 hidden layers) for efficiency. The overall architecture uses three fully-connected layers to learn from the sparse and dense features, followed by two layers for binary classification (click/purchase).
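As a rough illustration of how a wide part and a deep part can be combined, the sketch below sum-pools ID embeddings, passes them through one tanh hidden layer, and adds the result to a linear score over wide features before a sigmoid. The class name `WideAndDeepSketch`, the layer sizes, the random initialization, and the choice to sum the two logits are all assumptions made for the example; the article does not specify these details, and the production network has more layers.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(x, weights, bias, activation):
    """Fully-connected layer: one output per weight row."""
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


class WideAndDeepSketch:
    """Hypothetical toy model, not the production architecture."""

    def __init__(self, emb_dim=4, hidden=3, seed=0):
        self.rng = random.Random(seed)
        self.emb_dim = emb_dim
        self.embeddings = {}    # deep side: ID -> dense vector, created lazily
        self.wide_weights = {}  # wide side: feature key -> linear weight
        self.w_hidden = self._matrix(hidden, emb_dim)
        self.b_hidden = [0.0] * hidden
        self.w_out = self._matrix(1, hidden)
        self.b_out = [0.0]

    def _matrix(self, rows, cols):
        return [[self.rng.uniform(-0.1, 0.1) for _ in range(cols)]
                for _ in range(rows)]

    def embed(self, feature_id):
        if feature_id not in self.embeddings:
            self.embeddings[feature_id] = [
                self.rng.uniform(-0.1, 0.1) for _ in range(self.emb_dim)]
        return self.embeddings[feature_id]

    def predict(self, deep_ids, wide_features):
        # Deep part: sum-pool the embeddings, then one hidden layer.
        vectors = [self.embed(i) for i in deep_ids]
        pooled = [sum(vals) for vals in zip(*vectors)]
        hidden = dense_layer(pooled, self.w_hidden, self.b_hidden, math.tanh)
        deep_logit = dense_layer(hidden, self.w_out, self.b_out,
                                 lambda v: v)[0]
        # Wide part: linear score over active raw-ID and cross features.
        wide_logit = sum(self.wide_weights.get(f, 0.0) for f in wide_features)
        return sigmoid(wide_logit + deep_logit)


model = WideAndDeepSketch()
deep_ids = ["user_1", "item_9", "query_dress"]
wide = ["item_id=9", "user_id×item_id=1_9"]
p_before = model.predict(deep_ids, wide)
model.wide_weights["user_id×item_id=1_9"] = 2.0  # pretend this cross feature was learned
p_after = model.predict(deep_ids, wide)
```

In this toy setup a positive weight on the cross feature raises the predicted probability, which is the memorization role the wide part plays next to the generalizing embeddings.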

Random Encoding

One‑hot vectors of dimension N are compressed into six‑hot vectors with limited overlap, achieving roughly 20× compression while preserving discriminative power.
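A minimal sketch of the idea, assuming the six active positions are chosen by hashing the ID: each item maps to six distinct positions in a vector much shorter than N. The function name `random_encode`, the use of SHA-256, and the example sizes are assumptions for illustration; hashing only makes large overlaps between two items unlikely, whereas the original scheme is described as limiting overlap explicitly.

```python
import hashlib


def random_encode(item_id: str, dim: int, hot: int = 6) -> list[int]:
    """Return `hot` distinct active positions in [0, dim) for `item_id`."""
    if hot > dim:
        raise ValueError("hot must not exceed dim")
    positions = []
    seen = set()
    salt = 0
    while len(positions) < hot:
        # Deterministic pseudo-random position derived from the ID and a salt.
        digest = hashlib.sha256(f"{item_id}:{salt}".encode()).digest()
        pos = int.from_bytes(digest[:8], "big") % dim
        if pos not in seen:
            seen.add(pos)
            positions.append(pos)
        salt += 1
    return sorted(positions)


# Example sizes (assumed): 100,000 items encoded in 5,000 dimensions,
# i.e. a 20x smaller input layer than plain one-hot.
code = random_encode("item_42", dim=5000)
```

The same ID always yields the same six positions, so the encoding can be recomputed at serving time without storing a lookup table.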

Hanging Encoding

For cold-start items, one dimension is shared with a similar hot item (found via i2i, that is, item-to-item similarity) and the remaining dimensions are random-encoded, so cold items benefit from the representation of popular items.
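The sketch below illustrates one way this could work, assuming a six-hot code like the one in Random Encoding: the cold item reuses one active position of its similar hot item and hashes the other five. The names `hashed_positions` and `hanging_encode`, the choice of which position to share, and the rule that the remaining positions avoid the hot item's code are assumptions; the i2i lookup itself is not shown.

```python
import hashlib


def hashed_positions(key: str, dim: int, count: int,
                     exclude=frozenset()) -> list[int]:
    """Pick `count` distinct positions in [0, dim), skipping `exclude`."""
    taken = set(exclude)
    if count > dim - len(taken):
        raise ValueError("not enough free positions")
    chosen = []
    salt = 0
    while len(chosen) < count:
        digest = hashlib.sha256(f"{key}:{salt}".encode()).digest()
        pos = int.from_bytes(digest[:8], "big") % dim
        if pos not in taken:
            taken.add(pos)
            chosen.append(pos)
        salt += 1
    return chosen


def hanging_encode(cold_id: str, similar_hot_code: list[int],
                   dim: int, hot: int = 6) -> list[int]:
    """Share one position with the hot item, hash the remaining ones."""
    shared = similar_hot_code[0]  # assumption: share the first position
    rest = hashed_positions(cold_id, dim, hot - 1,
                            exclude=set(similar_hot_code))
    return sorted([shared] + rest)


hot_code = sorted(hashed_positions("hot_item_7", dim=5000, count=6))
cold_code = hanging_encode("cold_item_99", hot_code, dim=5000)
```

Because the two codes intersect in exactly one position, gradients flowing through that shared input weight let the cold item start from part of what the model has already learned about the hot item.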

Tokenization Encoding

Query phrases are tokenized, each token one‑hot encoded, and then merged to form variable‑length query vectors, enabling the model to capture shared word patterns across different queries.
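A small sketch of this encoding, assuming whitespace tokenization and a vocabulary built from the queries themselves (a production system would use a proper word segmenter and a fixed vocabulary). The helper names `build_vocab` and `encode_query` are hypothetical.

```python
def build_vocab(queries):
    """Assign each distinct token an index in order of first appearance."""
    vocab = {}
    for query in queries:
        for token in query.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode_query(query, vocab):
    """Merge the one-hot codes of the tokens into a set of active indices."""
    return sorted({vocab[t] for t in query.lower().split() if t in vocab})


queries = ["red dress", "red shoes"]
vocab = build_vocab(queries)
dress = encode_query("red dress", vocab)   # [0, 1]
shoes = encode_query("red shoes", vocab)   # [0, 2]
```

Both queries activate index 0 for "red", so whatever the model learns about that token from one query carries over to the other.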

Online Update

During high‑traffic events like Double‑11, we perform online training on the wide part using FTRL for sparse features and on the deep part using SGD with a small learning rate. Techniques such as pairwise sampling and mini‑batch updates address streaming data imbalance and asynchronous SGD instability.
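For the wide part, the per-coordinate FTRL-Proximal update as commonly published for logistic regression can be sketched as follows. This is the textbook form with binary features, not necessarily the variant run in production; the class name `FTRLProximal` and the hyperparameter values are assumptions, and the SGD update of the deep part, pairwise sampling, and mini-batching are not shown.

```python
import math
from collections import defaultdict


class FTRLProximal:
    """Per-coordinate FTRL-Proximal for logistic regression on binary features."""

    def __init__(self, alpha=0.1, beta=1.0, l1=0.1, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)  # accumulated adjusted gradients
        self.n = defaultdict(float)  # accumulated squared gradients

    def weight(self, i):
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0  # L1 keeps rarely useful features exactly at zero
        sign = 1.0 if z > 0 else -1.0
        scale = (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2
        return -(z - sign * self.l1) / scale

    def predict(self, features):
        score = sum(self.weight(i) for i in features)
        score = max(min(score, 35.0), -35.0)  # guard against overflow in exp
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, features, label):
        p = self.predict(features)
        g = p - label  # log-loss gradient for every active binary feature
        for i in features:
            w = self.weight(i)
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w
            self.n[i] += g * g
        return p


model = FTRLProximal()
for _ in range(200):
    model.update(["item_id=9"], 1)   # this feature always converts
    model.update(["item_id=3"], 0)   # this feature never converts
p_pos = model.predict(["item_id=9"])
p_neg = model.predict(["item_id=3"])
```

Each example touches only the coordinates of its active features, which is what makes this kind of update practical for a stream of very sparse inputs.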

