How DNN Breaks Feature Scaling Limits in Search Ranking
This article examines the challenges of high‑dimensional sparse features in search ranking, explains why traditional linear models struggle, and describes how deep neural networks with novel encoding schemes and online updates can dramatically improve CTR prediction and real‑time performance.
Background
Search ranking has relied heavily on models such as LR, GBDT, and SVM. These models hit severe scalability limits once the feature space reaches billions of item-level features; even Alibaba's internal PS-LR, which supports 50 billion features, is insufficient at the current scale.
Related Work
Prior work includes "Deep Learning over Multi-field Categorical Data", which proposed FNN and used ID-based categorical features for CTR estimation, and Google's Wide & Deep model, which combines handcrafted cross features in a wide linear part with a deep network to capture useful patterns.
Our Deep Learning Model
We built a large-scale DNN for conversion-rate prediction in search that handles tens of billions of sparse ID features across the user, item, and query domains. The network combines sparse and dense representations and adds item-ID features and real-time statistics in the final layer, which gives it both generalization and real-time adaptability.
Wide Model
The wide part takes raw ID features (item_id, seller_id) and cross features (user_id × item_id, user_id × seller_id). Continuous statistical features are processed in a separate two-layer network with tanh/sigmoid activations so that their influence is preserved.
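The raw and cross ID features above can be sketched as follows. This is a minimal illustration that assumes the hashing trick is used to map each feature to an index; the source does not say how indices are assigned, and the names `feature_index` and `wide_feature_indices` are hypothetical.

```python
import hashlib


def feature_index(name: str, value: str, dim: int) -> int:
    """Map a (feature name, value) pair to a stable index in [0, dim).

    Assumption: indices come from hashing; the article does not specify this.
    """
    key = f"{name}={value}".encode("utf-8")
    return int.from_bytes(hashlib.md5(key).digest()[:8], "big") % dim


def wide_feature_indices(user_id: str, item_id: str, seller_id: str, dim: int) -> dict:
    """Return the active index for each raw and cross feature of one sample."""
    return {
        "item_id": feature_index("item_id", item_id, dim),
        "seller_id": feature_index("seller_id", seller_id, dim),
        # Cross features: the pair of IDs is treated as one categorical value.
        "user_id*item_id": feature_index("user_id*item_id", f"{user_id}|{item_id}", dim),
        "user_id*seller_id": feature_index("user_id*seller_id", f"{user_id}|{seller_id}", dim),
    }
```

Two users who see the same item share the `item_id` and `seller_id` indices, while their cross features normally land on different indices, which is what lets the wide part memorize user-specific behavior.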
Deep Model
Discrete IDs are embedded into dense vectors and fed into a shallow DNN (1-2 hidden layers) for efficiency. The overall architecture uses three fully-connected layers for sparse+dense feature learning, followed by two layers for binary classification (click/purchase).
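The embedding step can be illustrated with a toy forward pass. This sketch is far smaller than the architecture described above: it uses one hidden layer, sum pooling, and a tanh activation, all of which are assumptions chosen for brevity, and `TinyDeepModel` is a hypothetical name.

```python
import math
import random


def init_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix stored as a list of rows."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def dense(x, w, b, act):
    """Fully-connected layer: w has shape len(x) x len(b)."""
    return [act(sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j])
            for j in range(len(b))]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class TinyDeepModel:
    def __init__(self, vocab_size, emb_dim=8, hidden=16, seed=0):
        rng = random.Random(seed)
        self.emb = init_matrix(vocab_size, emb_dim, rng)  # one row per ID
        self.w1 = init_matrix(emb_dim, hidden, rng)
        self.b1 = [0.0] * hidden
        self.w2 = init_matrix(hidden, 1, rng)
        self.b2 = [0.0]

    def forward(self, active_ids):
        # Summing the rows of the active IDs equals multiplying a
        # multi-hot input vector by the embedding matrix.
        emb_dim = len(self.emb[0])
        pooled = [sum(self.emb[i][d] for i in active_ids) for d in range(emb_dim)]
        h = dense(pooled, self.w1, self.b1, math.tanh)
        return dense(h, self.w2, self.b2, sigmoid)[0]
```

Calling `TinyDeepModel(vocab_size=100).forward([3, 17, 42])` returns a probability between 0 and 1; training is omitted here.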
Random Encoding
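Instead of a one-hot vector of dimension N, each ID is represented by a six-hot vector in a much smaller space, with the active positions chosen so that different IDs overlap only to a limited degree. This achieves roughly 20× compression while preserving discriminative power.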
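A minimal sketch of such a code assignment follows. It assumes the six positions are drawn pseudo-randomly from a seed derived from the ID; the source does not describe how positions are chosen or how overlap is bounded, and `random_k_hot` is a hypothetical name.

```python
import hashlib
import random


def random_k_hot(item_id: str, dim: int, k: int = 6) -> list:
    """Return k distinct active positions in [0, dim) for an ID.

    The ID is hashed to seed a private RNG, so the same ID always maps to
    the same positions. Overlap between IDs is not explicitly bounded here;
    it is only unlikely to be large when dim is much greater than k.
    """
    if k > dim:
        raise ValueError("k must not exceed dim")
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return sorted(rng.sample(range(dim), k))
```

For example, with N = 1,000,000 IDs and 20× compression the code space has 50,000 dimensions, and the number of possible six-position codes, C(50000, 6), is vastly larger than N, so distinct IDs can still receive distinguishable codes.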
Hanging Encoding
For cold‑start items, we share one dimension with a similar hot item (found via i2i) and random‑encode the remaining dimensions, allowing cold items to benefit from the representation of popular items.
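The idea can be sketched as below, assuming the similar hot item has already been found by an i2i lookup that is outside this example. The helper names `random_code` and `hanging_code` are hypothetical, and choosing the shared dimension at random from the hot item's code is an assumption.

```python
import hashlib
import random


def _rng_for(key: str) -> random.Random:
    """Deterministic RNG seeded from a string key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def random_code(item_id: str, dim: int, k: int = 6) -> list:
    """Plain random k-hot code, used here for the hot item."""
    return sorted(_rng_for(item_id).sample(range(dim), k))


def hanging_code(cold_id: str, hot_code: list, dim: int, k: int = 6) -> list:
    """Code for a cold item: one position shared with hot_code, the rest random."""
    hot = set(hot_code)
    if dim - len(hot) < k - 1:
        raise ValueError("dim is too small to place the remaining positions")
    rng = _rng_for(cold_id)
    shared = rng.choice(sorted(hot))  # the single dimension "hung" on the hot item
    rest = set()
    while len(rest) < k - 1:
        pos = rng.randrange(dim)
        if pos not in hot:  # keep the overlap at exactly one position
            rest.add(pos)
    return sorted(rest | {shared})
```

Because the cold item activates one position that the hot item also activates, the parameters learned for that position from the hot item's traffic contribute to the cold item's representation from the first impression.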
Tokenization Encoding
Query phrases are tokenized, each token one‑hot encoded, and then merged to form variable‑length query vectors, enabling the model to capture shared word patterns across different queries.
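A minimal sketch of this encoding, assuming whitespace tokenization for simplicity (a production system would likely use a proper word segmenter); `build_vocab` and `encode_query` are hypothetical names.

```python
def build_vocab(queries):
    """Assign each distinct token an index in order of first appearance."""
    vocab = {}
    for query in queries:
        for token in query.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode_query(query, vocab):
    """Return the sorted active token indices of a query (a multi-hot code).

    Tokens missing from the vocabulary are skipped in this sketch.
    """
    return sorted({vocab[t] for t in query.lower().split() if t in vocab})


vocab = build_vocab(["red dress", "red shoes"])
print(encode_query("red dress", vocab))  # [0, 1]
print(encode_query("red shoes", vocab))  # [0, 2]
```

Both queries activate index 0 (the token "red"), so whatever the model learns about that token from one query carries over to the other.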
Online Update
During high‑traffic events like Double‑11, we perform online training on the wide part using FTRL for sparse features and on the deep part using SGD with a small learning rate. Techniques such as pairwise sampling and mini‑batch updates address streaming data imbalance and asynchronous SGD instability.
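The FTRL step for the wide part can be sketched with the standard per-coordinate FTRL-Proximal update for logistic regression. This is an illustration under assumptions, not the production trainer: features are binary, updates are applied one example at a time (no mini-batching or pairwise sampling), and the hyperparameters `alpha`, `beta`, `l1`, and `l2` are placeholder values.

```python
import math
from collections import defaultdict


class FTRLProximal:
    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)  # per-feature adjusted gradient sums
        self.n = defaultdict(float)  # per-feature sums of squared gradients

    def weight(self, i):
        """Closed-form weight; exactly zero while |z| <= l1 (sparsity)."""
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        n = self.n.get(i, 0.0)
        return -(z - sign * self.l1) / ((self.beta + math.sqrt(n)) / self.alpha + self.l2)

    @staticmethod
    def _sigmoid(score):
        score = max(-35.0, min(35.0, score))  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-score))

    def predict(self, active):
        """Probability for a sample given its set of active feature IDs."""
        return self._sigmoid(sum(self.weight(i) for i in set(active)))

    def update(self, active, label):
        """One online step on a single example; returns the pre-update prediction."""
        weights = {i: self.weight(i) for i in set(active)}
        p = self._sigmoid(sum(weights.values()))
        g = p - label  # log-loss gradient for every active binary feature
        for i, w in weights.items():
            n = self.n[i]
            sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / self.alpha
            self.z[i] += g - sigma * w
            self.n[i] = n + g * g
        return p
```

Only the features active in a sample are touched, which keeps each update cheap for very sparse inputs, and the L1 threshold keeps rarely seen features at exactly zero weight.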
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
