How Wide‑ResNet with Batch Norm Boosts 1688’s ‘You May Like’
This article reviews the Wide&Deep, PNN, and DeepFM models and introduces a Wide‑ResNet model applied to Alibaba’s 1688 “You May Like” recommendation. It describes the system architecture and training data, reports experiments in which batch normalization improves AUC, and shares practical tuning insights.
Background
“You May Like” is a classic recommendation scenario on 1688. The mobile homepage shows about 230k exposures per day, with roughly 72% of users clicking and an average of eight clicks per user. The goal is to predict the items each user is most likely to be interested in, so as to increase click‑through and purchase rates.
System Architecture
Real‑time user behavior is stored in ABFS (Ali Basic Feature Server) and passed to TPP (The Personalization Platform). BE (Basic Engine) performs vector‑based recall of 1,000 candidates, which are then scored online by RTP (Real‑Time Prediction). The top 600 items are displayed. Offline model development and tuning are done on the Porsche distributed streaming platform.
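The recall‑then‑rank flow above can be sketched as follows. This is only an illustration: recall_candidates, rank, and the toy dot‑product scorer are hypothetical stand‑ins for BE and RTP, whose real interfaces the article does not describe. Only the sizes 1,000 and 600 come from the text.

```python
import heapq
import random

RECALL_SIZE = 1000   # candidates returned by vector recall (from the article)
DISPLAY_SIZE = 600   # items finally displayed (from the article)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def recall_candidates(user_vec, item_vecs, k=RECALL_SIZE):
    """Hypothetical stand-in for BE: top-k item ids by vector similarity."""
    return heapq.nlargest(k, item_vecs, key=lambda i: dot(user_vec, item_vecs[i]))

def rank(user_vec, candidates, score_fn, k=DISPLAY_SIZE):
    """Hypothetical stand-in for RTP: score every candidate, keep the top k."""
    return heapq.nlargest(k, candidates, key=lambda i: score_fn(user_vec, i))

random.seed(0)
item_vecs = {i: [random.gauss(0, 1) for _ in range(8)] for i in range(5000)}
user_vec = [random.gauss(0, 1) for _ in range(8)]

candidates = recall_candidates(user_vec, item_vecs)
# Toy scorer for illustration; in production this would be the CTR model.
shown = rank(user_vec, candidates, lambda u, i: dot(u, item_vecs[i]))
print(len(candidates), len(shown))
```

The two stages are kept separate so that the expensive CTR model only scores the recalled subset rather than the full catalog.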
Model Overview
1. Wide&Deep
The classic Google Wide&Deep model combines a linear LR “wide” part that memorizes feature co‑occurrences with a deep neural network that generalizes to unseen combinations. Sparse features are embedded and fed to the DNN, while cross‑product transformations of sparse features are fed to the LR.
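A minimal sketch of the joint prediction, assuming a single hidden layer and made‑up feature names (user_type, item_category, and the cross feature string are hypothetical, as are all weights): the wide logit and the deep logit are summed before the sigmoid.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(v, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def wide_deep_predict(cross_feats, sparse_ids, params):
    # Wide part: a linear model over binary cross-product features.
    wide_logit = sum(params["wide_w"].get(f, 0.0) for f in cross_feats)
    # Deep part: look up one embedding per field, concatenate, run an MLP.
    x = [v for field, idx in sparse_ids.items() for v in params["emb"][field][idx]]
    h = [max(0.0, a) for a in dense(x, params["w1"], params["b1"])]  # ReLU
    deep_logit = dense(h, params["w2"], params["b2"])[0]
    # The two logits are summed and squashed into a click probability.
    return sigmoid(wide_logit + deep_logit + params["bias"])

random.seed(0)
EMB_DIM, HIDDEN = 4, 8
vocab = {"user_type": 3, "item_category": 10}  # hypothetical fields and sizes
params = {
    "wide_w": {"user_type=1&item_category=7": 0.8},  # hypothetical cross feature
    "emb": {name: rand_matrix(size, EMB_DIM) for name, size in vocab.items()},
    "w1": rand_matrix(HIDDEN, EMB_DIM * len(vocab)), "b1": [0.0] * HIDDEN,
    "w2": rand_matrix(1, HIDDEN), "b2": [0.0],
    "bias": 0.0,
}
ids = {"user_type": 1, "item_category": 7}
p_with = wide_deep_predict(["user_type=1&item_category=7"], ids, params)
p_without = wide_deep_predict([], ids, params)
print(p_with, p_without)
```

With a positive weight on the cross feature, the prediction is higher when that co‑occurrence fires, which is the "memorization" role of the wide part.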
2. Product‑based Neural Network (PNN)
PNN adds an explicit product layer to capture second‑order feature interactions, using either inner or outer product on embeddings of equal dimension.
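As an illustration of the inner‑product variant, the sketch below computes one inner product per pair of field embeddings. The function name and the toy embeddings are assumptions; the dimension check reflects the equal‑dimension requirement stated above.

```python
from itertools import combinations

def inner_product_layer(embeddings):
    """Pairwise inner products between field embeddings (inner-product PNN)."""
    dims = {len(e) for e in embeddings}
    if len(dims) != 1:
        raise ValueError("all field embeddings must share one dimension")
    return [sum(a * b for a, b in zip(ei, ej))
            for ei, ej in combinations(embeddings, 2)]

# Three hypothetical fields, each embedded in 3 dimensions.
field_embs = [[1.0, 0.0, 2.0], [0.5, 1.0, 1.0], [2.0, 2.0, 0.0]]
products = inner_product_layer(field_embs)
print(products)  # [2.5, 2.0, 3.0]
```

For n fields this yields n(n-1)/2 interaction values, which are typically concatenated with the embeddings and fed to the following fully connected layers.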
3. DeepFM
DeepFM replaces the wide LR part with a factorization‑machine (FM) layer, sharing embeddings with the DNN. FM captures low‑order interactions, while the DNN captures high‑order ones.
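The second‑order FM term can be sketched in two equivalent ways: the direct sum over feature pairs, and the linear‑time reformulation from Rendle's paper. The vectors below are toy values, and the function names are illustrative only.

```python
from itertools import combinations

def fm_second_order_naive(x, V):
    """Sum over all pairs i<j of <V[i], V[j]> * x[i] * x[j]."""
    return sum(sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
               for i, j in combinations(range(len(x)), 2))

def fm_second_order_fast(x, V):
    """Same value in O(k*n): 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)."""
    total = 0.0
    for f in range(len(V[0])):
        terms = [V[i][f] * x[i] for i in range(len(x))]
        total += sum(terms) ** 2 - sum(t * t for t in terms)
    return 0.5 * total

x = [1.0, 0.0, 1.0, 1.0]                               # toy input, feature 1 inactive
V = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]   # one latent vector per feature
print(fm_second_order_naive(x, V), fm_second_order_fast(x, V))
```

In DeepFM the rows of V are the same embeddings the DNN consumes, so the FM layer adds no separate embedding table.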
4. Wide‑ResNet (proposed)
The author replaces the DNN in Wide&Deep with a ResNet‑style network that uses skip connections. Adding batch normalization (BN) after each residual block significantly improves performance.
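A rough sketch of one such block, under stated assumptions: the article does not give the exact layer layout, so the two dense layers, the ReLU, and the equal input and hidden widths here are guesses. Only the skip connection and the placement of BN after the block follow the description above.

```python
import math
import random

def dense(batch, W, b):
    """Apply a fully connected layer to every row of the batch."""
    return [[sum(w * x for w, x in zip(row_w, row)) + bj for row_w, bj in zip(W, b)]
            for row in batch]

def batch_norm(batch, gamma, beta, eps=1e-5):
    """Normalize each feature with batch statistics (training mode only)."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        for i in range(n):
            out[i][j] = gamma[j] * (col[i] - mean) / math.sqrt(var + eps) + beta[j]
    return out

def residual_block(batch, W1, b1, W2, b2, gamma, beta):
    hidden = [[max(0.0, v) for v in row] for row in dense(batch, W1, b1)]  # ReLU
    transformed = dense(hidden, W2, b2)
    # Skip connection: add the block input back to the transformed output.
    summed = [[x + f for x, f in zip(row, f_row)]
              for row, f_row in zip(batch, transformed)]
    # BN placed after the block, as the article describes.
    return batch_norm(summed, gamma, beta)

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
DIM, BATCH = 4, 16
x = rand_matrix(BATCH, DIM)
out = residual_block(x, rand_matrix(DIM, DIM), [0.0] * DIM,
                     rand_matrix(DIM, DIM), [0.0] * DIM,
                     gamma=[1.0] * DIM, beta=[0.0] * DIM)
```

This sketch uses batch statistics only; at serving time BN would normally use running averages collected during training.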
Training Data
Training samples are collected from the seven days preceding the target date. Each impression is labeled 1 if clicked, otherwise 0. Features include user‑level signals (e.g., whether the user is a Taobao seller) and item‑level signals; B‑type users differ from C‑type users in demographics and brand features.
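The sampling rule above can be sketched as follows. The tuple layout, the function names, and the dates are hypothetical; the article specifies only the seven‑day window and the click/no‑click labeling.

```python
from datetime import date, timedelta

def training_window(target_date, days=7):
    """Dates of the `days` days immediately before target_date (target excluded)."""
    return [target_date - timedelta(days=d) for d in range(days, 0, -1)]

def build_samples(impressions, clicks, target_date):
    """Label each in-window impression 1 if it was clicked, else 0."""
    window = set(training_window(target_date))
    return [(user, item, day, 1 if (user, item, day) in clicks else 0)
            for user, item, day in impressions if day in window]

target = date(2019, 3, 8)  # hypothetical target date
impressions = [
    ("u1", "i1", date(2019, 3, 1)),
    ("u1", "i2", date(2019, 3, 7)),
    ("u2", "i1", date(2019, 2, 28)),  # outside the 7-day window, dropped
]
clicks = {("u1", "i2", date(2019, 3, 7))}
samples = build_samples(impressions, clicks, target)
labels = [s[3] for s in samples]
print(labels)  # [0, 1]
```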
Experimental Results
Offline experiments on the Porsche platform show that Wide‑ResNet with BN achieves about 1% higher AUC than the baseline Wide&Deep on both training and test sets. Incremental training over three data batches further raises AUC by 5‑6%.
The loss curve of the BN‑enhanced model stays below 0.3 and is much more stable than the baseline.
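For readers who want to reproduce the two metrics mentioned here, the sketch below computes AUC as the fraction of (positive, negative) pairs ranked correctly, with ties counted as half, and the binary log loss. This is a plain reference implementation on toy data, not the evaluation code used on Porsche.

```python
import math

def auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def log_loss(labels, probs, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))  # 0.75
```

The pairwise form is quadratic in the number of samples, so it suits spot checks rather than full‑scale evaluation.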
Tuning Tips
Practical tips include keeping embedding dimensions equal across fields that feed a product layer (inner and outer products require embeddings of the same dimension), grouping embeddings for large‑scale features, and using batch normalization to stabilize training.
Conclusion
The proposed Wide‑ResNet with batch normalization substantially improves the “You may like” recommendation on 1688, demonstrating the value of residual connections and normalization in large‑scale CTR prediction.
References
Cheng, H.-T. et al., “Wide & Deep Learning for Recommender Systems,” DLRS, 2016.
Qu, Y. et al., “Product‑based Neural Networks for User Response Prediction,” ICDM, 2016.
Guo, H. et al., “DeepFM: A Factorization‑Machine based Neural Network for CTR Prediction,” arXiv, 2017.
Rendle, S., “Factorization Machines,” ICDM, 2010.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.