How Feizhu Upgraded Its Recommendation Engine from Linear to End‑to‑End Deep Models

This article details the evolution of Feizhu's "Guess You Like" ranking system, moving from a linear FTRL model to several end‑to‑end deep learning versions—including PALM, FB‑PALM, and GLA—highlighting technical challenges, architectural changes, and measurable performance gains.

Introduction

Feizhu's "Guess You Like" ranking model was upgraded from a linear FTRL model to an end-to-end deep model through successive version iterations: PALM, FB-PALM, and GLA. The sections below document the main problems encountered and the technical improvements made at each stage.

Problem Analysis

While the existing feature set was extensive, the linear model could not exploit richer feature interactions, leaving clear headroom. End-to-end deep models can cross features both implicitly and explicitly, but they demand careful feature selection, proper handling of high-dimensional ID features, and attention to generalization, over-fitting during training, and online serving latency.

Model Iterations

PALM (Pure Adaptive L2 Model)

Initial attempts to migrate all features to the deep side of a Wide‑and‑Deep architecture yielded limited offline gains. Pure deep models suffered from over‑fitting, especially with high‑dimensional sparse ID features.

High‑dimensional ID features (item ID, user ID, trigger ID) required adaptive L2 regularization to prevent over‑fitting.
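
One common way to implement this is mini-batch-aware, frequency-scaled L2: only the embedding rows that appear in the current batch are penalized, and rarer IDs are penalized more strongly. The sketch below assumes PyTorch; the class and parameter names (AdaptiveL2Embedding, lambda_reg) are illustrative, and the article does not specify the exact form Feizhu used.

    import torch
    import torch.nn as nn

    class AdaptiveL2Embedding(nn.Module):
        def __init__(self, num_ids, dim, lambda_reg=1e-5):
            super().__init__()
            self.embedding = nn.Embedding(num_ids, dim)
            self.lambda_reg = lambda_reg
            # running count of how often each ID has been seen (illustrative frequency proxy)
            self.register_buffer("freq", torch.zeros(num_ids))

        def forward(self, ids):
            return self.embedding(ids)

        def reg_loss(self, ids):
            # Penalize only the rows hit by this batch, scaled down for frequent IDs,
            # so rare IDs (the main over-fitting risk) are constrained the most.
            unique_ids, counts = torch.unique(ids, return_counts=True)
            self.freq[unique_ids] += counts.float()
            rows = self.embedding.weight[unique_ids]
            scale = 1.0 / (self.freq[unique_ids] + 1.0)
            return self.lambda_reg * (scale.unsqueeze(1) * rows.pow(2)).sum()

During training, reg_loss(ids) would be added to the pointwise loss for each ID field (item ID, user ID, trigger ID).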

Lookup (hit) features performed better when represented as dense values rather than zero‑filled embeddings.
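
For illustration, the contrast looks like this (a minimal sketch; the hit feature here, a hypothetical user-destination click count, and all names are placeholders):

    import torch
    import torch.nn as nn

    # e.g. how many times this user clicked the candidate item's destination
    hit_count = torch.tensor([[3.0], [0.0], [1.0]])

    # Representation kept by the team: feed the (log-scaled) count directly as a dense input.
    dense_hit = torch.log1p(hit_count)

    # Representation that worked worse offline: bucketize into an ID and embed it;
    # misses fall into the padding bucket, so the network only sees a zero-filled vector.
    bucket = torch.clamp(hit_count.long(), max=9).squeeze(1)
    embedded_hit = nn.Embedding(10, 8, padding_idx=0)(bucket)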

Warm‑up + Adam + learning‑rate decay provided the largest offline improvement.
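
A minimal sketch of that schedule, assuming PyTorch's Adam and LambdaLR; the step counts and rates below are placeholders, not the tuned values from the article:

    import torch

    model = torch.nn.Linear(128, 1)  # stand-in for the ranking network
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    warmup_steps, decay_rate, decay_every = 1000, 0.96, 10000

    def lr_lambda(step):
        if step < warmup_steps:
            return (step + 1) / warmup_steps  # linear warm-up toward the base rate
        return decay_rate ** ((step - warmup_steps) // decay_every)  # staircase decay

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    # call scheduler.step() once per training step, after optimizer.step()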

Batch normalization after embedding and before fully‑connected layers stabilized training and improved convergence.
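
Concretely, the block ordering described above looks roughly like the following tower (a sketch with placeholder layer sizes, not the production configuration):

    import torch.nn as nn

    tower = nn.Sequential(
        nn.BatchNorm1d(512),   # normalize the concatenated embedding vector
        nn.Linear(512, 256),
        nn.ReLU(),
        nn.BatchNorm1d(256),   # BN before each fully-connected layer stabilizes training
        nn.Linear(256, 128),
        nn.ReLU(),
        nn.Linear(128, 1),     # logit for the pointwise CTR objective
    )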

Model structure:

The loss function is pointwise, and offline evaluation uses T+1 AUC on the same training window.
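
In code, the objective and evaluation reduce to a per-impression binary cross-entropy and an AUC computed on the following day's traffic. This is a generic sketch (PyTorch plus scikit-learn), assuming "T+1" means scoring the day after the training window; variable names are illustrative:

    import torch
    import torch.nn.functional as F
    from sklearn.metrics import roc_auc_score

    def pointwise_loss(logits, clicks):
        # binary cross-entropy over individual (user, item) impressions
        return F.binary_cross_entropy_with_logits(logits, clicks.float())

    def t_plus_1_auc(model, next_day_features, next_day_clicks):
        with torch.no_grad():
            scores = torch.sigmoid(model(next_day_features)).cpu().numpy()
        return roc_auc_score(next_day_clicks.cpu().numpy(), scores)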

FB‑PALM (FeedBack‑PALM)

After PALM, the team added real‑time click and non‑click behavior sequences. Various deep CTR blocks (DCN, DeepFM, XDeepFM, etc.) were tested but offered only marginal gains over the pure deep baseline.
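
For context, these blocks all add explicit feature crossing on top of the plain feed-forward tower. A single DCN-style cross layer, for example, computes x_{l+1} = x_0 * (x_l . w) + b + x_l; the sketch below follows the published DCN formulation, not a Feizhu-specific variant:

    import torch
    import torch.nn as nn

    class CrossLayer(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.w = nn.Parameter(torch.randn(dim) * 0.01)
            self.b = nn.Parameter(torch.zeros(dim))

        def forward(self, x0, xl):
            # (batch, dim) . (dim,) -> (batch, 1), broadcast against the input layer x0
            return x0 * (xl @ self.w).unsqueeze(1) + self.b + xl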

Key enhancements:

Incorporated short‑term global item click sequences and exposure‑without‑click sequences as additional inputs.

Used attention pooling over these sequences, concatenated with pure deep inputs, and fed into a multi‑layer feed‑forward network.
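
A minimal sketch of that attention pooling, with the candidate item embedding as the query; dimensions, masking, and names are illustrative rather than taken from the production model:

    import torch
    import torch.nn as nn

    class AttentionPooling(nn.Module):
        def __init__(self, item_dim, seq_dim, hidden=64):
            super().__init__()
            self.score = nn.Sequential(
                nn.Linear(item_dim + seq_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
            )

        def forward(self, candidate, sequence, mask):
            # candidate: (B, item_dim); sequence: (B, L, seq_dim); mask: (B, L), True = valid
            query = candidate.unsqueeze(1).expand(-1, sequence.size(1), -1)
            logits = self.score(torch.cat([query, sequence], dim=-1)).squeeze(-1)
            logits = logits.masked_fill(~mask, float("-inf"))
            weights = torch.softmax(logits, dim=-1)
            return torch.bmm(weights.unsqueeze(1), sequence).squeeze(1)  # (B, seq_dim)

The click sequence and the exposure-without-click sequence are pooled separately by such a module, and the pooled vectors are concatenated with the other deep-side inputs before the feed-forward layers.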

Results: uCTR +1.0% and pCTR +1.5% compared with the pure deep model.

GLA (Global Local Attention‑PALM)

To address coverage gaps, a full‑site behavior sequence (including flights, trains, hotels, and items) was introduced. The previous additive attention pooling was replaced with a transformer‑plus‑attention pipeline to capture intra‑sequence relations.

Full‑site sequences use only high‑coverage attributes (destination, category, POI, tag, behavior type) without ID features to avoid over‑fitting.

Sequence pooling combines Multi‑CNN extraction with attention, sharing parameters across different sequence types.
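
A rough sketch of that pooling block, assuming PyTorch; kernel widths, dimensions, and the module name are placeholders, and one instance of the module would be shared across the different full-site sequence types:

    import torch
    import torch.nn as nn

    class MultiCNNAttentionPooling(nn.Module):
        def __init__(self, dim, kernel_sizes=(2, 3, 4)):
            super().__init__()
            # parallel 1-D convolutions with different widths extract local patterns
            self.convs = nn.ModuleList(
                nn.Conv1d(dim, dim, k, padding=k // 2) for k in kernel_sizes
            )
            self.attn = nn.Linear(dim * len(kernel_sizes), 1)

        def forward(self, seq):
            # seq: (B, L, dim) -> Conv1d expects (B, dim, L)
            x = seq.transpose(1, 2)
            feats = [torch.relu(conv(x)) for conv in self.convs]
            # even kernels add one extra step, so trim to a common length before concatenating
            min_len = min(f.size(2) for f in feats)
            h = torch.cat([f[:, :, :min_len] for f in feats], dim=1).transpose(1, 2)
            weights = torch.softmax(self.attn(h).squeeze(-1), dim=-1)
            return torch.bmm(weights.unsqueeze(1), h).squeeze(1)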

Transformer layers precede additive attention for item click and non‑click sequences, capturing self‑attention among sequence elements.
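
The pipeline for those two sequences can be sketched as a small self-attention encoder followed by the same candidate-conditioned additive attention. This is a generic sketch with placeholder hyper-parameters (dim must be divisible by num_heads), not the exact production configuration:

    import torch
    import torch.nn as nn

    class TransformerAttentionPooling(nn.Module):
        def __init__(self, dim, num_heads=2, num_layers=1):
            super().__init__()
            layer = nn.TransformerEncoderLayer(
                d_model=dim, nhead=num_heads, dim_feedforward=4 * dim, batch_first=True
            )
            self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
            self.score = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

        def forward(self, candidate, sequence, mask):
            # candidate: (B, dim); sequence: (B, L, dim); mask: (B, L), True = valid step
            encoded = self.encoder(sequence, src_key_padding_mask=~mask)
            query = candidate.unsqueeze(1).expand_as(encoded)
            logits = self.score(torch.cat([query, encoded], dim=-1)).squeeze(-1)
            logits = logits.masked_fill(~mask, float("-inf"))
            weights = torch.softmax(logits, dim=-1)
            return torch.bmm(weights.unsqueeze(1), encoded).squeeze(1)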

Results: uCTR +1.0% and pCTR +3.0% over the FB‑PALM model.

Other Attempts

First‑order neighbor information and pretrained embeddings (text, image, DeepWalk) gave large gains on the pure deep baseline but contributed little when stacked on later versions.

Time‑aware attention was explored by adding temporal features to the attention mechanism; it yielded modest offline improvements but was not deployed.
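
One plausible form of that mechanism bucketizes the gap between each behavior and the current request, embeds it, and feeds it into the attention score together with the candidate and the sequence element. The sketch below is an illustration under those assumptions (bucket count, dimensions, and names are placeholders), not the exact variant that was tested:

    import torch
    import torch.nn as nn

    class TimeAwareAttention(nn.Module):
        def __init__(self, dim, num_buckets=16, time_dim=8):
            super().__init__()
            self.time_emb = nn.Embedding(num_buckets, time_dim)
            self.score = nn.Sequential(
                nn.Linear(2 * dim + time_dim, dim), nn.ReLU(), nn.Linear(dim, 1)
            )

        def forward(self, candidate, sequence, time_gap_bucket, mask):
            # candidate: (B, dim); sequence: (B, L, dim); time_gap_bucket: (B, L) long
            t = self.time_emb(time_gap_bucket)
            query = candidate.unsqueeze(1).expand_as(sequence)
            logits = self.score(torch.cat([query, sequence, t], dim=-1)).squeeze(-1)
            logits = logits.masked_fill(~mask, float("-inf"))
            weights = torch.softmax(logits, dim=-1)
            return torch.bmm(weights.unsqueeze(1), sequence).squeeze(1)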

Future Outlook

Key challenges remain in learning robust models across heterogeneous data sources and exploring explicit feature crossing to further boost performance.

Tags: feature engineering, recommendation, AI, deep learning, attention, model iteration
Written by Alibaba Cloud Developer

Alibaba's official tech channel, featuring all of its technology innovations.
