Attention Mechanisms in Deep Learning Recommendation Models: A Survey
This article surveys the application of attention mechanisms in deep learning recommendation systems. It reviews AFM, DIN, DIEN, DSIN, Behavior Sequence Transformer (BST), Deep Spatio‑Temporal Neural Networks (DSTN), and ATRank, and discusses their architectures, attention types, advantages, and limitations.
Attention mechanisms have been widely adopted in image processing, natural language processing, reinforcement learning, and recommendation systems. They enhance model expressiveness by assigning learned importance weights to inputs; in recommendation these inputs are typically feature interactions or a user’s historical behaviors.
1. AFM: Attentional Factorization Machines
AFM builds on Factorization Machines by inserting a small attention network (a single hidden layer followed by a softmax) that learns a weight for each pair‑wise feature interaction. This turns FM’s uniformly weighted sum of pair‑wise interactions into an attention‑weighted sum.
AFM’s output is obtained by applying the learned attention weights to the interaction terms, but it does not incorporate deeper networks, limiting its capacity compared to more expressive DNN‑based models.
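The attention-weighted pooling described above can be sketched in NumPy as follows. This is a minimal illustration rather than the reference implementation: the function name `afm_interaction`, the parameter shapes, and the choice of ReLU in the hidden layer are assumptions made for the example, and the embeddings are assumed to be already scaled by their feature values.

```python
import numpy as np

def afm_interaction(embeds, W, b, h):
    """Attention-weighted sum of pair-wise feature interactions.

    embeds: (n_fields, k) embeddings of the active features
    W: (k, t), b: (t,), h: (t,)  attention network parameters (hypothetical shapes)
    Returns the pooled interaction vector (k,) and the weights (n_pairs,).
    """
    n = embeds.shape[0]
    i_idx, j_idx = np.triu_indices(n, k=1)          # all pairs with i < j
    pairs = embeds[i_idx] * embeds[j_idx]           # element-wise products, (n_pairs, k)
    scores = np.maximum(pairs @ W + b, 0.0) @ h     # one hidden layer, then a scalar score
    exp = np.exp(scores - scores.max())             # numerically stable softmax
    weights = exp / exp.sum()
    pooled = weights @ pairs                        # weighted sum over pairs, (k,)
    return pooled, weights
```

With all attention parameters set to zero the weights become uniform, and the pooled vector is the plain FM pair-wise sum divided by the number of pairs. A final projection of the pooled vector to a scalar score is omitted here.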
2. DIN: Deep Interest Network
DIN introduces an attention module that evaluates the relevance of each historical user behavior to the target item, allowing the model to focus on important behaviors while ignoring irrelevant ones.
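The attention weight for a behavior is computed by feeding the behavior embedding, the target item embedding, and their difference into an MLP that outputs an importance score. (The original paper feeds the outer product of the two embeddings alongside them.) Unlike conventional attention, DIN does not softmax‑normalize these scores, so the sum of the weights can reflect how strongly the user’s history matches the target item.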
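A minimal NumPy sketch of this kind of activation unit is shown below. The function name `din_attention_pool`, the two-layer MLP shape, and the use of ReLU (standing in for the PReLU/Dice activations used by DIN) are assumptions for illustration. The `normalize` flag is also hypothetical: the DIN paper describes dropping the softmax so that the weights keep the intensity of the user's interest, so the default here leaves the scores unnormalized.

```python
import numpy as np

def din_attention_pool(behaviors, target, W1, b1, w2, b2, normalize=False):
    """Weighted sum pooling of behavior embeddings with respect to a target item.

    behaviors: (T, d) historical behavior embeddings
    target:    (d,)   target item embedding
    W1: (3d, m), b1: (m,), w2: (m,), b2: scalar  (hypothetical MLP parameters)
    Returns the pooled user-interest vector (d,) and the weights (T,).
    """
    tgt = np.broadcast_to(target, behaviors.shape)
    # Input features per behavior: [behavior, target, behavior - target]
    feats = np.concatenate([behaviors, tgt, behaviors - tgt], axis=1)
    hidden = np.maximum(feats @ W1 + b1, 0.0)
    scores = hidden @ w2 + b2                        # one score per behavior, (T,)
    if normalize:                                    # optional softmax, off by default
        exp = np.exp(scores - scores.max())
        scores = exp / exp.sum()
    return scores @ behaviors, scores
```

Because the pooled vector depends on the target item, the same user history yields a different representation for each candidate.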
3. DIEN: Deep Interest Evolution Network
DIEN first extracts user interests from the behavior sequence using a GRU, then applies an attention‑augmented GRU to highlight interest points that are most related to the target item.
The attention score is calculated as a bilinear similarity u_iᵀWv between the target item embedding v and each interest state u_i (the hidden state the first GRU produces for the i‑th behavior), followed by a softmax to obtain normalized weights.
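The two pieces can be sketched as below, assuming NumPy. `bilinear_attention` computes the softmax-normalized bilinear scores, and `augru_blend` shows one way the attention weight can enter the second GRU, by scaling its update gate (the AUGRU variant from the DIEN paper). The function names and shapes are hypothetical, and the computation of the gate and candidate state themselves is left out.

```python
import numpy as np

def bilinear_attention(states, W, target):
    """Softmax over bilinear scores u_i^T W v.

    states: (T, n_h) interest states u_i
    W:      (n_h, n_a) learned matrix
    target: (n_a,) target item embedding v
    """
    scores = states @ W @ target                    # (T,)
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

def augru_blend(h_prev, h_cand, update_gate, a_t):
    """One AUGRU-style state update: the attention weight a_t scales the update gate."""
    scaled_gate = a_t * update_gate
    return (1.0 - scaled_gate) * h_prev + scaled_gate * h_cand
```

When `a_t` is close to zero the hidden state is carried over almost unchanged, so behaviors unrelated to the target item barely disturb the evolving interest.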
4. DSIN: Deep Session Interest Network
DSIN processes each user session with a Transformer to obtain a session representation, then feeds the sequence of session vectors into a bidirectional LSTM. Attention appears in two places: self‑attention inside the per‑session Transformer, and a target‑aware attention that weights the session vectors by their relevance to the target item.
The final representation concatenates user features, target item features, session‑interest vectors, and context‑aware session vectors before passing them to a fully connected layer.
5. Behavior Sequence Transformer
This model embeds the user’s behavior sequence and then applies a standard Transformer layer to capture long‑range dependencies before prediction.
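A single Transformer encoder layer over an embedded behavior sequence can be sketched as follows, assuming NumPy. This is a simplified illustration, not the model's actual configuration: it uses one attention head, a layer norm without learned scale and shift, ReLU in the feed-forward block, and no positional features. All function and parameter names are hypothetical.

```python
import numpy as np

def softmax_rows(x):
    exp = np.exp(x - x.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-6):
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def transformer_layer(X, Wq, Wk, Wv, W1, b1, W2, b2):
    """X: (T, d) embedded behavior sequence. Wq, Wk, Wv: (d, d).
    W1: (d, f), b1: (f,), W2: (f, d), b2: (d,) feed-forward parameters.
    Returns the encoded sequence (T, d) and the attention matrix (T, T).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax_rows(Q @ K.T / np.sqrt(K.shape[-1]))   # each position attends to all others
    H = layer_norm(X + A @ V)                          # residual connection + norm
    F = np.maximum(H @ W1 + b1, 0.0) @ W2 + b2         # position-wise feed-forward
    return layer_norm(H + F), A
```

Because every row of `A` spans the whole sequence, a behavior far back in the history can influence the encoding of a recent one in a single step, which is the long-range dependency the text refers to.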
6. Deep Spatio‑Temporal Neural Networks
The model takes target ad features, contextual ad features, clicked ad features, and non‑clicked exposure features, embeds them, and then applies two types of attention: a self‑attention over context items and an interaction‑based attention that also incorporates the target ad embedding.
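The difference between the two attention types can be illustrated with a small NumPy sketch. In the first function the weight of each item depends only on the item itself; in the second it also depends on the target ad. The function names, parameter shapes, and the one-hidden-layer scorer are assumptions for illustration. The interactive weights are left unnormalized here so that all of them can be small when no item is relevant to the target; treat that choice as an assumption of this sketch as well.

```python
import numpy as np

def _score(feats, W, b, h):
    # One hidden ReLU layer followed by a projection to a scalar score per row.
    return np.maximum(feats @ W + b, 0.0) @ h

def self_attention_pool(items, W, b, h):
    """Weights computed from each item alone. items: (n, d), W: (d, m)."""
    scores = _score(items, W, b, h)
    exp = np.exp(scores - scores.max())
    weights = exp / exp.sum()                       # softmax over the items
    return weights @ items, weights

def interactive_attention_pool(items, target, W, b, h):
    """Weights computed from (target, item) pairs. W: (2d, m)."""
    tgt = np.broadcast_to(target, items.shape)
    weights = np.exp(_score(np.concatenate([tgt, items], axis=1), W, b, h))
    return weights @ items, weights
```

Calling `interactive_attention_pool` with two different target ads on the same items gives two different weight vectors, whereas `self_attention_pool` has no way to react to the target.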
7. ATRank: An Attention‑Based User Behavior Modeling Framework
ATRank divides the model into raw feature space, behavior embedding space, latent semantic space, behavior interaction layers, and downstream application layers. After projecting behavior vectors into multiple semantic spaces, a self‑attention mechanism aggregates them.
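The projection-then-self-attention step can be sketched as follows, assuming NumPy. Each projection matrix maps the behavior vectors into one semantic space, self-attention runs inside that space, and the per-space results are concatenated. The function name, the list-of-matrices interface, and the scaled dot-product scoring are assumptions made for this example; the original framework may score pairs differently.

```python
import numpy as np

def multi_space_self_attention(X, projections):
    """X: (T, d) behavior vectors.
    projections: list of (d, d_s) matrices, one per semantic space (hypothetical).
    Returns (T, sum of d_s): per-space self-attention outputs, concatenated.
    """
    outputs = []
    for P in projections:
        Z = X @ P                                          # behaviors in this space, (T, d_s)
        scores = Z @ Z.T / np.sqrt(Z.shape[-1])            # pair-wise behavior similarity
        exp = np.exp(scores - scores.max(axis=-1, keepdims=True))
        A = exp / exp.sum(axis=-1, keepdims=True)          # row-wise softmax
        outputs.append(A @ Z)                              # each behavior aggregates the others
    return np.concatenate(outputs, axis=-1)
```

Using several spaces lets different kinds of relations between behaviors be weighted independently before the downstream layers combine them.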
Overall, attention in recommendation models falls into two categories: self‑attention (e.g., ATRank) and traditional soft attention against the target item (e.g., DIN, DIEN, DSIN). The latter typically computes a bilinear similarity uᵀWv between a historical behavior embedding u and the target item embedding v. DIN is the exception: it feeds u, v, and their difference into an MLP to obtain the attention weights.
DataFunTalk
