Self-Attention Boosts Heterogeneous User Behavior Modeling for Recommendations
This paper proposes a novel attention‑based framework that groups and encodes heterogeneous user behavior sequences into separate semantic subspaces, applies self‑attention to capture inter‑behavior influences, and demonstrates faster training and comparable or improved recommendation performance across multiple tasks and datasets.
Research Background
Understanding users through their diverse behaviors is essential for personalized services. As platforms record increasingly varied actions, they need to fuse heterogeneous behavior data to better comprehend users.
In Alibaba's full‑scope marketing, integrating user behaviors from across the entire ecosystem is crucial, yet handling such heterogeneous data remains challenging.
This work introduces a general user representation framework that groups different behavior types, maps them into distinct subspaces, and applies self‑attention to model inter‑behavior influence, validated on recommendation tasks.
Related Work
Traditional heterogeneous behavior modeling relies on manual feature engineering, focusing on aggregated or non‑sequential ID features.
Single‑behavior sequence modeling typically uses RNNs (LSTM/GRU) or CNN+Pooling, which offer limited parallelism and tend to lose the details of individual behaviors.
Heterogeneous data representation learning often depends on strong supervision (e.g., image captioning), which is absent in our scenario.
ATRank Scheme
The overall framework consists of a raw feature layer, semantic mapping layer, self‑attention layer, and target network.
1. Behavior Grouping: Each user action is a triple (type, target, time). Actions are grouped by target entity (e.g., product, coupon, keyword). The encoding of a behavior is the sum of its target embedding, a lookup of its discretized time, and a lookup of its action type.
2. Semantic Space Mapping: Heterogeneous behaviors are linearly projected into multiple semantic subspaces (analogous to RGB channels), so that different behavior types can be compared within a shared semantic space.
3. Self‑Attention Layer: Self‑attention transforms each behavior from an objective representation into a memory‑aware one that reflects the user's other behaviors. Stacking multiple attention layers captures higher‑order influences.
4. Target Network: Depending on the downstream task (e.g., click‑through prediction), the target network combines the embedding of the behavior to be predicted with the user representation via vanilla attention and feeds the result to a ranking network.
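The four steps above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the dimensions, the random weights, the log2 time bucketing, the use of a single shared target table instead of one table per behavior group, and every function name are hypothetical choices made only to keep the example small.

```python
import math
import random

DIM, NUM_SPACES, SUB_DIM, NUM_TIME_BUCKETS = 8, 2, 4, 16  # illustrative sizes

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def project(W, x):
    # W has shape (out_dim, in_dim); returns W @ x
    return [dot(row, x) for row in W]

def time_bucket(age_seconds):
    # Hypothetical discretization: log2 buckets of the behavior's age
    return min(int(math.log2(age_seconds + 1)), NUM_TIME_BUCKETS - 1)

def encode_behavior(behavior, tables):
    # Step 1: sum of target, time-bucket, and action-type embeddings
    action_type, target_id, age_seconds = behavior
    parts = (tables["target"][target_id],
             tables["time"][time_bucket(age_seconds)],
             tables["type"][action_type])
    return [sum(vals) for vals in zip(*parts)]

def self_attention(seq, Wq, Wk, Wv):
    # Steps 2-3 for one subspace: linear projection, then scaled
    # dot-product attention of every behavior over the whole sequence
    Q = [project(Wq, x) for x in seq]
    K = [project(Wk, x) for x in seq]
    V = [project(Wv, x) for x in seq]
    scale = math.sqrt(SUB_DIM)
    out = []
    for q in Q:
        w = softmax([dot(q, k) / scale for k in K])
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(SUB_DIM)])
    return out

def vanilla_attention(query, memory):
    # Step 4: the candidate behavior attends over the memory-aware behaviors
    w = softmax([dot(query, m) for m in memory])
    pooled = [sum(wi * m[j] for wi, m in zip(w, memory)) for j in range(len(query))]
    return pooled, w

rng = random.Random(0)
tables = {
    "target": rand_matrix(100, DIM, rng),
    "time": rand_matrix(NUM_TIME_BUCKETS, DIM, rng),
    "type": rand_matrix(3, DIM, rng),
}
# (action type, target id, age in seconds) for three toy behaviors
behaviors = [(0, 12, 3600), (1, 47, 600), (2, 5, 30)]
encoded = [encode_behavior(b, tables) for b in behaviors]

spaces = []
for _ in range(NUM_SPACES):
    Wq, Wk, Wv = (rand_matrix(SUB_DIM, DIM, rng) for _ in range(3))
    spaces.append(self_attention(encoded, Wq, Wk, Wv))

# Concatenate the subspace outputs back into one vector per behavior
memory = [sum((space[i] for space in spaces), []) for i in range(len(behaviors))]

candidate = tables["target"][12]  # embedding of the behavior to be predicted
user_vec, weights = vanilla_attention(candidate, memory)
```

Here `user_vec` stands in for the input that would be passed to the ranking network, and `weights` shows how much each past behavior contributes for this particular candidate. A real model would learn the embedding tables and projection matrices rather than sample them at random.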
Offline Experiments
We evaluated the framework on the Amazon purchase behavior dataset. Training converged faster than CNN/LSTM baselines, and average AUC improved slightly.
Conclusion: self‑attention with time encoding can replace CNN/LSTM encoders, achieving up to 4× faster training and comparable or better recommendation performance.
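For readers less familiar with the metric, AUC can be read as the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The sketch below computes it by direct pair counting on made-up labels and scores; it illustrates the metric only and does not reproduce the evaluation protocol or the data used in the experiments.

```python
def auc(labels, scores):
    # Fraction of (positive, negative) pairs ranked correctly; ties count as half
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy data: three of the four positive/negative pairs are ordered correctly
labels = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.2]
result = auc(labels, scores)
print(result)  # 0.75
```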
Case Study
We visualized attention scores across semantic spaces on the Amazon dataset, revealing distinct focus patterns per space, such as category‑specific attention in certain subspaces.
Multi‑Task Learning
We trained on three heterogeneous behaviors (purchase click, coupon claim, keyword search) simultaneously and compared seven training modes, ranging from single‑behavior and multi‑model setups to a single all‑in‑one model. The all‑in‑one multi‑task setting achieved the best overall performance, indicating that the framework can benefit from additional behavior data.
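One common way to train several prediction tasks on a shared user representation is to add up the per-task losses and optimize the sum. The sketch below shows that idea with binary log loss. The task names, the equal weighting, and the toy predictions are assumptions made for illustration; the summary does not specify how the all‑in‑one objective is actually combined.

```python
import math

def log_loss(label, prob):
    # Binary cross-entropy for a single prediction
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))

def all_in_one_loss(batch):
    # Sum of per-task mean losses; every task shares the same user encoder,
    # so gradients from all behaviors would update the shared parameters
    total = 0.0
    for task, examples in batch.items():
        task_loss = sum(log_loss(y, p) for y, p in examples) / len(examples)
        total += task_loss
    return total

# Hypothetical (label, predicted probability) pairs for each task
batch = {
    "item": [(1, 0.8), (0, 0.3)],
    "coupon": [(1, 0.6)],
    "keyword": [(0, 0.1)],
}
loss = all_in_one_loss(batch)
```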
Conclusion
We presented a universal user representation framework that fuses heterogeneous behavior sequences and validated its effectiveness in recommendation tasks. Future work aims to incorporate richer commercial scenarios and data to build a flexible, extensible user representation system for superior personalization.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
