Self-Attention Boosts Heterogeneous User Behavior Modeling for Recommendations

This paper proposes a novel attention‑based framework that groups and encodes heterogeneous user behavior sequences into separate semantic subspaces, applies self‑attention to capture inter‑behavior influences, and demonstrates faster training and comparable or improved recommendation performance across multiple tasks and datasets.

Alibaba Cloud Developer

Research Background

Understanding users through their diverse behaviors is essential for personalized services. As platforms record increasingly varied actions, they need to fuse heterogeneous behavior data to better comprehend users.

In Alibaba's full-scope marketing, integrating user behaviors from across the whole ecosystem is crucial, yet handling such heterogeneous data remains challenging.

This work introduces a general user representation framework that groups different behavior types, maps them into distinct subspaces, and applies self‑attention to model inter‑behavior influence, validated on recommendation tasks.

Related Work

Traditional heterogeneous behavior modeling relies on manual feature engineering, focusing on aggregated or non‑sequential ID features.

Single-behavior sequence modeling typically uses RNNs (LSTM/GRU) or CNN with pooling. These approaches suffer from limited parallelism and cannot retain the details of individual behaviors.

Heterogeneous data representation learning often depends on strong supervision (e.g., image captioning), which is absent in our scenario.

ATRank Scheme

The overall framework consists of a raw feature layer, semantic mapping layer, self‑attention layer, and target network.

1. Behavior Grouping: Each user action is a triple (type, target, time). Actions are grouped by the kind of target entity (e.g., product, coupon, keyword). The encoding of a behavior is the sum of its target embedding, a lookup on its discretized time, and a lookup on its action type.

2. Semantic Space Mapping: Heterogeneous behaviors are linearly projected into multiple semantic subspaces (analogous to RGB channels), so that different behavior types can be compared within the same semantics.

3. Self-Attention Layer: Self-attention transforms each behavior from an objective representation into a memory-aware one, and stacking multiple attention layers captures higher-order influences.

4. Target Network: The target network is tailored to the downstream task (e.g., click-through prediction). It takes the embedding of the behavior to be predicted, uses vanilla attention to combine it with the user representation, and feeds the result to a ranking network.
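The four steps above can be sketched in NumPy as a single forward pass. This is a minimal illustration under stated assumptions, not the authors' implementation. The dimensions, the log-scale time buckets, the per-group projection matrix, and every class and function name are hypothetical. Scaled dot-product attention is swapped in as a simple stand-in for whatever attention function the paper uses. The ranking network is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are illustrative assumptions, not values from the paper.
D_EMB = 16           # width of a raw behavior encoding
N_SPACES = 4         # number of semantic subspaces
D_SPACE = 8          # width of each subspace
N_TIME_BUCKETS = 12  # number of discretized time buckets
N_ACTION_TYPES = 3   # number of distinct action types


def bucketize_time(age_days):
    """Assumed scheme: [0,1) -> 0, [1,2) -> 1, [2,4) -> 2, [4,8) -> 3, ..."""
    age = np.asarray(age_days, dtype=float)
    bucket = np.where(age < 1, 0, np.floor(np.log2(np.maximum(age, 1))) + 1)
    return np.minimum(bucket, N_TIME_BUCKETS - 1).astype(int)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class BehaviorGroup:
    """Behaviors that share one kind of target entity (e.g. products)."""

    def __init__(self, n_targets):
        self.target_emb = rng.normal(0, 0.1, (n_targets, D_EMB))
        self.time_emb = rng.normal(0, 0.1, (N_TIME_BUCKETS, D_EMB))
        self.action_emb = rng.normal(0, 0.1, (N_ACTION_TYPES, D_EMB))
        # Linear map from this group's encoding into the shared subspaces.
        self.proj = rng.normal(0, 0.1, (D_EMB, N_SPACES * D_SPACE))

    def encode(self, target_ids, age_days, action_ids):
        # Step 1: sum of target, time-bucket, and action-type lookups.
        raw = (self.target_emb[target_ids]
               + self.time_emb[bucketize_time(age_days)]
               + self.action_emb[action_ids])
        # Step 2: project, then split into (n_behaviors, N_SPACES, D_SPACE).
        return (raw @ self.proj).reshape(-1, N_SPACES, D_SPACE)


def self_attend(behaviors):
    """Step 3: self-attention computed independently in each subspace."""
    x = behaviors.transpose(1, 0, 2)                       # (spaces, n, d)
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(D_SPACE)   # (spaces, n, n)
    weights = softmax(scores, axis=-1)
    return (weights @ x).transpose(1, 0, 2), weights


def vanilla_attend(query, memory):
    """Step 4: one candidate behavior attends over the user's behaviors."""
    scores = np.einsum("kd,nkd->kn", query, memory) / np.sqrt(D_SPACE)
    weights = softmax(scores, axis=-1)                     # (spaces, n)
    return np.einsum("kn,nkd->kd", weights, memory).reshape(-1)


products = BehaviorGroup(n_targets=100)
coupons = BehaviorGroup(n_targets=20)

# A toy history: three product behaviors and two coupon behaviors.
history = np.concatenate([
    products.encode([3, 17, 42], [0.5, 3.0, 40.0], [0, 0, 1]),
    coupons.encode([5, 9], [1.0, 10.0], [2, 2]),
])
memory, attn = self_attend(history)
candidate = products.encode([7], [0.0], [0])[0]
user_vec = vanilla_attend(candidate, memory)
print(history.shape, attn.shape, user_vec.shape)
```

In this sketch `user_vec` is the candidate-aware user representation that a ranking network would consume. Because product and coupon behaviors land in the same subspaces after projection, they can be concatenated into one sequence and attend to each other directly.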

Offline Experiments

We evaluated the framework on the Amazon purchase behavior dataset. Training converged faster than CNN/LSTM baselines, and average AUC improved slightly.
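The summary does not define "average AUC". One common reading, assumed here, is AUC computed per user over that user's positive and negative items and then averaged across users. The sketch below shows that reading only. The function names are hypothetical and the scores are made-up toy values, not results from the experiments.

```python
import numpy as np


def user_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(((pos > neg) + 0.5 * (pos == neg)).mean())


def average_user_auc(per_user):
    """Mean of per-user AUC over (pos_scores, neg_scores) pairs (assumed metric)."""
    return float(np.mean([user_auc(p, n) for p, n in per_user]))


# Toy scores for two users.
toy = [([0.9, 0.4], [0.5, 0.1]), ([0.8], [0.2])]
print(average_user_auc(toy))
```

Averaging per user keeps highly active users from dominating the metric, which is one reason this variant is often preferred for ranking tasks.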

Conclusion: self‑attention with time encoding can replace CNN/LSTM encoders, achieving up to 4× faster training and comparable or better recommendation performance.

Case Study

We visualized attention scores across semantic spaces on the Amazon dataset, revealing distinct focus patterns per space, such as category‑specific attention in certain subspaces.

Multi‑Task Learning

We trained on three heterogeneous behaviors (purchase click, coupon claim, keyword search) simultaneously, comparing seven training modes (single‑behavior, multi‑model, all‑in‑one). The all‑in‑one multi‑task setting achieved the best overall performance, demonstrating the framework’s ability to leverage additional behavior data.

Conclusion

We presented a universal user representation framework that fuses heterogeneous behavior sequences and validated its effectiveness in recommendation tasks. Future work aims to incorporate richer commercial scenarios and data to build a flexible, extensible user representation system for superior personalization.
