
Multi‑Scale Stochastic Distribution Prediction for User Behavior Representation Learning

The paper proposes a Multi‑Scale Stochastic Distribution Prediction (MSDP) framework that learns robust user behavior representations by predicting behavior distributions over randomly sampled future time windows, regularized with a contrastive objective. MSDP outperforms existing masked‑ and next‑behavior pre‑training methods on both proprietary financial‑risk data and a public e‑commerce dataset.

AntTech

The ACM CIKM 2023 best applied paper from Ant Group introduces a novel user behavior representation learning method called Multi‑Scale Stochastic Distribution Prediction (MSDP). It treats user behavior sequences similarly to language sequences but focuses on predicting the distribution of future actions over a time interval rather than specific next items.

MSDP defines a self‑supervised pre‑training task that predicts the probability distribution of K possible behaviors within a randomly sampled future window (T, T+W]. Multiple time‑scale windows are generated per sample, and each window size W is embedded as a prompt, enabling the model to capture periodic patterns and improve robustness.
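As a minimal sketch of this pre‑training target (function names, shapes, and the uniform fallback are assumptions, not details from the paper): sample a cut time T and a window size W from a set of scales, then normalize the frequencies of the K behaviors that fall inside (T, T+W].

```python
import random
from collections import Counter

def make_distribution_target(events, k, scales, rng=random.Random(0)):
    """events: list of (timestamp, behavior_id) pairs, behavior_id in [0, k).
    Sample a cut time T and a window size W, then return W together with the
    normalized frequency of each behavior in the future window (T, T+W]."""
    timestamps = [t for t, _ in events]
    t_cut = rng.uniform(min(timestamps), max(timestamps))
    w = rng.choice(scales)                      # W is later embedded as a prompt
    in_window = [b for t, b in events if t_cut < t <= t_cut + w]
    counts = Counter(in_window)
    total = sum(counts.values())
    if total == 0:                              # empty window: uniform fallback
        return w, [1.0 / k] * k
    return w, [counts.get(i, 0) / total for i in range(k)]

# Usage: a toy event stream with 3 behavior types
events = [(0.5, 0), (1.2, 1), (2.0, 1), (3.5, 2), (4.1, 0)]
w, target = make_distribution_target(events, k=3, scales=[1.0, 2.0, 4.0])
```

Sampling several (T, W) pairs per sequence yields the multiple time‑scale targets described above.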

To avoid over‑fitting to the distribution prediction, a contrastive regularization term is added: random subsets of behavior embeddings are masked, and the model is trained to keep the masked and unmasked sequence embeddings similar, following a SimSiam‑style cosine similarity loss weighted by a coefficient λ.
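A sketch of that regularizer under the SimSiam‑style formulation (pure‑Python vectors for clarity; in the real model one branch would be stop‑gradiented and the embeddings would come from the sequence encoder):

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def contrastive_reg(z_masked, z_full, lam=0.1):
    """Weighted negative cosine similarity between the embedding of the
    randomly masked sequence and the embedding of the full sequence:
    minimizing it pulls the two views together."""
    return -lam * cosine_sim(z_masked, z_full)
```

Identical embeddings give the minimum value, -λ, so the term only penalizes the encoder when masking changes the sequence representation.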

The overall loss combines the multi‑scale distribution prediction loss and the contrastive regularization loss. Experiments on a proprietary Ant Group financial‑risk dataset (predicting repayment probability over 5‑90 days) and the public Tmall dataset (predicting future interest categories) show that MSDP consistently outperforms baselines such as MBP, NBP, BERT4Rec, PTUM, UserBERT, static‑DP, and multi‑task variants.
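The combined objective can be sketched as a cross‑entropy term on the predicted behavior distribution plus the λ‑weighted contrastive term (all names here are illustrative assumptions):

```python
import math

def distribution_ce(pred, target, eps=1e-12):
    """Cross-entropy between the target behavior distribution and the
    model's predicted distribution over K behaviors."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))

def total_loss(pred, target, z_masked, z_full, lam=0.1):
    """Overall objective: distribution prediction loss plus the
    SimSiam-style cosine regularizer weighted by lam."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    return distribution_ce(pred, target) - lam * cos(z_masked, z_full)
```

In training, this would be averaged over the multiple sampled windows per sequence before backpropagation.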

Results indicate that predicting behavior distributions is more effective for noisy, random user actions than masked or next‑behavior objectives, and that multi‑scale prompt training yields better generalization than fixed‑window approaches. The authors conclude that MSDP provides a robust, unified representation useful for various downstream risk and recommendation tasks, and suggest future extensions with large language models and cross‑domain knowledge integration.

AI, user behavior modeling, self-supervised learning, pretraining, distribution prediction, multi-scale
Written by

AntTech

Technology is the core driver of Ant's future creation.
