
Multi‑Scale Stochastic Distribution Prediction for User Behavior Representation Learning

The paper proposes a Multi‑Scale Stochastic Distribution Prediction (MSDP) framework that learns robust user behavior representations by predicting behavior distributions over randomly sampled future time windows, regularized with a contrastive objective. MSDP outperforms existing masked‑ and next‑behavior pre‑training methods on both proprietary financial‑risk data and a public e‑commerce dataset.

AntTech

The ACM CIKM 2023 best applied paper from Ant Group introduces a novel user behavior representation learning method called Multi‑Scale Stochastic Distribution Prediction (MSDP). It treats user behavior sequences similarly to language sequences but focuses on predicting the distribution of future actions over a time interval rather than specific next items.

MSDP defines a self‑supervised pre‑training task that predicts the probability distribution of K possible behaviors within a randomly sampled future window (T, T+W]. Multiple time‑scale windows are generated per sample, and each window size W is embedded as a prompt, enabling the model to capture periodic patterns and improve robustness.
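As a minimal sketch of this pre‑training target (function names, shapes, and the uniform fallback are assumptions, not details from the paper): sample a cut time T and a window size W from a set of scales, then normalize the frequencies of the K behaviors that fall inside (T, T+W].

```python
import random
from collections import Counter

def make_distribution_target(events, k, scales, rng=random.Random(0)):
    """events: list of (timestamp, behavior_id) pairs, behavior_id in [0, k).
    Sample a cut time T and a window size W, then return W together with the
    normalized frequency of each behavior in the future window (T, T+W]."""
    timestamps = [t for t, _ in events]
    t_cut = rng.uniform(min(timestamps), max(timestamps))
    w = rng.choice(scales)                      # W is later embedded as a prompt
    in_window = [b for t, b in events if t_cut < t <= t_cut + w]
    counts = Counter(in_window)
    total = sum(counts.values())
    if total == 0:                              # empty window: uniform fallback
        return w, [1.0 / k] * k
    return w, [counts.get(i, 0) / total for i in range(k)]

# Usage: a toy event stream with 3 behavior types
events = [(0.5, 0), (1.2, 1), (2.0, 1), (3.5, 2), (4.1, 0)]
w, target = make_distribution_target(events, k=3, scales=[1.0, 2.0, 4.0])
```

Sampling several (T, W) pairs per sequence yields the multiple time‑scale targets described above.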

To avoid over‑fitting to the distribution prediction, a contrastive regularization term is added: random subsets of behavior embeddings are masked, and the model is trained to keep the masked and unmasked sequence embeddings similar, following a SimSiam‑style cosine similarity loss weighted by a coefficient λ.
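A sketch of that regularizer under the SimSiam‑style formulation (pure‑Python vectors for clarity; in the real model one branch would be stop‑gradiented and the embeddings would come from the sequence encoder):

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def contrastive_reg(z_masked, z_full, lam=0.1):
    """Weighted negative cosine similarity between the embedding of the
    randomly masked sequence and the embedding of the full sequence:
    minimizing it pulls the two views together."""
    return -lam * cosine_sim(z_masked, z_full)
```

Identical embeddings give the minimum value, -λ, so the term only penalizes the encoder when masking changes the sequence representation.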

The overall loss combines the multi‑scale distribution prediction loss and the contrastive regularization loss. Experiments on a proprietary Ant Group financial‑risk dataset (predicting repayment probability over 5‑90 days) and the public Tmall dataset (predicting future interest categories) show that MSDP consistently outperforms baselines such as MBP, NBP, BERT4Rec, PTUM, UserBERT, static‑DP, and multi‑task variants.
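The combined objective can be sketched as a cross‑entropy term on the predicted behavior distribution plus the λ‑weighted contrastive term (all names here are illustrative assumptions):

```python
import math

def distribution_ce(pred, target, eps=1e-12):
    """Cross-entropy between the target behavior distribution and the
    model's predicted distribution over K behaviors."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))

def total_loss(pred, target, z_masked, z_full, lam=0.1):
    """Overall objective: distribution prediction loss plus the
    SimSiam-style cosine regularizer weighted by lam."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    return distribution_ce(pred, target) - lam * cos(z_masked, z_full)
```

In training, this would be averaged over the multiple sampled windows per sequence before backpropagation.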

Results indicate that predicting behavior distributions is more effective for noisy, random user actions than masked or next‑behavior objectives, and that multi‑scale prompt training yields better generalization than fixed‑window approaches. The authors conclude that MSDP provides a robust, unified representation useful for various downstream risk and recommendation tasks, and suggest future extensions with large language models and cross‑domain knowledge integration.

AI, user behavior modeling, self-supervised learning, pretraining, distribution prediction, multi-scale
Written by

AntTech

Technology is the core driver of Ant's future creation.
