How Auto Risk Transforms Behavior Sequence Data with Unsupervised Pre‑Training

This article introduces Auto Risk, a deep‑learning risk model for behavior‑sequence data that leverages unsupervised pre‑training with proxy tasks, details its convolution‑attention encoder, demonstrates significant gains across multiple business scenarios, and highlights its strong small‑sample and analogy capabilities.

Alibaba Cloud Developer

1. Background

Behavior‑sequence data such as shopping logs or risk events are common internal inputs for recommendation and risk control, requiring conversion into feature vectors for sequence classification.

Traditional methods rely on handcrafted trigger and accumulation features fed to GBDT classifiers, while recent approaches use RNN, CNN, or Attention networks to directly process raw sequences, reducing manual feature engineering.

2. Pretraining

Inspired by NLP pre‑training (ELMo, GPT, BERT, ERNIE), Auto Risk adopts proxy tasks on massive unlabeled data to learn generic high‑level representations, which can be fine‑tuned for downstream tasks.

Key pre‑training conditions include:

Proxy task accumulation: From simple CBOW/Skip‑Gram to Masked‑LM and NSP (next sentence prediction), proxy tasks have become progressively harder so that models capture more abstract knowledge.

Deep networks: Larger capacity models (e.g., ResNet, SBBB) extract richer features.

Attention: Provides memory, alignment, and global view, crucial for sequence data.

CNN rise: Convolutions are parallelizable and stackable, and this line of work, combined with attention, led to efficient architectures such as the Transformer.

Technology tree for BERT

3. Problem Analysis

Existing NLP‑style models for behavior sequences are trained with supervision and therefore suffer from label scarcity. Behavior sequences also differ from text: they have multiple fields, heterogeneous modalities, long lengths, and no natural sentence boundaries, which makes direct BERT adaptation inefficient.

Unlabeled data is abundant, and learning generic features from it can dramatically improve data utilization and complement handcrafted features.

4. Model Design

4.1 Proxy Tasks

Two proxy tasks are defined:

Masked Language Model at the event level – mask a token at time t and predict it, encouraging the model to capture local context.

Quick Thought at the sequence level – split a sequence into two sub‑sequences, encode each with a Siamese network, and predict whether a pair originates from the same source.
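The two proxy tasks can be sketched as data-construction routines. This is a minimal illustration, not Auto Risk's actual pipeline: the `[MASK]` placeholder, the random split point, the one-negative-per-positive sampling, and all function names are assumptions made for the example.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token, not named in the article


def mask_one_event(events, rng):
    """Event-level masked LM: hide the event at a random time t.

    Returns (masked_sequence, t, target_event).
    """
    t = rng.randrange(len(events))
    masked = list(events)
    target = masked[t]
    masked[t] = MASK
    return masked, t, target


def split_sequence(events, rng):
    """Split one sequence (length >= 2) into two non-empty sub-sequences."""
    cut = rng.randrange(1, len(events))
    return events[:cut], events[cut:]


def quick_thought_pairs(sequences, rng):
    """Sequence-level task: label 1 if both halves share a source, else 0.

    Assumption: each positive pair is matched with one negative pair whose
    second half comes from a different, randomly chosen sequence.
    """
    halves = [split_sequence(seq, rng) for seq in sequences]
    pairs = []
    for i, (left, right) in enumerate(halves):
        pairs.append((left, right, 1))
        j = rng.choice([k for k in range(len(halves)) if k != i])
        pairs.append((left, halves[j][1], 0))
    return pairs


rng = random.Random(0)
# Toy event names invented for illustration.
logs = [
    ["login", "browse", "create_order", "pay"],
    ["login", "top_up", "transfer"],
    ["browse", "create_order", "refund"],
]
masked, t, target = mask_one_event(logs[0], rng)
pairs = quick_thought_pairs(logs, rng)
```

In a real setup the two halves of each pair would each pass through the shared (Siamese) encoder, and a classifier on the two resulting vectors would predict the label.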

Two proxy tasks

4.2 Network Structure

The encoder combines embedding, convolution, and attention layers:

Embedding: Field‑wise embeddings (event type, time, amount, channel, product name) are merged via addition or concatenation; textual fields are aggregated by convolution or averaging.

Convolution layer: Captures local context, assumed to be the primary feature for risk scenarios.

Attention layer: Captures global context, providing additional view.
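As a rough illustration of the embedding step in the list above, the sketch below looks up one vector per field and merges them by addition or by concatenation. The field names, the bucketing of the amount into categories, the toy width, and the helper names are all assumptions for the example; the article does not specify them.

```python
import random


def make_table(vocab, dim, rng):
    """Randomly initialised lookup table: value -> vector of length dim."""
    return {v: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for v in vocab}


def embed_event(event, tables, merge="add"):
    """Look up every field of one event and merge the field vectors."""
    vectors = [tables[field][value] for field, value in event.items()]
    if merge == "add":
        # Element-wise sum: requires all fields to share one dimension.
        return [sum(column) for column in zip(*vectors)]
    if merge == "concat":
        # Concatenation: output width is the sum of the field widths.
        return [x for vec in vectors for x in vec]
    raise ValueError(f"unknown merge mode: {merge}")


rng = random.Random(0)
DIM = 4  # toy width, chosen only for readability
tables = {
    "event_type": make_table(["create_order", "pay"], DIM, rng),
    "channel": make_table(["app", "web"], DIM, rng),
    "amount_bucket": make_table(["low", "high"], DIM, rng),  # assumed bucketing
}
event = {"event_type": "pay", "channel": "app", "amount_bucket": "high"}
added = embed_event(event, tables, merge="add")
concatenated = embed_event(event, tables, merge="concat")
```

Addition keeps the event vector at a fixed width regardless of the number of fields, while concatenation keeps the fields separable at the cost of a wider input to the first convolution.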

One convolution + one attention form a block; multiple blocks are stacked with ResNet‑style shortcuts, analogous to stacking Transformer layers.
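The stacking described above can be sketched in plain Python. This is a simplified illustration under several assumptions: the convolution is a depthwise one with fixed toy weights, the attention has no learned projections, and a shortcut is placed around each sub-layer (the article does not say exactly where the shortcuts sit). All function names are hypothetical.

```python
import math


def depthwise_conv(x, kernel):
    """Zero-padded depthwise convolution over time.

    x is a [T][D] list of event vectors, kernel is [K][D] with odd K.
    """
    steps, dim, half = len(x), len(x[0]), len(kernel) // 2
    out = []
    for t in range(steps):
        row = [0.0] * dim
        for k, taps in enumerate(kernel):
            s = t + k - half
            if 0 <= s < steps:
                for d in range(dim):
                    row[d] += taps[d] * x[s][d]
        out.append(row)
    return out


def self_attention(x):
    """Scaled dot-product self-attention without learned projections."""
    scale = math.sqrt(len(x[0]))
    out = []
    for query in x:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in x]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w / total * key[d] for w, key in zip(weights, x))
            for d in range(len(query))
        ])
    return out


def add(x, y):
    """Element-wise sum of two [T][D] sequences."""
    return [[a + b for a, b in zip(r, s)] for r, s in zip(x, y)]


def block(x, kernel):
    h = add(x, depthwise_conv(x, kernel))  # shortcut around the convolution
    return add(h, self_attention(h))       # shortcut around the attention


def encoder(x, kernels):
    """Stack one block per kernel; the shape [T][D] is preserved throughout."""
    for kernel in kernels:
        x = block(x, kernel)
    return x


x = [[0.1 * (t + d) for d in range(3)] for t in range(5)]    # T=5, D=3
kernels = [[[0.2] * 3 for _ in range(3)] for _ in range(2)]  # 2 blocks, K=3
encoded = encoder(x, kernels)
```

Because every sub-layer preserves the `[T][D]` shape, blocks can be stacked to any depth, which is what makes the ResNet-style shortcuts straightforward to add.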

Auto Risk model architecture

4.2.1 Convolution Improvements

Standard convolutions suffer from vanishing gradients and high computational cost. Auto Risk replaces them with Gated Conv and Depthwise Separable Conv, cutting the parameter count from about 320k to about 60k (roughly 20% of the original) for channel dimension D=256 and kernel size K=5. The ratio drops further for larger kernels, and convergence is faster.
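The parameter figures can be checked with a short calculation. The sketch below assumes a 1-D convolution with D input and D output channels, ignores biases, and does not count any extra weights for the gating branch; the function names are hypothetical.

```python
def standard_conv_params(d, k):
    """Weights of a standard 1-D convolution with d input and d output channels."""
    return d * d * k


def depthwise_separable_params(d, k):
    """Depthwise part (one k-tap filter per channel) plus pointwise part (1x1, d -> d)."""
    return d * k + d * d


D, K = 256, 5
standard = standard_conv_params(D, K)
separable = depthwise_separable_params(D, K)
ratio = separable / standard
print(standard, separable, round(ratio, 3))  # 327680 66816 0.204

# The saving grows with the kernel size, because only the small depthwise
# term depends on K.
ratio_k9 = depthwise_separable_params(D, 9) / standard_conv_params(D, 9)
```

Under these assumptions the counts are 327,680 (320 × 1024) and 66,816 (about 65 × 1024), which matches the article's rounded figures of 320k and 60k and the ratio of roughly 20%.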

Convolution improvement diagram

4.2.2 Attention Improvements

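Self‑Attention's memory cost grows as O(N²) with sequence length N. Fixed‑Size Attention and Block Attention reduce it to O(2NK) and O(NK) respectively, where K is presumably the fixed window or block size (K ≪ N). This makes it possible to train on sequences of up to length 4000 with three improved attention layers on a single GPU.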
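To make the complexity figures concrete, the sketch below counts how many query-key score entries each scheme needs. The article does not define the two variants precisely, so this rests on assumptions: "Fixed-Size" is read as each position attending to a window of K positions on either side, "Block" as attention restricted to non-overlapping blocks of K positions, and the window size of 64 is an arbitrary example value.

```python
def full_pairs(n):
    """Full self-attention: every position attends to every position."""
    return n * n


def window_pairs(n, k):
    """Assumed fixed-size attention: position t sees keys in [t - k, t + k)."""
    return sum(min(n, t + k) - max(0, t - k) for t in range(n))


def block_pairs(n, k):
    """Assumed block attention: positions attend only within their own block of size k."""
    total = 0
    for start in range(0, n, k):
        size = min(k, n - start)  # the last block may be shorter
        total += size * size
    return total


N = 4000  # sequence length quoted in the article
K = 64    # hypothetical window / block size
print(full_pairs(N), window_pairs(N, K), block_pairs(N, K))
```

For N = 4000 the full score matrix has 16 million entries per head, while both restricted variants stay within 2NK and NK entries respectively, which is why much longer sequences fit on a single GPU.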

Attention improvement diagram

4.3 Training

With the optimized encoder, Auto Risk trains three‑layer encoders on sequences of length 4000 using a single GPU, achieving 2–3× batch speedup over Transformers and converging in less than a day for tens of millions of samples.

Training speed comparison

5. Application Effects

5.1 Business Gains

Adding Auto Risk vectors to state‑of‑the‑art handcrafted features improves AUC by 3–6 points in risk control tasks. Fine‑tuning the pre‑trained model yields further gains, confirming the benefit of unsupervised pre‑training.

Business gain chart

5.2 Multi‑Scenario Results

The same pre‑trained model, without any task‑specific features or fine‑tuning, achieves high AUC (up to 0.9) on unrelated tasks such as gender and age prediction using only a linear classifier, demonstrating strong generalization.

Multi‑scenario results

5.3 Small‑Sample Learning

Pre‑training provides a solid foundation, so models achieve superior performance with limited labeled data, which helps in cold‑start or expensive‑label scenarios. In a “buy‑now‑pay‑later” use case, Auto Risk outperforms supervised networks even with only 40k labeled samples.

Small‑sample learning results

5.4 Sequence Analogy

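Analogous to word‑embedding analogies, Auto Risk vectors capture semantic relationships. Experiments such as “Taobao credit‑payment – Taobao balance = External credit‑payment – External balance” retrieve the expected counterparts, and numeric fields preserve magnitude differences. In the example sequences below, Huabei is the credit‑payment product and "off-site" refers to transactions outside Taobao:

A=[create transaction - Taobao physical-goods escrow, Huabei payment - Taobao physical-goods escrow, ...]
B=[create transaction - Taobao physical-goods escrow, balance payment - Taobao physical-goods escrow, ...]
C=[Huabei payment - off-site instant transfer, ...]
D=[app - login, balance payment - off-site instant transfer, ...]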
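The analogy test can be sketched as vector arithmetic followed by a nearest-neighbour search: solving A − B = C − D for D gives the query C − A + B. The 3-dimensional vectors below are invented purely for illustration (real vectors would come from the pre-trained encoder), and the use of cosine similarity and the helper names are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def analogy_query(a, b, c):
    """Solve A - B = C - D for D, i.e. D = C - A + B."""
    return [ci - ai + bi for ai, bi, ci in zip(a, b, c)]


def nearest(query, candidates):
    """Name of the candidate vector with the highest cosine similarity."""
    return max(candidates, key=lambda name: cosine(query, candidates[name]))


# Toy vectors; axes loosely mean [uses credit, on Taobao, off-site].
vectors = {
    "taobao_credit": [1.0, 1.0, 0.0],    # plays the role of A
    "taobao_balance": [0.0, 1.0, 0.0],   # plays the role of B
    "offsite_credit": [1.0, 0.0, 1.0],   # plays the role of C
    "offsite_balance": [0.0, 0.0, 1.0],
    "login_only": [0.0, 0.2, 0.2],
}
query = analogy_query(
    vectors["taobao_credit"], vectors["taobao_balance"], vectors["offsite_credit"]
)
used = {"taobao_credit", "taobao_balance", "offsite_credit"}
candidates = {name: vec for name, vec in vectors.items() if name not in used}
best = nearest(query, candidates)
```

With these toy vectors the query lands exactly on the off-site balance-payment vector, mirroring the retrieval of sequence D in the example above.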

6. Conclusion

Auto Risk introduces an unsupervised pre‑training framework for behavior‑sequence data, designing proxy tasks and a convolution‑attention encoder tailored to the data’s characteristics. Deployed in real business, it yields significant AUC improvements, strong cross‑scenario generalization, and notable benefits for small‑sample learning, with future work extending to more data sources and proxy tasks.
