Scenario-aware Multi-Scenario Recommendation Models: SACN, SAINet, and DSWIN
This document presents a comprehensive study of multi‑scenario learning (MSL) for recommendation systems, focusing on three novel models—SACN (Scenario and Attribute‑aware Contrastive Network), SAINet (Scenario‑adaptive User Behavior Modeling), and DSWIN (Disentangling Scenario‑wise Interest Network). Together, the models combine scene‑aware attention, attribute‑level preference modeling, and contrastive disentanglement to capture distinct per‑scene user interests, achieving consistent offline AUC gains and online CTR improvements on real‑world datasets.
Overview
The authors first describe the practical need for multi‑scenario modeling in e‑commerce platforms such as the Dewu App, where user behavior varies dramatically across scenes (e.g., homepage waterfall, cart page, order page). Two major challenges are identified: (1) how to capture distinct user interests (price, category, brand) across scenes, and (2) how to incorporate scene information into sequential behavior modeling.
Problem Definition
Given a set of users \(U\), items \(I\), item attributes \(A\), and a collection of scenes \(S\), the task is to predict the click‑through rate (CTR) of a target item \(i\) under a target scene \(s\) using the user’s historical interaction sequence. The objective is formalized as a binary classification problem with cross‑entropy loss, optionally combined with a self‑supervised contrastive loss.
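The combined objective described above can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation; the weight `gamma` on the contrastive term is the γ discussed later in the hyper-parameter analysis.

```python
import numpy as np

def ctr_loss(y_true, p_pred, l_contrastive=0.0, gamma=0.1, eps=1e-12):
    """Binary cross-entropy CTR objective, optionally combined with a
    self-supervised contrastive term weighted by gamma (illustrative)."""
    p = np.clip(p_pred, eps, 1.0 - eps)          # avoid log(0)
    l_ce = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return l_ce + gamma * l_contrastive
```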
SACN Model
SACN addresses challenge (1) by jointly modeling item‑level and attribute‑level preferences. The Item‑level Preference Extracting (IPE) module applies a scene‑aware multi‑head self‑attention where the query, key, and value matrices are augmented with target‑scene and target‑item embeddings. The Attribute‑level Preference Extracting (APE) module similarly integrates attribute embeddings with scene information. A Scenario Contrastive Module (SCM) treats the fused item‑level and attribute‑level representations as positive samples and representations from other scenes as negatives, optimizing an InfoNCE loss to separate scene‑specific interests.
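The IPE module's scene-aware attention can be sketched roughly as below: each behavior embedding is augmented with the target-scene and target-item embeddings before attention is computed. For brevity this single-head sketch uses identity projections in place of learned Q/K/V weight matrices, which is a simplification of the paper's multi-head design.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scene_aware_attention(seq, scene_emb, target_emb):
    """IPE-style sketch: concatenate scene/item context onto every behavior
    embedding, then apply scaled dot-product self-attention.
    seq: (T, d) behavior sequence; scene_emb, target_emb: (dc,) each."""
    T, _ = seq.shape
    ctx = np.concatenate([scene_emb, target_emb])              # (2*dc,)
    aug = np.concatenate([seq, np.tile(ctx, (T, 1))], axis=1)  # (T, d + 2*dc)
    q, k, v = aug, aug, aug                                    # identity projections
    scores = q @ k.T / np.sqrt(q.shape[1])
    return softmax(scores, axis=-1) @ v                        # (T, d + 2*dc)
```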
SAINet Model
SAINet tackles challenge (2) by introducing a Scenario‑adaptive Block that stacks two sub‑modules: (1) Scenario‑aware Interest Extracting (SIE), which injects scene embeddings into a multi‑head attention over the behavior sequence, and (2) Scenario Tailoring Module (STM), a lightweight gating network that further customizes the interest representation with the target‑scene embedding. The outputs of multiple blocks are fused by a Target‑aware Interest Fusion (TIF) attention mechanism and passed through a Scenario‑aware DNN Tower (SDT) that dynamically scales top‑layer neurons based on scene context.
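The STM gating idea can be sketched as a small network that maps the target-scene embedding to an element-wise gate rescaling the interest vector. The projection `W` and the scaling factor of 2 (the best value found in the hyper-parameter study) are the only moving parts here; this is a one-layer simplification of the module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def scenario_tailoring(interest, scene_emb, W, scale=2.0):
    """STM-style sketch: derive a per-dimension gate in (0, scale) from the
    target-scene embedding and rescale the interest representation.
    interest: (d,); scene_emb: (ds,); W: (d, ds) assumed learned weights."""
    gate = scale * sigmoid(W @ scene_emb)   # gate of 1.0 leaves interest unchanged
    return gate * interest
```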
DSWIN Model
DSWIN further refines scene‑wise modeling by separating global and local interests. Global Interest Aggregation (GIA) uses a Scenario‑aware Context Aggregation Module (SCAM) and a Context Feedback Fusion Module (CFFM) to produce a scene‑conditioned global interest vector. Local Interest Resolution (LIR) splits the historical sequence into per‑scene sub‑sequences, processes each with an Interest Extracting Unit (IEU) that incorporates scene embeddings, and aggregates the results. An Interest Disentangling Module (IDM) applies contrastive learning on the global and local vectors to explicitly separate interests across scenes.
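The LIR splitting step can be illustrated as follows. The toy IEU here is just mean-pooling plus a scene embedding, an assumption standing in for the paper's attention-based unit; only the per-scene partitioning mirrors the description above.

```python
import numpy as np

def local_interest_resolution(seq, scene_ids, scene_embs):
    """LIR sketch: partition the behavior sequence into per-scene
    sub-sequences and produce one local interest vector per scene.
    seq: (T, d); scene_ids: (T,) int scene of each behavior;
    scene_embs: (num_scenes, d) scene embedding table."""
    local_interests = {}
    for s in np.unique(scene_ids):
        sub = seq[scene_ids == s]                # per-scene sub-sequence
        local_interests[int(s)] = sub.mean(axis=0) + scene_embs[int(s)]
    return local_interests
```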
Experimental Setup
Experiments are conducted on two real‑world datasets: AliCCP (three scenes) and a proprietary Dewu dataset (five scenes). The primary evaluation metric is AUC for offline CTR prediction; online performance is measured by pvCTR. Baselines span three families: (1) General recommenders (DNN, DeepFM), (2) Scenario‑specific network structures (SharedBottom, MMoE, PLE, STAR, AESM2), and (3) Parameter‑adaptive structures (PEPNet, AdaSparse, SFPNet). All three proposed models consistently outperform baselines across datasets, with statistically significant gains (p < 0.05).
Ablation Studies
For SACN, removing APE or SCM degrades performance, confirming the importance of attribute modeling and contrastive supervision. For SAINet, ablating SIE, STM, TIF, or SDT each leads to noticeable AUC drops, highlighting the contribution of scene‑aware attention, target‑scene tailoring, attention‑based fusion, and adaptive towers. For DSWIN, removing GIA, LIR, or IDM similarly harms results, demonstrating the necessity of global context, per‑scene interest extraction, and contrastive disentanglement.
Hyper‑parameter Analysis
Key hyper‑parameters are explored: number of Scenario‑adaptive Blocks (L), gating scaling factor in STM, number of attention heads in SIE, block count in CFFM, temperature τ in contrastive loss, and the weight γ balancing supervised and self‑supervised objectives. Optimal settings (e.g., L = 2, scaling factor = 2, heads = 4, τ ≈ 0.2, γ = 1e‑1) are identified from validation curves.
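To make the role of τ concrete, here is a minimal InfoNCE sketch with cosine similarity, using the τ ≈ 0.2 identified above; the choice of cosine similarity is an assumption, as the paper's exact similarity function is not restated here.

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.2):
    """InfoNCE sketch: pull the anchor toward its positive (same-scene fused
    representation) and away from negatives (other-scene representations).
    Lower tau sharpens the softmax over similarities."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / tau
    logits -= logits.max()                      # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())
```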
Visualization & Online Tests
Embedding similarity distributions before and after IDM show clearer separation of scene‑specific interests when contrastive loss is applied. Online A/B tests on the production platform reveal pvCTR improvements of +1.02% for SACN, +1.02% for SAINet, and +1.51% for DSWIN over the PEPNet baseline.
Conclusions & Future Work
The study demonstrates that incorporating both scene context and item attribute information, together with self‑supervised contrastive objectives, yields substantial gains in multi‑scenario recommendation. Future directions include extending scene definitions beyond traffic source (e.g., user cohorts, product categories), applying the framework to multiple industry verticals, and integrating multimodal item features (text, images) for richer modeling.
DeWu Technology
A platform for sharing and discussing technical knowledge.