How Meta Scales User Modeling for Ads: Inside the SUM Framework
This article examines Meta's SUM (Scaling User Modeling) system, detailing its upstream‑downstream architecture, the SOAP online asynchronous serving platform, production optimizations, and extensive offline and online experiments that demonstrate significant gains in ad personalization performance.
Introduction
Personalized recommendation underpins modern online advertising, improving advertiser ROI while enhancing user experience. Traditional systems relied on handcrafted features and simple architectures, but deep‑learning‑based recommender systems now dominate, learning nuanced user representations at scale.
In practice, constraints such as training throughput, serving latency, and host memory prevent full exploitation of massive user data, especially at Meta's scale of billions of requests daily. The resulting challenges are sub‑optimal representations, feature redundancy, data scarcity for niche models, and custom per‑model architectures that do not scale.
SUM Overview
Meta proposes Scaling User Modeling (SUM), an online framework that reshapes user modeling for ad personalization. SUM combines advanced modeling techniques with real‑world constraints to enable effective, scalable representation sharing across hundreds of production ranking models.
Model Architecture
Preliminary
SUM adopts the DLRM architecture, consisting of a User Tower and a Mix Tower. The User Tower processes massive, heterogeneous user‑side features into compact embeddings, which the Mix Tower integrates with other features (e.g., ad attributes) for downstream prediction.
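A minimal sketch of this upstream/downstream split, with hypothetical layer sizes and module names (the production towers are far larger), might look like:

```python
import torch
import torch.nn as nn

class UserTower(nn.Module):
    """Upstream: compress heterogeneous user features into shared embeddings."""
    def __init__(self, user_feat_dim: int, emb_dim: int = 96, num_emb: int = 2):
        super().__init__()
        self.num_emb, self.emb_dim = num_emb, emb_dim
        self.encoder = nn.Sequential(
            nn.Linear(user_feat_dim, 512), nn.ReLU(),
            nn.Linear(512, emb_dim * num_emb),
        )

    def forward(self, user_feats: torch.Tensor) -> torch.Tensor:
        # (batch, num_emb, emb_dim): the SUM user embeddings shared downstream
        return self.encoder(user_feats).view(-1, self.num_emb, self.emb_dim)

class MixTower(nn.Module):
    """Downstream: combine SUM embeddings with ad-side features for prediction."""
    def __init__(self, emb_dim: int, num_emb: int, ad_feat_dim: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(emb_dim * num_emb + ad_feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, user_emb: torch.Tensor, ad_feats: torch.Tensor) -> torch.Tensor:
        x = torch.cat([user_emb.flatten(1), ad_feats], dim=1)
        return torch.sigmoid(self.head(x))  # e.g., predicted click probability
```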
User Tower
The tower uses a hierarchical pyramid structure with residual connections to preserve information while compressing inputs. After initial embedding, the model produces many sparse embeddings and a smaller set of dense embeddings (dimension D), which are merged into a unified user embedding.
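One plausible rendering of the pyramid with residual connections (the dimensions below are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

class PyramidBlock(nn.Module):
    """One pyramid level: compress the hidden size while keeping a skip path."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())
        self.skip = nn.Linear(in_dim, out_dim)  # project residual to the new width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.mlp(x) + self.skip(x)

# Hierarchical compression of concatenated embeddings into a unified user embedding
dims = [4096, 1024, 256, 96]
pyramid = nn.Sequential(*(PyramidBlock(a, b) for a, b in zip(dims, dims[1:])))
unified = pyramid(torch.randn(8, 4096))  # (batch, 96)
```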
Attention‑Based Dimensionality Reduction
Point‑wise attention compresses the large set of feature embeddings into a smaller one, shrinking the dot‑product interaction matrix; residual connections preserve information so the model can learn expressive representations efficiently.
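A common way to realize this kind of compression is pooling by learnable queries; the sketch below is an assumption about the mechanism (including the residual placement), not Meta's exact layer:

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Compress N feature embeddings down to M << N via learnable queries."""
    def __init__(self, num_out: int, dim: int):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_out, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, N, dim) -> attention scores: (batch, M, N)
        scores = torch.softmax(
            self.queries @ x.transpose(1, 2) / x.size(-1) ** 0.5, dim=-1)
        pooled = scores @ x          # (batch, M, dim)
        return pooled + self.queries # residual on the query side (one possible choice)

compress = AttentionPooling(num_out=16, dim=96)
out = compress(torch.randn(8, 600, 96))  # 600 embeddings compressed to 16
```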
Deep Cross Network (DCN)
DCN layers capture explicit and implicit feature interactions through learnable weights and biases, enabling high‑order cross features.
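Each cross layer follows the well‑known DCN recurrence x_{l+1} = x_0 ⊙ (W_l x_l + b_l) + x_l, so stacking L layers yields interactions up to order L+1. A minimal sketch:

```python
import torch
import torch.nn as nn

class CrossLayer(nn.Module):
    """One DCN-v2-style cross layer: x_{l+1} = x0 * (W @ xl + b) + xl."""
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x0: torch.Tensor, xl: torch.Tensor) -> torch.Tensor:
        return x0 * self.linear(xl) + xl  # element-wise product forms explicit crosses

x0 = torch.randn(8, 96)
x = x0
for layer in (CrossLayer(96) for _ in range(3)):
    x = layer(x0, x)  # each pass raises the interaction order by one
```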
MLP‑Mixer
The MLP‑Mixer, originally designed for vision, provides an all‑MLP alternative for feature interaction: its channel‑mixing layers are equivalent to 1×1 convolutions, while its token‑mixing layers act like depth‑wise convolutions across features.
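A standard Mixer block, treating each feature embedding as a token (hyperparameters are illustrative):

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """Token-mixing MLP across features, then channel-mixing MLP across dims."""
    def __init__(self, num_tokens: int, dim: int, hidden: int = 256):
        super().__init__()
        self.token_mlp = nn.Sequential(
            nn.Linear(num_tokens, hidden), nn.GELU(), nn.Linear(hidden, num_tokens))
        self.chan_mlp = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim); mix across tokens, then across channels
        x = x + self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        return x + self.chan_mlp(self.norm2(x))

out = MixerBlock(num_tokens=16, dim=96)(torch.randn(8, 16, 96))
```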
Mix Tower
The Mix Tower mirrors the DHEN architecture but receives only the SUM user embeddings (no raw user features), encouraging the upstream model to learn high‑quality representations. Multi‑task cross‑entropy loss balances several downstream tasks using task‑specific weights.
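The multi‑task objective can be written as a weighted sum of per‑task cross‑entropy losses; a sketch with hypothetical task names and weights:

```python
import torch
import torch.nn.functional as F

def multi_task_loss(logits: dict, labels: dict, weights: dict) -> torch.Tensor:
    """Weighted sum of per-task binary cross-entropy losses."""
    return sum(
        weights[t] * F.binary_cross_entropy_with_logits(logits[t], labels[t])
        for t in logits
    )

# Hypothetical tasks and weights; in production these are tuned per objective.
logits = {"click": torch.randn(8), "conversion": torch.randn(8)}
labels = {"click": torch.randint(0, 2, (8,)).float(),
          "conversion": torch.randint(0, 2, (8,)).float()}
loss = multi_task_loss(logits, labels, {"click": 1.0, "conversion": 0.5})
```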
Online Serving System: SOAP
Because user features change frequently, offline pipelines cause stale embeddings. SUM introduces the SUM Online Asynchronous Platform (SOAP), which generates fresh embeddings on each request within a 30 ms latency budget. SOAP separates write (embedding generation) from read (embedding retrieval), allowing asynchronous updates without blocking downstream inference.
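A toy sketch of the read/write split; the store, the stub, and all function names here are hypothetical stand-ins for SOAP's actual components:

```python
import asyncio

DEFAULT_EMBEDDING = [0.0] * 96
EMBEDDING_STORE: dict[int, list[float]] = {}  # stand-in for the real feature store

async def run_user_tower(user_features) -> list[float]:
    """Placeholder for the actual User Tower inference call."""
    return [0.1] * 96

async def read_path(user_id: int) -> list[float]:
    """Read: return the freshest stored embedding; never blocks on a write."""
    return EMBEDDING_STORE.get(user_id, DEFAULT_EMBEDDING)

async def handle_request(user_id: int, user_features) -> list[float]:
    emb = await read_path(user_id)  # downstream inference proceeds immediately

    async def refresh() -> None:
        # Write: recompute off the critical path; later requests read the result.
        EMBEDDING_STORE[user_id] = await run_user_tower(user_features)

    asyncio.create_task(refresh())  # asynchronous update, fire-and-forget
    return emb
```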
Productionization
Model Training
SUM models are trained offline in a cyclic fashion, preserving historical trends while adapting to dynamic user preferences. Snapshots are refreshed regularly and served online, ensuring up‑to‑date embeddings for downstream models.
Embedding Distribution Shift
Two strategies mitigate distribution shift: (1) align downstream models with the newest embeddings by adjusting training pipelines, and (2) apply average‑pooling of the two most recent embeddings to smooth changes, reducing performance degradation.
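The second strategy amounts to simple snapshot averaging; a one‑function sketch:

```python
import torch

def smooth_embedding(latest: torch.Tensor, previous: torch.Tensor) -> torch.Tensor:
    """Average the two most recent snapshots to damp day-over-day shift."""
    return 0.5 * (latest + previous)

served = smooth_embedding(torch.randn(2, 96), torch.randn(2, 96))
```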
Feature Storage Optimization
Each user model produces K=2 embeddings of dimension D=96. Quantizing these from fp32 to fp16 halves storage without harming downstream performance.
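The storage saving is mechanical; assuming the embeddings are stored as PyTorch tensors:

```python
import torch

emb_fp32 = torch.randn(2, 96)            # K=2 embeddings, D=96: 768 bytes
emb_fp16 = emb_fp32.to(torch.float16)    # 384 bytes: storage halved
assert emb_fp16.element_size() == 2      # 2 bytes per value instead of 4
```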
Distributed Inference
Distributed inference spreads the online inference load across multiple nodes, improving memory usage and latency for the User Tower.
Experiments
Dataset
All experiments use internal, industrial‑scale datasets (public datasets are unsuitable due to domain mismatch).
Evaluation Metric
Normalized Entropy (NE) measures offline prediction accuracy: the model's average log‑loss is normalized by the entropy of a baseline that always predicts the empirical average CTR. Lower NE indicates better performance.
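Concretely, for N impressions with labels $y_i \in \{0, 1\}$, predictions $p_i$, and empirical average CTR $\bar{p}$, the standard definition is:

$$
\mathrm{NE} = \frac{-\frac{1}{N}\sum_{i=1}^{N}\bigl[\,y_i \log p_i + (1 - y_i)\log(1 - p_i)\,\bigr]}{-\bigl[\,\bar{p}\log\bar{p} + (1 - \bar{p})\log(1 - \bar{p})\,\bigr]}
$$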
FB CTR SUM User Model
The Facebook CTR model processes ~60 billion daily events, with ~600 sparse and ~1,000 dense user features. The User Tower occupies ~160 GB and requires ~390 M FLOPs per inference.
Downstream Offline Results
SUM embeddings improve NE across multiple downstream tasks (ad click‑through, conversion, app install, offline conversion). Gains are larger on Facebook than on Instagram, highlighting domain differences. Adding the embeddings introduces no extra computational overhead for downstream models.
Online Performance
Deploying SUM across hundreds of Meta ad products yields a statistically significant 2.67 % lift in overall ad performance metrics, at the cost of a 15.3 % increase in serving capacity.
Async Serving
Four serving strategies were compared: frozen model, offline batch, online real‑time, and online asynchronous (SOAP). Moving from real‑time inference with a compact 20 M FLOP model to asynchronous serving sacrificed only about 10 % of the performance gain, a far smaller loss than moving from real‑time to offline batch serving.
Embedding Distribution Shift Study
Experiments measuring cosine similarity and L2 norm between consecutive daily embeddings show that average‑pooling across two snapshots markedly reduces shift and improves NE, confirming the effectiveness of the proposed mitigation technique.
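The shift metrics themselves are easy to reproduce; a sketch over same‑user embeddings from consecutive daily snapshots:

```python
import torch
import torch.nn.functional as F

def shift_metrics(day1: torch.Tensor, day2: torch.Tensor) -> tuple[float, float]:
    """Mean cosine similarity and L2 distance between same-user embeddings."""
    cos = F.cosine_similarity(day1, day2, dim=-1).mean().item()
    l2 = (day1 - day2).norm(dim=-1).mean().item()
    return cos, l2

cos_sim, l2_dist = shift_metrics(torch.randn(1000, 96), torch.randn(1000, 96))
```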