LFM4Ads: Full-Representation Multi-Granular Transfer Boosts Ad Recommendation

Tencent's LFM4Ads foundation model introduces a full-representation, multi-granular knowledge transfer framework that moves user, item, and cross representations to downstream tasks, dramatically improving ad recommendation metrics across dozens of business scenarios.

Tencent Advertising Technology

Industry Pain Points and Solution Idea

Modern recommendation systems increasingly adopt a foundation‑expert paradigm: a large base model is trained on massive data, and its intermediate representations are then transferred to downstream expert models. Existing work suffers from three limitations:

Incomplete representation transfer: only user representation (UR) is migrated, neglecting item representation (IR) and cross representation (CR).

Cross representation hard to transfer: CR ties both user and item, complicating sample alignment.

Single‑granularity downstream usage: upstream representations are merely appended as one extra feature.

LFM4Ads: Full-Representation Multi-Granular Transfer

We propose a framework that transfers UR, IR, and CR together, makes CR transferable by aggregating it to user‑level or item‑level representations, and offers three downstream usage granularities.

Comprehensive representation transfer: UR, IR, and CR are all migrated to downstream.

Transferable cross representation: CR is aggregated to user‑level/item‑level, reducing its quantity and aligning with downstream samples.

Multi‑granular downstream usage: feature‑level, module‑level, and model‑level applications.

Model Design and Representation Extraction

LFM4Ads uses a three‑tower architecture: a user tower extracts UR, an item tower extracts IR, and a mixing tower combines them, performs interaction, passes through an MLP and task head, and outputs predictions. An intermediate MLP layer provides CR. Separate branches handle content and ad samples; only the ad branch’s CR is used for ad recommendation.

During training, UR, IR, and CR are stored for downstream use. UR/IR are coarse‑grained, while CR is fine‑grained, capturing detailed user‑item interaction.
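To make the data flow concrete, here is a minimal NumPy sketch of the three‑tower forward pass described above. All dimensions, weight shapes, and the two‑layer MLP structure are illustrative assumptions, not the production architecture; the point is only where UR, IR, and CR are read out.

```python
import numpy as np

rng = np.random.default_rng(0)

def init(*shape):
    # Small random weights (scale is an arbitrary choice for this sketch).
    return rng.normal(scale=0.1, size=shape)

def mlp(x, w1, w2):
    # Two-layer MLP with ReLU, standing in for each tower's network.
    return np.maximum(x @ w1, 0.0) @ w2

# Hypothetical dimensions: raw features -> 32-dim representations.
d_user, d_item, d_rep = 16, 16, 32
Wu1, Wu2 = init(d_user, 64), init(64, d_rep)          # user tower
Wi1, Wi2 = init(d_item, 64), init(64, d_rep)          # item tower
Wm1, Wm2 = init(2 * d_rep, 64), init(64, d_rep)       # mixing tower
w_head = init(d_rep)                                  # task head

def forward(user_feats, item_feats):
    ur = mlp(user_feats, Wu1, Wu2)             # user tower -> UR
    ir = mlp(item_feats, Wi1, Wi2)             # item tower -> IR
    mixed = np.concatenate([ur, ir], axis=-1)  # mixing tower input
    cr = mlp(mixed, Wm1, Wm2)                  # intermediate MLP layer -> CR
    p = 1.0 / (1.0 + np.exp(-(cr @ w_head)))   # prediction
    return ur, ir, cr, p

ur, ir, cr, p = forward(rng.normal(size=d_user), rng.normal(size=d_item))
```

During training, the three returned representations would be stored for downstream use; the prediction head is only used for the upstream task.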

Enhancing Transferability of Cross Representation

CR is sample‑level, massive, and hard to store or align. We aggregate sample‑level CR into user‑level and item‑level representations, dramatically reducing its count and enabling a time‑aware exponential moving average update that adapts to activity levels.
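One way to realize the time‑aware EMA is to let the decay applied to the stored user‑level (or item‑level) representation grow with the elapsed time since its last update, so that users or items returning after a long gap weight the fresh CR more heavily. The half‑life parameter below is an assumption for illustration; the article does not specify the exact update rule.

```python
def time_aware_ema(old_rep, new_cr, dt_seconds, half_life=86_400.0):
    # Decay the stored aggregate by elapsed time: after one half-life of
    # inactivity the old value and the fresh sample-level CR weigh equally;
    # highly active entities (small dt) change slowly and stay stable.
    decay = 0.5 ** (dt_seconds / half_life)
    return [decay * o + (1.0 - decay) * n for o, n in zip(old_rep, new_cr)]
```

Because each user or item keeps only one aggregated vector, storage scales with the number of users/items rather than the number of samples, which is what makes CR alignable with downstream training data.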

Multi‑Granular Downstream Usage

We define three ways to employ upstream representations:

Feature‑level

Directly concatenate upstream representations with downstream features; CR passes through a small adapter to bridge semantic gaps.
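A minimal sketch of the feature‑level path, assuming hypothetical dimensions: UR and IR are concatenated directly, while CR first passes through a small (here single‑layer) adapter to bridge the semantic gap between upstream and downstream feature spaces.

```python
import numpy as np

rng = np.random.default_rng(1)
d_down, d_rep, d_adapt = 48, 32, 16      # assumed sizes for illustration

W_adapt = rng.normal(scale=0.1, size=(d_rep, d_adapt))  # small adapter on CR

def feature_level_input(downstream_feats, ur, ir, cr):
    # UR/IR enter as-is; only CR is re-projected before concatenation.
    cr_adapted = np.maximum(cr @ W_adapt, 0.0)
    return np.concatenate([downstream_feats, ur, ir, cr_adapted])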

Module‑level

Transfer the upstream interaction module and MLP as isomorphic modules to downstream, allowing fine‑tuning of the transferred parameters.
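Treating parameters as plain dictionaries, the module‑level transfer amounts to copying the upstream interaction/MLP weights into structurally identical (isomorphic) downstream modules and then letting downstream training continue to update them. The key names below are hypothetical.

```python
import numpy as np

def transfer_modules(upstream, downstream, keys=("interaction", "mlp")):
    # Initialize the downstream model's isomorphic modules from a copy of
    # the upstream parameters; the copies remain trainable (fine-tuned),
    # so later updates never touch the upstream weights.
    for k in keys:
        downstream[k] = [w.copy() for w in upstream[k]]
    return downstream
```

This is warm‑starting rather than freezing: the transferred parameters are a better initialization, not a fixed feature extractor.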

Model‑level

Use UR and IR to compute cosine similarity for a recall model; optionally add an adapter and train with an InfoNCE loss.
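The model‑level objective can be sketched as an in‑batch InfoNCE over cosine similarities: each user's matched item is the positive and the other items in the batch serve as negatives. The temperature value is an assumption; the article does not give training hyperparameters.

```python
import numpy as np

def info_nce_loss(ur, ir, temperature=0.1):
    # L2-normalize so dot products are cosine similarities.
    ur = ur / np.linalg.norm(ur, axis=1, keepdims=True)
    ir = ir / np.linalg.norm(ir, axis=1, keepdims=True)
    logits = ur @ ir.T / temperature              # [batch, batch] similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The i-th user matches the i-th item: maximize the diagonal.
    return -np.mean(np.diag(log_probs))
```

At serving time the same cosine similarity drives retrieval (e.g. nearest‑neighbor search over item representations), with the optional adapter applied before normalization.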

Online Deployment Scale and Business Impact

We train on billions of daily samples (80% content, 20% ads), each containing thousands of features and rich behavioral sequences. The final LFM4Ads model contains terabytes of parameters, handles billions of sparse features, and delivers tens of thousands of QPS.

Since Q4 2024, deployment across multiple scenarios has increased GMV by up to 2.16% and improved pCTR, pCVR, and pLTV metrics across tasks.

Figures: Comparison of existing work and our approach; Model design of LFM4Ads; Feature/Module/Model usage; Upstream‑downstream workflow.
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: representation learning, large-scale training, foundation model, knowledge transfer, ad recommendation
Written by

Tencent Advertising Technology

Official hub of Tencent Advertising Technology, sharing the team's latest cutting-edge achievements and advertising technology applications.
