How LLMs Transform Recommendation Systems: The LEARN Framework Explained
This article reviews the Kuaishou paper on adapting large language models for recommendation, detailing the LEARN framework's dual‑tower architecture, embedding generation, loss functions, and experimental results that address cold‑start and long‑tail challenges in modern recommender systems.
LLM‑Rec Large Model Recommendation
Large language models (e.g., GPT‑3; the paper uses Baichuan2‑7B) can serve as universal knowledge bases to improve item description comprehension, boosting recommendation accuracy and diversity.
Baichuan2‑7B model: https://huggingface.co/baichuan-inc/Baichuan2-7B-Base
Previous works such as BERT4Rec, RankT5, and RecFormer incorporated pretrained language-model backbones (BERT, T5), but these are far smaller than modern billion‑parameter LLMs and lack their breadth of world knowledge.
Common LLM‑based recommendation strategies
Freeze LLM parameters: generate item content embeddings from textual fields (title, description, reviews, etc.). Example: Chat‑Rec.
Fine‑tune the LLM on domain text: feed user behavior sequences as prompts so the LLM learns latent user‑item relations. Example: TallRec.
Both approaches tightly couple the LLM with the recommendation task, risking catastrophic forgetting of the LLM's general knowledge during task‑specific training.
LEARN Framework
LEARN (LLM‑driven Knowledge Adaptive Recommendation) integrates LLM embeddings via a dual‑tower architecture (User Tower and Item Tower) to enhance feature extraction.
Architecture Overview
Each tower contains a Content‑Embedding Generation (CEG) module and a Preference Comprehension (PCH) module.
Item Text Description
Item metadata (title, category, brand, price, keywords, attributes) are concatenated into a single sentence and fed to the frozen LLM.
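As a rough illustration, the concatenation step might look like the following; the field names and separator are illustrative, not the paper's exact prompt template.

```python
# Hypothetical sketch of assembling item metadata into a single sentence for
# the frozen LLM; field names and formatting are assumptions, not the paper's.
def build_item_text(item: dict) -> str:
    fields = ["title", "category", "brand", "price", "keywords", "attributes"]
    parts = [f"{key}: {item[key]}" for key in fields if key in item]
    return ". ".join(parts)

text = build_item_text({"title": "Wireless Mouse", "category": "Electronics", "price": "19.99"})
# -> "title: Wireless Mouse. category: Electronics. price: 19.99"
```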
Content‑Embedding Generation (CEG)
The frozen LLM produces high‑dimensional token vectors; a uniform pooling operation aggregates them into a fixed‑size embedding.
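A minimal sketch of this aggregation, assuming "uniform pooling" means an unweighted mean over the per-token hidden states:

```python
import numpy as np

# Sketch of the CEG aggregation step: average the per-token hidden states of
# the frozen LLM into one fixed-size content embedding.
def uniform_pool(token_states: np.ndarray) -> np.ndarray:
    # token_states: (num_tokens, hidden_dim) from the LLM's last layer
    return token_states.mean(axis=0)  # fixed-size (hidden_dim,) embedding

tokens = np.random.randn(12, 4096)     # e.g. 12 tokens, 4096-dim hidden states
item_embedding = uniform_pool(tokens)  # shape: (4096,)
```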
Preference Comprehension (PCH)
User embeddings are obtained by mapping historical item content embeddings into the collaborative space. A Transformer processes the sequence of item embeddings, and self‑supervised contrastive learning (InfoNCE) distinguishes preferred from non‑preferred items.
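A toy single-head self-attention pass over a user's item-embedding sequence can illustrate the idea; a real PCH module stacks full Transformer layers with learned projections, and the last-position readout here is only one plausible choice.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy single-head self-attention over a sequence of item content embeddings,
# returning the last contextualized state as the user embedding (a sketch,
# not the paper's exact architecture).
def pch_user_embedding(item_embs, Wq, Wk, Wv):
    q, k, v = item_embs @ Wq, item_embs @ Wk, item_embs @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (H, H) attention weights
    ctx = attn @ v                                  # (H, d) contextualized states
    return ctx[-1]                                  # (d,) user embedding

rng = np.random.default_rng(0)
d = 8
seq = rng.standard_normal((5, d))  # 5 historical item embeddings
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
user_emb = pch_user_embedding(seq, Wq, Wk, Wv)  # shape: (8,)
```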
Historical and Target Sequences
For user i:
Historical interactions: U_hist_i = {Item_i1, Item_i2, ..., Item_iH}
Target interactions: U_tar_i = {Item_i(H+1), Item_i(H+2), ..., Item_i(H+T)}
User Tower and Item Tower Variants
The User Tower implements the PCH module. The Item Tower has three variants:
Variant 1: Mirrors the User Tower architecture and weights, but inputs the user’s target interaction sequence using causal attention to align user and item embeddings.
Variant 2: Applies self‑attention only to the item itself, processing each item independently.
Variant 3: Directly uses CEG‑generated content embeddings; during training it receives user target sequences, while inference uses only the item’s textual description.
During training, Variant 1 uses the target sequence; Variants 2 and 3 process items independently. At inference time, all variants accept a single item description and output its embedding.
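The historical/target partition defined earlier (U_hist_i, U_tar_i) can be sketched as a simple chronological split; H and T are hyperparameters, and the concrete values below are only for illustration.

```python
# Toy illustration of partitioning one user's chronological interactions into
# the historical sequence U_hist (first H items) and the target sequence
# U_tar (next T items).
def split_interactions(items, H, T):
    return items[:H], items[H:H + T]

hist, tar = split_interactions(list(range(1, 11)), H=6, T=4)
# hist -> [1, 2, 3, 4, 5, 6]; tar -> [7, 8, 9, 10]
```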
Loss Functions and Model Output
For the conversion‑rate (CVR) prediction task, a conventional MLP head is added on top of the dual‑tower outputs.
Main Loss (InfoNCE)
InfoNCE pulls together positive user‑item pairs (items the user actually interacted with) and pushes apart negative samples (non‑interacted items).
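A minimal single-user InfoNCE sketch: cosine similarities between the user embedding and candidate item embeddings form the logits, and the positive item is contrasted against the negatives. The temperature value is a common default, not necessarily the paper's.

```python
import numpy as np

# Single-user InfoNCE: -log softmax of the positive item's similarity logit
# against all candidates (positives pulled close, negatives pushed apart).
def info_nce(user_emb, item_embs, pos_idx, temperature=0.07):
    u = user_emb / np.linalg.norm(user_emb)
    items = item_embs / np.linalg.norm(item_embs, axis=1, keepdims=True)
    logits = items @ u / temperature               # cosine similarity logits
    m = logits.max()
    log_sum_exp = m + np.log(np.exp(logits - m).sum())  # stable logsumexp
    return log_sum_exp - logits[pos_idx]

user = np.array([1.0, 0.0, 0.0])
items = np.array([[1.0, 0.0, 0.0],   # positive (interacted)
                  [0.0, 1.0, 0.0],   # negative
                  [0.0, 0.0, 1.0]])  # negative
loss = info_nce(user, items, pos_idx=0)  # near zero: positive dominates
```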
Auxiliary Loss
An auxiliary cross‑entropy loss on CVR further improves performance by encouraging consistent fusion of different embedding types.
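One plausible form of this auxiliary objective is binary cross-entropy on the predicted conversion probability; the head producing the probability (the MLP over fused embeddings) is assumed here, not specified by the paper.

```python
import numpy as np

# Hedged sketch of the auxiliary CVR loss as binary cross-entropy between a
# predicted conversion probability and the observed conversion label (0/1).
def cvr_bce_loss(pred_prob: float, label: int, eps: float = 1e-7) -> float:
    p = np.clip(pred_prob, eps, 1 - eps)  # avoid log(0)
    return float(-(label * np.log(p) + (1 - label) * np.log(1 - p)))

confident_right = cvr_bce_loss(0.9, 1)  # small loss
confident_wrong = cvr_bce_loss(0.9, 0)  # large loss
```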
Experiments
Evaluation metrics include Hit Rate (H) and Recall (R). The paper reports consistent improvements of LEARN variants over state‑of‑the‑art baselines in CVR, AUC, and especially on cold‑start and long‑tail items. Online A/B tests show measurable AUC gains and CVR uplift.