How LLMs Transform Recommendation Systems: Insights from Kuaishou’s LEARN Paper
This article analyzes Kuaishou’s May 2024 paper on LLM‑driven recommendation, detailing its dual‑tower architecture, contrastive learning of user and item embeddings, and a CVR auxiliary task that together improve cold‑start handling and lift both offline and online AUC.
TL;DR
The paper introduces LEARN, a dual‑tower framework that freezes a large language model (Baichuan2‑7B) to extract text embeddings, trains user and item embeddings via contrastive learning, and adds a CVR auxiliary task to align these embeddings with ranking objectives, yielding notable gains in cold‑start scenarios.
Background
Traditional recommender systems rely on ID embeddings, which ignore semantic information in item descriptions and struggle with sparse interaction data for new users or items. Large language models excel at capturing semantic knowledge, prompting the idea of using them as feature extractors to alleviate cold‑start and long‑tail problems.
Method
The proposed framework, LEARN (LLM‑driven Knowledge Adaptive Recommendation), adopts a dual‑tower architecture. Each tower consists of a Content Embedding Generation (CEG) module and a Preference Comprehension (PCH) module.
CEG: Uses a frozen Baichuan2‑7B model to encode item text (title, category, brand, price, keywords, attributes). Token‑level hidden states from the final layer are averaged to form the item representation.
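As a concrete illustration, here is a minimal sketch of the CEG step using Hugging Face `transformers`. The mean‑pooling over final‑layer hidden states follows the paper’s description; the prompt format, the `item_embedding` helper, and the max length are assumptions made for illustration, not the paper’s exact pipeline.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Frozen LLM used purely as a feature extractor: no gradients flow into it.
MODEL = "baichuan-inc/Baichuan2-7B-Base"
tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL, trust_remote_code=True)
model.eval()

@torch.no_grad()
def item_embedding(item: dict) -> torch.Tensor:
    # Concatenate the item's text fields into one prompt (illustrative format).
    text = " | ".join(f"{k}: {v}" for k, v in item.items())
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    out = model(**inputs, output_hidden_states=True)
    last_hidden = out.hidden_states[-1]               # (1, seq_len, hidden)
    mask = inputs["attention_mask"].unsqueeze(-1)     # zero out padding tokens
    return (last_hidden * mask).sum(1) / mask.sum(1)  # mean pool -> (1, hidden)

emb = item_embedding({"title": "wireless earbuds", "category": "electronics",
                      "brand": "Acme", "price": "59.9"})
```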
PCH: Aligns the LLM‑generated embeddings with the recommendation task via self‑supervised contrastive learning. User interaction sequences are fed into a causal‑attention transformer, producing the user (or item) embedding.
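The PCH module could look like the following sketch: a small causal transformer over the sequence of CEG embeddings, with the last position read out as the embedding. The dimensions, depth, and last‑token readout are illustrative assumptions; the paper’s summary does not fix them.

```python
import torch
import torch.nn as nn

class PCH(nn.Module):
    """Causal-attention transformer over a sequence of CEG item embeddings.
    Hyperparameters here are illustrative, not taken from the paper."""
    def __init__(self, dim: int = 512, n_layers: int = 4, n_heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=n_heads, batch_first=True, norm_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, seq_len, dim) LLM-derived item embeddings, in time order
        causal = nn.Transformer.generate_square_subsequent_mask(seq.size(1))
        hidden = self.encoder(seq, mask=causal.to(seq.device))
        return hidden[:, -1]  # last position as the user (or item) embedding
```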
Three variants of the item tower were explored; Variant 1 (identical structure and weights to the user tower) performed best and was adopted.
Training proceeds by sampling a user’s historical behavior sequence, splitting it into two parts, and feeding one part to the user tower and the other to the item tower. Positives are the next interacted items; negatives are items drawn from other users. The training objective is the InfoNCE loss.
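In code, this objective reduces to standard in‑batch InfoNCE, assuming the negatives come from the other users in the same batch; the temperature value below is an illustrative choice, not the paper’s.

```python
import torch
import torch.nn.functional as F

def info_nce(user_emb: torch.Tensor, item_emb: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """Row i of user_emb should match row i of item_emb; every other row in
    the batch (i.e., items from other users) serves as a negative."""
    user_emb = F.normalize(user_emb, dim=-1)
    item_emb = F.normalize(item_emb, dim=-1)
    logits = user_emb @ item_emb.T / temperature           # (batch, batch)
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)
```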
Experiments
Offline Evaluation
LLM‑derived embeddings outperform traditional ID embeddings and BERT‑generated features.
On the public MovieLens dataset, LEARN surpasses state‑of‑the‑art methods HSTU and SASRec.
Freezing the LLM and adding a transformer encoder yields better results than LoRA fine‑tuning.
Online Evaluation
An auxiliary CVR task is added alongside the ranking model. The learned user and item embeddings are concatenated and passed through an MLP, and the intermediate vector (mid‑emb) is combined with existing features for the final ranking model. This two‑stage alignment, first in the LEARN pre‑training and then in the CVR task, ensures that the embeddings carry both semantic knowledge and recommendation‑specific signals.
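A sketch of that auxiliary head: concatenate the pre‑trained user and item embeddings, pass them through an MLP trained on the CVR label, and expose an intermediate layer as the mid‑emb feature for the main ranker. The layer widths and two‑layer depth are assumptions for illustration.

```python
import torch
import torch.nn as nn

class CVRAuxHead(nn.Module):
    """Auxiliary CVR tower; layer widths are illustrative assumptions."""
    def __init__(self, emb_dim: int = 512, mid_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(2 * emb_dim, 256), nn.ReLU(),
            nn.Linear(256, mid_dim), nn.ReLU(),
        )
        self.cvr_logit = nn.Linear(mid_dim, 1)

    def forward(self, user_emb: torch.Tensor, item_emb: torch.Tensor):
        mid_emb = self.backbone(torch.cat([user_emb, item_emb], dim=-1))
        # mid_emb is concatenated with existing features in the ranking model;
        # the logit trains the auxiliary CVR objective.
        return self.cvr_logit(mid_emb), mid_emb
```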
Online A/B tests show increased profit and AUC, especially for long‑tail and cold‑start users/items, confirming that LLM‑infused semantics effectively mitigate sparsity issues.
Analysis
The improvements stem from two factors: (1) rich semantic information from the LLM provides an informative boost for users/items with limited interaction data; (2) the two‑stage alignment forces the embeddings to be useful for the downstream ranking objective, reducing noise that typically plagues raw pretrained features.
Overall, the study demonstrates a practical pathway—LLM‑to‑Rec—where large language models serve as feature extractors rather than generative recommenders, making them suitable for large‑scale industrial recommendation systems.