How Generative Models Are Redefining Recommendation Systems
This article reviews recent advances in generative recommendation, highlighting challenges such as item representation and multimodal fusion, and summarizing four key research papers that propose novel tokenization, collaborative integration, and transformer-based multimodal approaches to improve recommendation performance.
Modern recommendation systems traditionally retrieve suitable items from a candidate pool, but the rise of generative models such as GPT has prompted researchers to recast recommendation as a sequence generation task: conditioned on a user's interaction history, the model directly generates the identifier of the next item.
Key challenges include:
Item representation: Unlike ID-based representations, generative methods use discrete tokens derived from titles, textual information, or numeric codes, making the quality of these tokens critical for downstream performance.
Multimodal fusion: With large language models (e.g., ChatGPT) and multimodal models (e.g., CLIP) advancing rapidly, integrating rich world knowledge and multimodal signals into generative recommenders remains an open problem.
Enhanced Generative Recommendation via Content and Collaboration Integration (ColaRec)
The ColaRec system employs an encoder‑decoder language model (T5) to jointly model item content and user‑item collaborative signals. It first pre‑trains a LightGCN model to obtain latent item embeddings, then applies k‑means clustering to create hierarchical discrete codes for items.
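The clustering step can be sketched as recursive k-means over item embeddings: each level of the hierarchy appends one cluster index to an item's code. This is a minimal illustration, assuming the LightGCN embeddings are available as a NumPy array; the function name and parameters are illustrative, not ColaRec's actual implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def hierarchical_codes(embeddings, k=8, depth=3, seed=0):
    """Assign each item a fixed-length code by recursive k-means:
    level 0 clusters all items, each cluster is then re-clustered, etc."""
    codes = np.zeros((len(embeddings), depth), dtype=int)

    def split(indices, level):
        if level == depth or len(indices) < k:
            return  # too few items (or max depth): remaining digits stay 0
        km = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels = km.fit_predict(embeddings[indices])
        codes[indices, level] = labels
        for c in range(k):
            split(indices[labels == c], level + 1)

    split(np.arange(len(embeddings)), 0)
    return codes

# Toy latent item embeddings standing in for LightGCN output (assumption).
items = np.random.default_rng(0).normal(size=(200, 32))
item_codes = hierarchical_codes(items, k=4, depth=2)
```

Each row of `item_codes` is then the discrete identifier the T5 decoder is trained to emit.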
Two learning tasks align content and collaborative spaces:
User‑item recommendation: predict a discrete item code based on a user's historical interaction content (e.g., titles, brands).
Item‑item indexing: generate a discrete code for an item directly from its content.
These mechanisms effectively fuse collaborative and content information, boosting generative recommendation performance.
Learnable Tokenizer for LLM‑based Generative Recommendation (LETTER)
LETTER addresses the challenge of converting recommendation data into a language‑model‑friendly format. It introduces a learnable tokenizer that combines hierarchical semantics, collaborative signals, and diverse code allocation.
Key components:
Semantic extraction using LLaMA2‑7B to encode item textual information.
Residual Quantization Variational Auto‑Encoder (RQ‑VAE) that recursively quantizes semantic residuals into fixed‑length hierarchical identifiers.
Integration of collaborative filtering embeddings (e.g., SASRec, LightGCN) with contrastive learning to align quantized semantic codes with collaborative signals.
Diversity loss applied to codebook embeddings to encourage uniform distribution of discrete representations.
The approach improves the quality of generated recommendations by providing high‑quality discrete item codes for LLMs.
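The core of the RQ-VAE step above is residual quantization: at each level, the input is matched to its nearest codebook entry and only the residual is passed to the next level, yielding a fixed-length hierarchical identifier. A minimal NumPy sketch under toy random codebooks (the real method learns them end-to-end):

```python
import numpy as np

def residual_quantize(x, codebooks):
    """Recursively quantize the residual of x against each codebook level."""
    codes, residual = [], x.copy()
    for cb in codebooks:  # cb has shape (num_codes, dim)
        dists = np.linalg.norm(residual[None, :] - cb, axis=1)
        idx = int(np.argmin(dists))       # nearest codebook entry
        codes.append(idx)
        residual = residual - cb[idx]     # pass the residual down a level
    return codes, residual

rng = np.random.default_rng(1)
dim, levels, num_codes = 16, 3, 32
codebooks = [rng.normal(size=(num_codes, dim)) for _ in range(levels)]
codes, final_residual = residual_quantize(rng.normal(size=dim), codebooks)
```

`codes` is the fixed-length hierarchical identifier; LETTER additionally shapes the codebooks with collaborative contrastive alignment and the diversity loss described above.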
MMGRec: Multimodal Generative Recommendation with Transformer Model
MMGRec tackles the high inference cost and limited interaction modeling of traditional embedding‑retrieval recommenders by adopting a generative paradigm.
Methodology:
Graph RQ‑VAE fuses multimodal item features and user‑item collaborative interactions. It concatenates multimodal features into an initial item representation, constructs a bipartite user‑item graph, and updates node embeddings via a Graph Convolutional Network (GCN).
The refined item embeddings are quantized into hierarchical discrete codes using RQ‑VAE.
A Transformer‑based recommender generates discrete item codes conditioned on a user's historical interaction sequence.
This design enables efficient multimodal generative recommendation with improved accuracy.
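The GCN update over the bipartite graph can be sketched as alternating, degree-normalized message passing between user and item embeddings. This is a simplified LightGCN-style propagation on a dense toy interaction matrix, not MMGRec's exact Graph RQ-VAE; all names are illustrative.

```python
import numpy as np

def gcn_propagate(user_emb, item_emb, interactions, layers=2):
    """Propagate embeddings over a bipartite user-item graph.
    interactions: binary (num_users, num_items) matrix."""
    du = np.maximum(interactions.sum(1, keepdims=True), 1)  # user degrees
    di = np.maximum(interactions.sum(0, keepdims=True), 1)  # item degrees
    norm = interactions / np.sqrt(du) / np.sqrt(di)  # symmetric normalization
    u, v = user_emb, item_emb
    for _ in range(layers):
        # each side aggregates its neighbors' embeddings from the other side
        u, v = norm @ v, norm.T @ u
    return u, v

rng = np.random.default_rng(2)
U = rng.normal(size=(5, 8))                        # 5 toy users
V = rng.normal(size=(7, 8))                        # 7 toy items
R = (rng.random((5, 7)) > 0.5).astype(float)       # toy interactions
u_out, v_out = gcn_propagate(U, V, R)
```

The refined item embeddings `v_out` are what the RQ-VAE then quantizes into the discrete codes the Transformer learns to generate.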
Contrastive Quantization based Semantic Code for Generative Recommendation
This work improves upon existing generative recommendation methods that rely solely on reconstruction‑based quantization of textual item representations.
Approach:
Pre‑trained language models produce textual embeddings for items.
RQ‑VAE converts these embeddings into discrete codes.
An auxiliary contrastive loss treats the RQ‑VAE input‑output pair as a positive sample and uses in‑batch negative sampling to enforce discriminative coding, capturing finer item differences.
The contrastive quantization enhances the semantic distinctiveness of item codes, leading to better recommendation performance.
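The auxiliary loss above is an InfoNCE-style objective: each item's RQ-VAE reconstruction is the positive for its own input, and the other reconstructions in the batch serve as negatives. A minimal NumPy sketch under that assumption (temperature and names are illustrative):

```python
import numpy as np

def contrastive_quant_loss(inputs, reconstructions, temperature=0.1):
    """InfoNCE with in-batch negatives: positives sit on the diagonal
    of the (batch, batch) cosine-similarity matrix."""
    a = inputs / np.linalg.norm(inputs, axis=1, keepdims=True)
    b = reconstructions / np.linalg.norm(reconstructions, axis=1, keepdims=True)
    logits = a @ b.T / temperature                   # pairwise similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # cross-entropy on diagonal

rng = np.random.default_rng(3)
x = rng.normal(size=(8, 16))                  # toy textual embeddings
loss_matched = contrastive_quant_loss(x, x)   # perfect reconstruction
loss_random = contrastive_quant_loss(x, rng.normal(size=(8, 16)))
```

A matched input-output pair should yield a lower loss than an unrelated one, which is exactly the pressure that pushes codes for different items apart.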