ControlRec: Aligning LLMs with IDs to Boost Personalized Recommendations
ControlRec introduces heterogeneous feature matching and instruction contrastive learning to bridge the semantic gap between language models and discrete user/item IDs, enabling more effective personalized recommendation across multiple tasks such as rating prediction, sequential recommendation, and explanation generation.
1. Introduction
Large language models (LLMs) excel at natural‑language tasks but struggle to incorporate discrete user and item identifiers because IDs reside in a different semantic space from text. This semantic gap limits the direct use of LLMs for personalized recommendation. ControlRec addresses the gap with a contrastive prompting framework that adds two auxiliary objectives, Heterogeneous Feature Matching (HFM) and Instruction Contrastive Learning (ICL), to align ID and textual representations.
2. Method
2.1 Model Architecture
ControlRec consists of three components:
ID encoder: encodes user and item IDs. IDs are split into sub‑tokens (e.g., item_1471 → ["item", "_", "1471"]) so the LLM can treat them as text.
Natural‑language (NL) encoder: encodes task‑specific textual inputs.
Shared decoder: receives the concatenated ID and NL embedding sequences and predicts the next token conditioned on the combined representation.
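The sub‑token split described for the ID encoder can be illustrated with a small sketch. The function name `split_id` and the regex rule are assumptions chosen to reproduce the item_1471 example; a real subword tokenizer may split IDs differently.

```python
import re

def split_id(raw_id):
    """Split an ID string into letter runs, digit runs, and single separators.

    Hypothetical helper: it only mimics the item_1471 example from the text,
    not the tokenizer actually used by ControlRec.
    """
    return re.findall(r"[A-Za-z]+|\d+|[^A-Za-z\d]", raw_id)

print(split_id("item_1471"))  # ['item', '_', '1471']
```

Splitting this way lets the numeric part of an ID reuse the model's existing vocabulary instead of requiring one new embedding per user or item.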
Because vanilla Transformers allow unrestricted token‑to‑token attention, a visibility matrix is introduced to restrict attention to token pairs that are truly related (e.g., items that co‑occur in a user’s interaction graph), effectively modeling the bipartite user‑item structure.
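A visibility matrix of this kind might be built as follows. This is a minimal sketch under stated assumptions: the rule used here (tokens of the same ID see each other, and tokens of two different IDs see each other only if the IDs are linked by an interaction) and the names `visibility_matrix`, `token_owner`, and `related` are illustrative, not taken from the paper.

```python
def visibility_matrix(token_owner, related):
    """Build a boolean attention mask over tokens.

    token_owner[i] is the ID that token i belongs to.
    related is a set of frozenset({id_a, id_b}) pairs that are linked
    in the interaction graph (assumed input format).
    """
    n = len(token_owner)
    vis = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            a, b = token_owner[i], token_owner[j]
            # Visible if both tokens come from the same ID or the IDs are linked.
            vis[i][j] = (a == b) or (frozenset((a, b)) in related)
    return vis

# Two sub-tokens of user_7, then one token each for item_1 and item_2.
owners = ["user_7", "user_7", "item_1", "item_2"]
edges = {frozenset(("user_7", "item_1"))}  # user_7 interacted with item_1 only
mask = visibility_matrix(owners, edges)
```

In a Transformer, positions where the mask is False are typically set to a large negative value in the attention logits before the softmax, so those token pairs receive zero attention weight.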
2.2 Heterogeneous Feature Matching (HFM)
After obtaining separate ID and NL embeddings, HFM aligns them in a shared semantic space using metric‑learning‑style objectives. Two sub‑tasks are defined:
Item Description: For each item ID, a positive textual description (e.g., "category: T‑shirt, brand: Nike") is paired with K negative descriptions sampled from other items. The model receives a prompt such as “Does the description match the item?” and computes cosine similarity between the ID embedding and each NL embedding. A softmax over the similarities yields a cross‑entropy loss that pulls the positive pair together and pushes negatives apart.
Sequence Prediction: Using a user’s interaction history as the ID input, the model predicts the description of the next preferred item. The same sampling‑and‑similarity procedure as the Item Description task is applied, encouraging the model to learn sequential recommendation capabilities.
The total HFM loss is the sum of the two sub‑task losses.
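The matching loss shared by both sub‑tasks can be sketched in plain Python. The function names, the toy vectors, and the `temperature` parameter are assumptions for illustration; with `temperature=1.0` the code reduces to the plain softmax over cosine similarities described above.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def matching_loss(id_vec, text_vecs, positive_index=0, temperature=1.0):
    """Cross-entropy over softmax(cosine similarities).

    text_vecs holds one positive description embedding and K negatives;
    temperature is an assumed knob, not mentioned in the source.
    """
    sims = [cosine(id_vec, t) / temperature for t in text_vecs]
    m = max(sims)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in sims))
    return log_z - sims[positive_index]

# Toy embeddings: the first description is close to the ID, the others are not.
item_vec = [1.0, 0.0]
descriptions = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
loss_item = matching_loss(item_vec, descriptions, positive_index=0)

# Sequence Prediction reuses the same loss with a history embedding as input.
history_vec = [0.2, 1.0]
loss_seq = matching_loss(history_vec, descriptions, positive_index=1)

hfm_loss = loss_item + loss_seq  # total HFM loss is the sum of both sub-tasks
```

The loss shrinks as the positive description's similarity grows relative to the negatives, which is the pull-together, push-apart behavior the text describes.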
2.3 Instruction Contrastive Learning (ICL)
Standard prompt‑based fine‑tuning uses a single task‑specific instruction, making the model fragile to prompt variations. ICL mitigates this by training the model to produce consistent sequence representations across diverse instructions.
For each downstream task, a base prompt template (the “trigger”) is defined. ChatGPT is used to generate M paraphrased instructions for the same task and additional instructions from other tasks. During training the model receives one target instruction and M+1 candidate instructions (one positive from the same task, M negatives from other tasks). The decoder generates a sequence for each instruction; token embeddings are average‑pooled to obtain a sequence vector. A contrastive loss (softmax over cosine similarities) encourages the positive pair to be closer than the negatives.
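The pooling and contrastive steps can be sketched as follows, assuming each generated sequence is already available as a list of token embedding vectors. The function names and the toy numbers are hypothetical; only the average pooling and the softmax over cosine similarities come from the description above.

```python
import math

def mean_pool(token_embeddings):
    """Average token embeddings column-wise into one sequence vector."""
    n = len(token_embeddings)
    return [sum(col) / n for col in zip(*token_embeddings)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def instruction_contrastive_loss(target_tokens, candidate_tokens, positive_index=0):
    """Softmax cross-entropy over similarities between the target sequence
    vector and each candidate sequence vector."""
    target = mean_pool(target_tokens)
    sims = [cosine(target, mean_pool(c)) for c in candidate_tokens]
    log_z = math.log(sum(math.exp(s) for s in sims))  # sims lie in [-1, 1]
    return log_z - sims[positive_index]

# Toy token embeddings for the target and three candidate instructions.
target = [[1.0, 0.0], [1.0, 0.2]]
candidates = [
    [[0.9, 0.1], [1.0, 0.0]],    # paraphrase of the same task (positive)
    [[0.0, 1.0], [0.1, 0.9]],    # instruction from another task
    [[-1.0, 0.0], [-0.8, 0.1]],  # instruction from another task
]
icl_loss = instruction_contrastive_loss(target, candidates, positive_index=0)
```

Minimizing this loss pushes the representation produced under a paraphrased instruction toward the target's representation, which is how the objective encourages robustness to prompt wording.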
The overall training objective combines HFM and ICL. To stabilize early training, the weight of the ICL loss is increased gradually according to the schedule λ_t = λ_0 + (1 − λ_0)·(t/T), where t is the current step and T the total number of steps. The weight therefore starts at λ_0 and rises linearly to 1 by the end of training.
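The linear warm‑up of the ICL weight is simple to express in code. In this sketch the default `lambda_0=0.1` and the form of the combined objective (HFM loss plus weighted ICL loss) are assumptions; the source gives only the schedule itself.

```python
def icl_weight(step, total_steps, lambda_0=0.1):
    """lambda_t = lambda_0 + (1 - lambda_0) * (t / T), with t/T clamped to [0, 1]."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return lambda_0 + (1.0 - lambda_0) * frac

def combined_loss(hfm_loss, icl_loss, step, total_steps, lambda_0=0.1):
    """Assumed combination: HFM loss plus the scheduled weight times ICL loss."""
    return hfm_loss + icl_weight(step, total_steps, lambda_0) * icl_loss

# The weight grows from lambda_0 at step 0 to 1 at the final step.
for t in (0, 250, 500, 1000):
    print(t, round(icl_weight(t, 1000), 3))
```

Starting with a small weight lets the alignment objective shape the embeddings first, before the instruction‑level contrastive signal is given full influence.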
3. Results
ControlRec was evaluated on five downstream recommendation tasks: rating prediction, sequential recommendation, explanation generation, direct recommendation, and comment summarization. Across all tasks, ControlRec consistently outperformed baseline LLM‑based recommenders, demonstrating the effectiveness of HFM and ICL in bridging the semantic gap between IDs and natural language.
