Aligning Collaborative Filtering with LLM Token Generation: The TCA4Rec Breakthrough

This paper introduces TCA4Rec, a framework that directly aligns item‑level collaborative‑filtering preferences with the token‑level training objective of large language models. The authors present two novel modules, extensive experiments, and analysis demonstrating significant performance gains on generative recommendation tasks.

Data Party THU

Background

Large language models (LLMs) excel at semantic modeling but cannot directly learn collaborative filtering (CF) behavioral signals because CF provides item‑level preferences while LLMs are trained with token‑level next‑token prediction (NTP). This mismatch limits recommendation performance.

Method

TCA4Rec aligns CF item‑level logits with LLM token‑level objectives via two modules.

Collaborative Tokenizer

Given the LLM generation step j, the tokenizer:

1. Collects candidate items whose textual representation shares the current token prefix, ensuring only feasible continuations are considered.
2. Applies a softmax to the CF logits of these candidates to obtain an item‑level probability distribution.
3. Aggregates the probabilities of items that map to the same next token, producing a token‑level CF distribution that is directly compatible with the LLM vocabulary.
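The three steps above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the helper names (`cf_logits`, `next_token_of`) are assumptions introduced here for clarity.

```python
import math
from collections import defaultdict

def token_level_cf_distribution(candidates, cf_logits, next_token_of):
    """Map item-level CF logits to a token-level distribution.

    candidates:    item ids whose text shares the current token prefix
    cf_logits:     dict item id -> CF model score (logit)
    next_token_of: dict item id -> the next token that item's text continues with
    """
    # Step 2: softmax over the CF logits of the feasible candidates only
    m = max(cf_logits[i] for i in candidates)          # subtract max for numerical stability
    exps = {i: math.exp(cf_logits[i] - m) for i in candidates}
    z = sum(exps.values())
    item_probs = {i: e / z for i, e in exps.items()}

    # Step 3: aggregate items that continue with the same next token
    token_probs = defaultdict(float)
    for i in candidates:
        token_probs[next_token_of[i]] += item_probs[i]
    return dict(token_probs)
```

With equal CF logits, two items sharing a next token receive twice the token mass of a lone item, which is exactly the aggregation the tokenizer performs.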

Soft Label Alignment

The token‑level CF distribution P_cf(t) is combined with the one‑hot ground‑truth token label y_onehot(t) using a weighting factor α ∈ [0, 1]:

y_soft = (1 − α) · y_onehot + α · P_cf

The resulting soft label replaces the hard label in the cross‑entropy loss, allowing the model to balance semantic fluency (LLM) and collaborative consistency (CF).
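The blended cross‑entropy can be written out directly. The sketch below is an assumption‑laden illustration (dictionary inputs rather than tensors), showing only the soft‑label arithmetic:

```python
import math

def soft_label_loss(llm_log_probs, gt_token, cf_token_probs, alpha):
    """Cross-entropy against the blended soft label
    y_soft = (1 - alpha) * y_onehot + alpha * P_cf.

    llm_log_probs:  dict token -> log-probability from the LLM
    gt_token:       the ground-truth next token
    cf_token_probs: token-level CF distribution P_cf
    alpha:          blending weight in [0, 1]
    """
    loss = 0.0
    for tok, logp in llm_log_probs.items():
        onehot = 1.0 if tok == gt_token else 0.0
        y = (1 - alpha) * onehot + alpha * cf_token_probs.get(tok, 0.0)
        if y > 0:
            loss -= y * logp  # cross-entropy term: -y_soft(t) * log p_llm(t)
    return loss
```

At α = 0 this reduces to the standard NTP loss, and at α = 1 the supervision comes entirely from the CF distribution, matching the interpolation in the formula above.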

The framework is model‑agnostic: any CF model (e.g., SASRec, BERT4Rec) can provide logits, and any decoder‑based generative recommender (e.g., TallRec, LLaRA, CoLLM, MSL) can consume the soft labels without architectural changes.

TCA4Rec framework diagram

Experiments

Evaluations were performed on three public recommendation datasets (Toys, Sports, Office) using four LLM‑based generative backbones (TallRec, LLaRA, CoLLM, MSL). The primary metrics were NDCG@5 and Hit@5.

Across all dataset‑model combinations, integrating TCA4Rec yielded consistent improvements (e.g., +3–5% absolute NDCG@5). Model‑agnosticism was verified by applying TCA4Rec to semantic‑ID generators (TIGER, LETTER), which also showed notable gains.

Two ablation studies were conducted:

1. Removing the Collaborative Tokenizer (using only the one‑hot label) reduced performance, confirming the necessity of the token‑level CF distribution.
2. Removing Soft Label Alignment (using only the CF distribution) also degraded results, demonstrating the importance of blending semantic and collaborative signals.

Performance comparison chart
Results on TIGER and LETTER
Ablation study results

Conclusion

TCA4Rec provides a plug‑and‑play mechanism to inject structured CF supervision into LLM token‑level training, improving both semantic quality and collaborative relevance without modifying the underlying recommendation model. The approach demonstrates strong model‑independence and opens avenues for incorporating other non‑linguistic signals into generative systems.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: LLM, collaborative filtering, recommendation systems, Generative Recommendation, TCA4Rec, Token Alignment
Written by

Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
