CoderRec: Latent Reasoning Boosts Sequential Recommendation

CoderRec is a new sequential recommendation framework jointly developed by Tencent Advertising Technology and Tsinghua University. It combines domain-specific latent reasoning with cross-scale model collaboration to capture implicit user intent and to fuse large-language-model semantics with traditional recommender signals, achieving state-of-the-art performance on multiple Amazon datasets.

Background

Sequential recommendation is increasingly embedded in everyday digital experiences, predicting the next item a user may want based on historical clicks, views, and purchases. Recent attempts to incorporate large language models (LLMs) have shown promise but face two major challenges: the scarcity of standardized reasoning data and the loss of rich semantic information when compressing LLM outputs into discrete IDs.

Challenges in Existing Methods

Reasoning data scarcity: Unlike mathematical reasoning, user behavior logic is highly contextual, subjective, and difficult to formalize, making high-quality reasoning chains hard to obtain.

Insufficient semantic utilization: Current pipelines often compress LLM-derived semantic embeddings into discrete IDs (e.g., via RQVAE), discarding much of the original semantic richness and limiting downstream recommendation performance.

CoderRec Overview

To address these issues, Tencent Advertising Technology and Tsinghua University propose CoderRec, a sequential recommendation framework that integrates domain-specific latent reasoning and cross-scale model collaboration. The core innovations are:

Latent reasoning mechanism: Enables the model to capture implicit user intent without manual annotation by activating reasoning processes in a latent space.

Cross-scale model collaboration: Bridges high-dimensional LLM semantics with low-dimensional recommender representations, allowing mutual knowledge transfer.

Two-stage training and representation alignment: Aligns LLM semantic knowledge with recommender signals through a dedicated loss.

[Figure: CoderRec framework diagram]

Cross‑Scale Model Collaboration

LLMs such as Llama‑3 8B (4096‑dimensional representations) and Qwen‑3 4B (2560‑dimensional) vastly exceed the dimensionality of typical recommender models (≤128). Direct linear mapping leads to representation collapse. CoderRec adopts RQVAE as a hierarchical quantization bridge, compressing LLM embeddings while preserving essential semantics. Unlike prior work that only compresses, CoderRec also reconstructs semantic IDs, enabling bidirectional information flow between large and small models.
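The article does not include code, but the quantization bridge can be illustrated. Below is a minimal sketch of residual quantization in the spirit of RQVAE, assuming three codebook levels of 256 entries each and an encoder that has already projected the LLM embedding down to the working dimension; all names and sizes are illustrative, and the straight-through estimator and commitment losses of a full RQ-VAE are omitted.

```python
import torch
import torch.nn as nn

class ResidualQuantizer(nn.Module):
    """Minimal residual quantizer in the spirit of RQVAE: each level
    encodes the residual left by the previous level, yielding one
    semantic ID per level."""

    def __init__(self, num_levels=3, codebook_size=256, dim=128):
        super().__init__()
        self.codebooks = nn.ModuleList(
            [nn.Embedding(codebook_size, dim) for _ in range(num_levels)]
        )

    def forward(self, z):
        # z: (batch, dim) -- LLM embedding already projected to `dim` by an encoder
        residual = z
        codes, quantized = [], torch.zeros_like(z)
        for book in self.codebooks:
            dists = torch.cdist(residual, book.weight)   # (batch, codebook_size)
            idx = dists.argmin(dim=-1)                   # semantic ID at this level
            q = book(idx)                                # nearest codeword
            codes.append(idx)
            quantized = quantized + q
            residual = residual - q                      # leftover goes to the next level
        # stacked semantic IDs (batch, num_levels) and the quantized reconstruction
        return torch.stack(codes, dim=-1), quantized
```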

To avoid collisions where different items share the same early RQVAE codes, CoderRec combines raw item IDs with semantic IDs into a cross-scale ID, jointly embedding both sources (a fusion sketch follows the list):

Item ID embedding: Standard embedding of each product ID.

Semantic ID embedding: Multi-layer embedding table where each layer corresponds to an RQVAE codebook.

Fusion: A lightweight linear fusion layer merges the two embeddings into a unified item representation.
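As a concrete reading of the three steps above, here is a hedged sketch of a cross-scale embedding module. Summing the per-level semantic embeddings before the linear fusion is an assumption (the paper may combine them differently); class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn

class CrossScaleEmbedding(nn.Module):
    """Fuses a raw item-ID embedding with per-level semantic-ID embeddings
    through a single linear layer, following the cross-scale ID described above."""

    def __init__(self, num_items, codebook_size=256, num_levels=3, dim=128):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, dim)
        # one embedding table per RQVAE codebook level
        self.sem_embs = nn.ModuleList(
            [nn.Embedding(codebook_size, dim) for _ in range(num_levels)]
        )
        self.fuse = nn.Linear(2 * dim, dim)  # lightweight linear fusion layer

    def forward(self, item_ids, sem_ids):
        # item_ids: (batch,)  sem_ids: (batch, num_levels)
        e_id = self.item_emb(item_ids)                               # (batch, dim)
        e_sem = sum(emb(sem_ids[:, i]) for i, emb in enumerate(self.sem_embs))
        return self.fuse(torch.cat([e_id, e_sem], dim=-1))           # unified item vector
```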

Domain‑Specific Latent Reasoning

Inspired by Quiet‑STaR, CoderRec introduces a domain‑specific latent reasoning mechanism. User interaction sequences are treated as sentences, and hidden “thought trajectories” are inferred to model the implicit decision logic behind item transitions. These latent trajectories are injected into the LLM via a special <think> token and trained with a parallel attention mask that restricts each thought token’s attention to its own trajectory and preceding items.
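The parallel attention mask is the load-bearing detail here. The sketch below constructs one plausible version of it: thought tokens attend causally within their own trajectory and to earlier item tokens, while item tokens never attend to thought tokens. The exact masking rules in CoderRec may differ; the function name and arguments are hypothetical.

```python
import torch

def parallel_thought_mask(seq_len, thought_positions, thought_len):
    """Boolean attention mask (True = may attend): each thought token sees
    only earlier tokens in its own trajectory plus preceding item tokens;
    item tokens never attend to thought tokens."""
    # start from a standard causal mask
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    thought_idx = torch.zeros(seq_len, dtype=torch.bool)
    for start in thought_positions:
        thought_idx[start:start + thought_len] = True
    # item-token rows must not attend to thought-token columns
    mask[~thought_idx] &= ~thought_idx
    # a thought trajectory must not see tokens of other trajectories
    for start in thought_positions:
        rows = slice(start, start + thought_len)
        other = thought_idx.clone()
        other[rows] = False
        mask[rows] &= ~other
    return mask

# e.g. a 10-token sequence with one 2-token thought trajectory at position 4
mask = parallel_thought_mask(seq_len=10, thought_positions=[4], thought_len=2)
```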

A reasoning fusion module learns to combine the LLM's raw output h, the latent thought representation l, and the domain-specific thought representation, with the fusion weights initialized to zero to ensure stable early-stage training.
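Zero-initialized fusion weights mean the module acts as an identity on the LLM output at step 0 and only gradually admits the thought signals. A minimal sketch, assuming per-dimension gates (the exact parameterization is not specified in the article):

```python
import torch
import torch.nn as nn

class ReasoningFusion(nn.Module):
    """Combines the LLM hidden state h with latent and domain-specific
    thought representations via gates initialized to zero."""

    def __init__(self, dim):
        super().__init__()
        # zero-initialized gates: the module starts as an identity on h
        self.gate_latent = nn.Parameter(torch.zeros(dim))
        self.gate_domain = nn.Parameter(torch.zeros(dim))

    def forward(self, h, thought_latent, thought_domain):
        # at initialization the output equals h, so the pre-trained
        # behavior is preserved during early training
        return h + self.gate_latent * thought_latent + self.gate_domain * thought_domain
```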

[Figure: Latent reasoning process diagram]

Training Strategy

Because of the representation gap between recommender and LLM components, training proceeds in two phases:

Pre-training (warm-up): Train the recommender head on the downstream task to obtain a solid baseline.

Joint training: Simultaneously optimize the recommendation head (cross-entropy loss) and the token-prediction head (reconstructing semantic IDs via RQVAE), with the two losses weighted by hyper-parameters λ₁ and λ₂ (see the sketch below).
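A compact sketch of the joint objective from the second phase, assuming the token-prediction head emits one logit vector per RQVAE level; the tensor shapes and λ values are illustrative:

```python
import torch.nn.functional as F

def joint_loss(rec_logits, next_item, token_logits, sem_ids, lam1=1.0, lam2=0.5):
    """Joint objective: next-item cross-entropy plus semantic-ID
    reconstruction, weighted by hyper-parameters lambda_1 / lambda_2."""
    # recommendation head: rec_logits (batch, num_items), next_item (batch,)
    loss_rec = F.cross_entropy(rec_logits, next_item)
    # token-prediction head: token_logits (batch, num_levels, codebook_size),
    # sem_ids (batch, num_levels) -- one cross-entropy term per RQVAE level
    loss_tok = F.cross_entropy(token_logits.flatten(0, 1), sem_ids.flatten())
    return lam1 * loss_rec + lam2 * loss_tok
```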

Experimental Results

Experiments on three Amazon sub‑datasets (Beauty, Sports & Outdoors, Musical Instruments) show that CoderRec consistently outperforms baselines such as SASRec, BERT4Rec, and recent LLM‑enhanced recommenders. The cross‑scale collaboration and latent reasoning each contribute significant gains, with latent reasoning improving both large‑scale and small‑scale models.

[Table: Experimental performance comparison]

Conclusion

CoderRec is the first framework to embed latent reasoning into LLM‑based sequential recommendation, leveraging cross‑scale model collaboration to fuse semantic richness with domain‑specific signals. Extensive experiments validate its superiority, and future work will explore more efficient semantic alignment and extensions to multi‑intent or long‑term conversational scenarios.

Tags: Artificial Intelligence, Large Language Models, Recommender Systems, Sequential Recommendation, Cross-Scale Collaboration, Latent Reasoning