How LLMs Are Transforming Long-Term Cross-Domain Interest Modeling for Recommendations
This DataFun Summit 2025 talk by JD algorithm engineer Tian Mingyang explains how generative AI is driving a paradigm shift in recommendation systems, covering the limits of traditional models, a new dynamic cross-domain inference chain technique, joint engineering-algorithm optimizations, and the challenges that remain before wider deployment.
Background
Large language models (LLMs) enable a paradigm shift in recommendation systems: instead of static matching, they generate multimodal content, perform cross‑domain intent reasoning, and support scenario‑aware inference.
Limitations of traditional modeling
User‑side bottlenecks
Long-tail interests are missed when a user's interest trajectory breaks, especially low-frequency but persistent ones.
Switching scenarios (e.g., from e-commerce browsing to search) breaks interest continuity, causing inaccurate recommendations.
Periodic repeat purchases (monthly or quarterly) are not recognized automatically, forcing users to search for them manually.
Algorithm‑side bottlenecks
Vanishing gradients in RNNs and the limited memory of gated units (LSTM/GRU) restrict dependency capture beyond roughly 100 steps.
Cold‑start and long‑tail prediction suffer from data sparsity and head‑dominant bias.
Attention weights lack semantic interpretability; cross‑domain interest mapping is fragmented.
Dynamic cross‑domain inference chain
LLMs are used to generate user portraits through a fast‑push chain, reducing information loss compared with traditional two‑stage SIM pipelines.
Architectural advantages
LLMs output natural‑language descriptions or dense embeddings directly, preserving semantic richness of low‑frequency behaviors.
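A minimal sketch of this dual output, assuming a generic chat-completion client and an off-the-shelf text encoder (the prompt wording and model choices are illustrative, not from the talk):

```python
# Sketch: turn a raw behavior sequence into a natural-language portrait plus a
# dense embedding. `call_llm` is a placeholder for any LLM inference client.
from sentence_transformers import SentenceTransformer  # real library; model choice is an assumption


def call_llm(prompt: str) -> str:
    """Placeholder for an LLM client (e.g., an internal inference service)."""
    raise NotImplementedError


def build_portrait(behaviors: list[str]) -> dict:
    prompt = (
        "Summarize this user's long-term interests in two or three sentences, "
        "keeping low-frequency but persistent interests:\n" + "\n".join(behaviors)
    )
    portrait_text = call_llm(prompt)                     # natural-language portrait
    encoder = SentenceTransformer("all-MiniLM-L6-v2")    # any sentence encoder works here
    portrait_vec = encoder.encode(portrait_text)         # dense embedding for retrieval
    return {"text": portrait_text, "embedding": portrait_vec}
```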
Dynamic update efficiency
LoRA fine-tuning updates less than 1% of model parameters, enabling minute-level adaptation (e.g., down-weighting behaviors from three days ago or injecting holiday-promotion signals) without full retraining.
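A hedged sketch of such a LoRA setup with Hugging Face peft; the base model and hyperparameters are illustrative assumptions, not JD's configuration:

```python
# Sketch of the "<1% of parameters" point with peft; model and ranks are illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-1.5B")  # any causal LM
lora = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # adapt only the attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```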
Explainability and cross‑domain capability
LLMs can produce textual explanations and multimodal embeddings (via CLIP/BLIP), aligning text, image and video features for seamless cross‑domain fusion.
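As an illustration of that shared space, here is a minimal CLIP sketch with Hugging Face transformers; the checkpoint and image path are assumptions, and BLIP-style encoders plug in the same way:

```python
# Sketch: embed a product title and its image into CLIP's shared space.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

inputs = processor(
    text=["mirrorless camera with 50mm lens"],
    images=Image.open("item.jpg"),  # illustrative local file
    return_tensors="pt", padding=True,
)
outputs = model(**inputs)
text_vec = outputs.text_embeds    # (1, 512) normalized text embedding
image_vec = outputs.image_embeds  # (1, 512) image embedding in the same space
similarity = (text_vec @ image_vec.T).item()  # cosine similarity across modalities
```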
Modeling workflow
Data integration & feature engineering
Multimodal inputs (text, images, video) are aligned by contrastive learning (CLIP, BLIP-2) into a shared embedding space.
LLMs convert textual signals (product titles, comments) into high‑dimensional semantic vectors.
Chain‑of‑thought (CoT) prompts generate user portraits in three modes: unsupervised, optimal supervised, and sub‑optimal supervised.
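One plausible reading of the three prompting modes, sketched as templates (the wording and demonstrations are assumptions; the talk does not publish its prompts):

```python
# Sketch of the three CoT prompting modes for portrait generation.
BEHAVIOR_LOG = "bought camera strap; searched 'hiking boots'; browsed tripods ..."

# Unsupervised: reason step by step with no demonstrations.
UNSUPERVISED = (
    "Think step by step about what this user cares about, then output a short "
    f"interest portrait.\nBehavior log: {BEHAVIOR_LOG}"
)

# Supervised variants prepend demonstrations; "optimal" uses carefully curated
# examples, "sub-optimal" uses cheaper, noisier ones.
OPTIMAL_DEMO = "Behavior log: ... -> Portrait: outdoor photography enthusiast who hikes monthly"
SUBOPTIMAL_DEMO = "Behavior log: ... -> Portrait: likes cameras"

SUPERVISED = (
    "Follow the example. Think step by step, then output a portrait.\n"
    f"{OPTIMAL_DEMO}\nBehavior log: {BEHAVIOR_LOG}"
)
```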
Profile modeling & optimization
Behavior‑sequence modeling distinguishes short‑term (recent) and long‑term (high‑frequency) interests.
Domain-specific knowledge bases (e.g., ChineseEcomQA) are used to fine-tune the model under structured label constraints.
Human review corrects LLM bias (e.g., gender stereotypes) and builds hierarchical interest models.
Storage & expression
Key‑value structured portraits (e.g., {"interests": ["photography", "hiking"]}) enable efficient retrieval.
Natural‑language portraits provide narrative descriptions for downstream business logic.
Layered modeling separates short‑term, long‑term, and potential interests.
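A minimal sketch of such a layered, key-value portrait; the field names are illustrative assumptions:

```python
# Sketch of a layered key-value portrait plus a narrative description.
portrait = {
    "user_id": "u_12345",
    "short_term": ["wireless earbuds", "phone cases"],  # recent sessions
    "long_term": ["photography", "hiking"],             # high-frequency, stable interests
    "potential": ["drones"],                            # inferred, not yet acted on
    "narrative": "Outdoor photography enthusiast who replaces accessories quarterly.",
}

def is_interested(p: dict, topic: str) -> bool:
    # Key-value structure keeps downstream lookups cheap.
    return topic in p["short_term"] or topic in p["long_term"]
```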
Joint engineering‑algorithm optimization
To meet real-time latency budgets (under 50 ms) and high throughput, the system combines three layers (a minimal merge sketch follows below):
Offline layer: processes 180-day historical data and stores long-term embeddings in Redis.
Near-line layer: consumes Kafka streams for 15-minute incremental updates.
Online layer: merges offline and near-line outputs into the real-time portrait service.
Item tiering separates head items (real‑time updated embeddings) from tail items (static semantic IDs).
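A minimal sketch of the online merge, assuming the offline embedding lives in Redis and near-line increments arrive as parsed Kafka events; key names and the weighting rule are assumptions:

```python
# Sketch: merge the offline long-term embedding with near-line increments.
import numpy as np
import redis

r = redis.Redis(host="localhost", port=6379)

def get_portrait(user_id: str, nearline_events: list[dict]) -> np.ndarray:
    # Offline layer: 180-day long-term embedding precomputed in batch.
    long_term = np.frombuffer(r.get(f"portrait:offline:{user_id}"), dtype=np.float32)

    # Near-line layer: 15-minute incremental embeddings consumed from Kafka.
    if not nearline_events:
        return long_term
    recent = np.mean(
        [np.asarray(e["embedding"], dtype=np.float32) for e in nearline_events], axis=0
    )

    # Online layer: simple convex combination; the real merge policy is not given in the talk.
    return 0.8 * long_term + 0.2 * recent
```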
Model optimizations
TokenID → SemanticID pipeline reduces retrieval cost and improves generation quality (a sketch follows this list).
Joint representation modeling combines user IDs (encoded by a GNN) and behavior IDs (aligned by contrastive learning) in a dual‑channel network.
Dynamic batching, KV‑cache paging, and kernel fusion lower memory and compute overhead.
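The talk does not spell out how SemanticIDs are built; one common recipe is residual quantization of item embeddings, sketched here with scikit-learn KMeans codebooks:

```python
# Sketch: compress item embeddings into short SemanticID tuples via residual quantization.
import numpy as np
from sklearn.cluster import KMeans

def build_semantic_ids(item_embs: np.ndarray, levels: int = 3, codebook: int = 256) -> np.ndarray:
    residual = item_embs.copy()
    codes = []
    for _ in range(levels):
        km = KMeans(n_clusters=codebook, n_init=10).fit(residual)
        codes.append(km.labels_)                                # one code per item at this level
        residual = residual - km.cluster_centers_[km.labels_]   # pass the quantization error down
    # Each item becomes a short tuple such as (17, 203, 5) instead of a raw TokenID.
    return np.stack(codes, axis=1)
```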
Model compression & inference
Layer‑wise distillation and mixed‑precision quantization (FP16 for attention, INT4 for feed‑forward) preserve accuracy.
Block‑wise quantization and quantization‑aware training (QAT) further reduce model size.
Head pruning removes attention heads with contribution < 0.01.
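A hedged sketch of the contribution-threshold rule; how "contribution" is measured is an assumption here (normalized mean activation per head), since the talk only states the 0.01 cutoff:

```python
# Sketch: score attention heads on a calibration batch and keep those above the threshold.
import torch

def heads_to_keep(per_head_outputs: torch.Tensor, threshold: float = 0.01) -> torch.Tensor:
    # per_head_outputs: (batch, heads, seq, head_dim), collected on calibration data.
    contribution = per_head_outputs.abs().mean(dim=(0, 2, 3))  # one raw score per head
    contribution = contribution / contribution.sum()           # normalize so scores sum to 1
    return contribution >= threshold                           # boolean mask; low heads get pruned
```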
Industry progress and outlook
Current trends favor modular upgrades: generative components are embedded into cascade architectures to replace specific modules, while end-to-end generative pipelines remain experimental.
Key future challenges:
Compute and memory bottlenecks for ultra-long sequences (O(n²) attention, >80 GB VRAM); a back-of-envelope estimate follows this list.
Position‑encoding limits and context fragmentation for multi‑month histories.
Ethical and privacy compliance (data protection laws, homomorphic encryption, transparency).
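A back-of-envelope estimate of why naively materialized attention over multi-month sequences exceeds an 80 GB card; the sequence length, head count, and layer count are illustrative assumptions:

```python
# Rough memory estimate for materialized attention scores in FP16.
seq_len, heads, layers, bytes_fp16 = 65_536, 32, 32, 2

score_matrix = seq_len ** 2 * bytes_fp16    # one head, one layer: an n x n score matrix (8 GiB)
per_layer = score_matrix * heads            # ~256 GiB per layer
total_gib = per_layer * layers / 1024 ** 3  # ~8,192 GiB across 32 layers
print(f"{total_gib:,.0f} GiB just for attention scores")
```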
Promising directions include:
Sparse attention + state-space models (e.g., Mamba) for linear-time processing (a mask sketch follows this list).
Mixture‑of‑Experts (MoE) routing to activate a subset of experts.
Dynamic position interpolation (YaRN) to extend context windows.
External memory banks for long‑term behavior retrieval.
Lightweight multimodal inference on edge devices.
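As one concrete instance of the sparse-attention direction, a sliding-window (local) attention mask can be sketched as follows; the window size is an illustrative choice:

```python
# Sketch: causal sliding-window mask so each position attends to at most `window` neighbors.
import torch

def sliding_window_mask(seq_len: int, window: int = 512) -> torch.Tensor:
    idx = torch.arange(seq_len)
    dist = idx[:, None] - idx[None, :]       # dist[i, j] = i - j
    # Keep only causal, local pairs; real kernels avoid materializing the full n x n mask,
    # so compute grows as O(n * window) rather than O(n^2).
    return (dist >= 0) & (dist < window)
```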
Conclusion
LLM‑driven dynamic cross‑domain reasoning, together with joint engineering‑algorithm optimizations, can overcome the long‑term and cross‑domain bottlenecks of traditional recommendation pipelines. Remaining hurdles are scalability of attention, memory consumption, and privacy guarantees.