Breaking the Recommendation Feedback Loop with LLM‑Powered Dynamic User Knowledge Graphs
By integrating large language models to dynamically construct user knowledge graphs and applying two‑hop reasoning, the authors enhance serendipity in a large‑scale e‑commerce community recommendation system, achieving significant online gains in diversity, novelty, and user engagement metrics.
Introduction
In the Dewu community, the recommendation loop tends to reinforce similar content, causing result convergence and information cocoons that reduce user freshness and satisfaction. Leveraging the rapid knowledge extraction capabilities of large language models (LLMs), the authors propose a method to dynamically build user knowledge graphs and use controlled reasoning to uncover latent user interests, which are then integrated into an industrial recommendation pipeline.
Background
Dewu’s community tab serves millions of users for content creation and consumption. Traditional recommendation feedback loops push increasingly similar items, leading to homogenization. Recent advances in LLMs provide an opportunity to break this cycle by extracting richer world knowledge.
Problem and Challenges
Novelty recommendation requires items that are both unexpected and relevant, but data scarcity limits the use of large models.
LLMs struggle with multi‑step reasoning needed for complex queries.
Industrial recommendation systems demand sub‑100 ms latency, while LLM inference is costly.
Efficiently recalling candidates that match generated latent interests while maintaining high conversion efficiency is non‑trivial.
Proposed Solution
The solution consists of four key components:
Replace traditional small models with LLMs to extract latent interests from user behavior, alleviating data sparsity.
Introduce two‑hop reasoning combined with a multi‑agent, multi‑round debate mechanism to improve reasoning accuracy and stability.
Deploy a near‑line recall architecture to meet real‑time latency requirements.
Apply contrastive learning to align LLM‑extracted interests with existing user interest representations, ensuring high relevance and conversion efficiency.
Two‑hop Reasoning Process
The static user profile (age, gender) and recent search terms form the initial nodes, from which the LLM constructs a dynamic graph G = (V, E). For a given pair (v1, v3), two‑hop reasoning checks whether a latent‑interest relation exists between them.
Step 1: From v1 (static profile + search term) find intermediate nodes v2 that satisfy an upper‑level relation; v2 represents core user motivations.
Step 2: From v2 locate same‑level or lower‑level nodes v3 that correspond to concrete items, topics, or categories, limiting v3 to product‑related entities to reduce hallucinations.
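The source does not publish its prompts; the sketch below is one minimal way the two hops could be orchestrated, assuming a hypothetical llm_complete(prompt) helper around any chat‑completion API and simple line‑based parsing.

```python
# Minimal sketch of the two-hop expansion over the dynamic user graph G = (V, E).
# `llm_complete` is a placeholder for any LLM client; prompt wording and parsing
# are illustrative assumptions, not the authors' actual prompts.

def llm_complete(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client here")

def two_hop_interests(profile: dict, search_term: str) -> list[str]:
    # Hop 1: from v1 (static profile + search term) infer intermediate nodes v2,
    # i.e. the core motivations behind the behavior.
    hop1 = (
        f"User profile: age={profile['age']}, gender={profile['gender']}. "
        f"Recent search: '{search_term}'. "
        "List the underlying motivations that could explain this search, one per line."
    )
    motivations = [m.strip() for m in llm_complete(hop1).splitlines() if m.strip()]

    # Hop 2: from each v2 infer concrete nodes v3, restricted to product-related
    # entities (items, topics, categories) to limit hallucinations.
    interests: list[str] = []
    for v2 in motivations:
        hop2 = (
            f"Motivation: '{v2}'. "
            "List concrete product-related categories, topics, or items that satisfy "
            "this motivation, one per line. Mention product-related entities only."
        )
        interests.extend(i.strip() for i in llm_complete(hop2).splitlines() if i.strip())
    return interests
```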
Multi‑Agent Debate
Multiple LLM agents independently generate answers, then receive a consensus prompt that encourages each agent to refine its response based on peers’ outputs. Iterating this process yields a more factual and stable final answer.
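The debate itself is a simple loop over agents and rounds; the sketch below reuses the hypothetical llm_complete helper from the previous snippet, and the agent count, round count, and final aggregation step are illustrative choices rather than values reported by the authors.

```python
def multi_agent_debate(question: str, n_agents: int = 3, n_rounds: int = 2) -> str:
    # Round 0: each agent answers independently.
    answers = [llm_complete(f"Question: {question}\nAnswer concisely.") for _ in range(n_agents)]

    # Later rounds: each agent sees its peers' answers and refines its own.
    for _ in range(n_rounds):
        refined = []
        for i, own in enumerate(answers):
            peers = "\n".join(a for j, a in enumerate(answers) if j != i)
            consensus_prompt = (
                f"Question: {question}\n"
                f"Your previous answer: {own}\n"
                f"Other agents answered:\n{peers}\n"
                "Considering these responses, give a refined, factual answer."
            )
            refined.append(llm_complete(consensus_prompt))
        answers = refined

    # One way to close the debate: ask the model to merge the final round.
    return llm_complete("Merge these answers into one consistent answer:\n" + "\n".join(answers))
```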
SFT (Supervised Fine‑Tuning)
The large reasoning model deepseek‑r1 generates thought processes and latent interests, which are distilled into the smaller model qwq‑32b: the generated outputs form a supervised fine‑tuning dataset D = {(x, y)}, where x is the prompt and y is the LLM‑generated reasoning and interest. Fine‑tuning qwq‑32b on D produces interestGPT, raising the probability of generating the desired answers.
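Mechanically, this distillation step is standard teacher‑forced fine‑tuning on D: prompt tokens are masked out and cross‑entropy is computed on the generated reasoning and interest. The snippet below is a generic PyTorch sketch of that objective, assuming a Hugging Face‑style causal LM that exposes .logits; it is not the authors' training code.

```python
import torch
import torch.nn.functional as F

def sft_loss(model, tokenizer, prompt: str, target: str) -> torch.Tensor:
    """Next-token cross-entropy on the target tokens only (prompt positions masked)."""
    prompt_ids = tokenizer.encode(prompt)
    target_ids = tokenizer.encode(target)
    input_ids = torch.tensor([prompt_ids + target_ids])              # (1, T)
    labels = torch.tensor([[-100] * len(prompt_ids) + target_ids])   # -100 = ignored position

    logits = model(input_ids).logits                                  # (1, T, vocab)
    # Shift so that position t predicts token t+1 (teacher forcing).
    shift_logits = logits[:, :-1, :].reshape(-1, logits.size(-1))
    shift_labels = labels[:, 1:].reshape(-1)
    return F.cross_entropy(shift_logits, shift_labels, ignore_index=-100)
```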
Interest‑aware Retrieval
During inference, each latent interest i_k (1 ≤ k ≤ n) is combined with user features and fed into the user tower to obtain an interest embedding e_k. ANN search retrieves candidate items for each interest, and all retrieved sets are merged with other recall channels.
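Concretely, the per‑interest recall can be pictured as below: each latent interest conditions the user tower, the resulting embedding is matched against item embeddings, and the top‑k sets are merged. The brute‑force NumPy search is only a stand‑in for a production ANN index, and all names are placeholders.

```python
import numpy as np

def interest_aware_recall(user_feats, interests, user_tower, item_embs, item_ids, k=50):
    """item_embs: (N, d) item-tower embeddings; returns a merged candidate set."""
    candidates = set()
    for interest in interests:                    # one latent interest i_k at a time
        e_k = user_tower(user_feats, interest)    # interest-conditioned user embedding, shape (d,)
        scores = item_embs @ e_k                  # inner-product scores against all items
        top = np.argpartition(-scores, k)[:k]     # brute-force stand-in for ANN search
        candidates.update(item_ids[i] for i in top)
    return candidates                             # merged downstream with other recall channels
```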
Model Architecture
The system uses a dual‑tower (user‑tower and item‑tower) model with contrastive learning. User features (static profile fᵘ, historical interaction embeddings) and LLM‑generated interests are concatenated and passed through two fully‑connected layers to produce a user embedding. Item features (category, brand, tags) are similarly embedded. Contrastive loss maximizes similarity between embeddings of the same interest and minimizes similarity across different interests, while BCE loss models click preference.
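The paper describes the towers and losses only at a high level; the PyTorch sketch below is one plausible instantiation, with layer sizes, the in‑batch contrastive formulation, and feature handling chosen for illustration rather than taken from the source.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tower(nn.Module):
    """Two fully-connected layers mapping concatenated features to a normalized embedding."""
    def __init__(self, in_dim: int, hidden: int = 256, out_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def dual_tower_loss(user_tower, item_tower, user_x, item_x, clicks, temperature=0.07):
    u = user_tower(user_x)                  # (B, d) interest-conditioned user embeddings
    v = item_tower(item_x)                  # (B, d) item embeddings

    # Contrastive term: pull each user embedding toward its own item (the diagonal)
    # and push it away from the other interests/items in the batch.
    logits = u @ v.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(u.size(0))
    contrastive = F.cross_entropy(logits, targets)

    # BCE term: model the click preference on the matched pair.
    click_logit = (u * v).sum(-1)
    bce = F.binary_cross_entropy_with_logits(click_logit, clicks.float())
    return contrastive + bce
```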
Experiments
Online A/B tests were conducted on 10 % of Dewu’s traffic. The baseline was the existing community recall system using CLIP as the interest encoder. The proposed pipeline added a novelty recall channel based on LLM‑derived interests.
Eight online metrics were measured: average view duration (AVDU), UVCTR, average click‑through rate (ACR), engagement rate (ER), average first‑level category clicks (ACC‑1), average third‑level category clicks (ACC‑3), first‑level category novelty exposure (ENR), and first‑level category novelty click (CNR). Novelty is defined as items whose top‑level category does not appear in the user’s recent 200 clicks.
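Under that definition, the exposure‑side and click‑side novelty rates can be computed per user as in the sketch below; the source does not spell out the exact CNR formula, so the click‑share reading here is an assumption, and all field names are hypothetical.

```python
def novelty_rates(impressions, recent_clicked_categories, window=200):
    """impressions: list of (top_level_category, clicked) pairs for one user.
    recent_clicked_categories: the user's clicked top-level categories, most recent last."""
    recent = set(recent_clicked_categories[-window:])
    novel = [(cat, clicked) for cat, clicked in impressions if cat not in recent]
    enr = len(novel) / max(len(impressions), 1)                               # novelty exposure rate
    cnr = sum(c for _, c in novel) / max(sum(c for _, c in impressions), 1)   # novelty click share (assumed)
    return enr, cnr
```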
Offline evaluation on 10 k test samples showed the SFT‑trained interestGPT achieved 96 % of samples scoring 2 points or higher (1 % scored 0, 3 % scored 1).
Online results compared to baseline:
AVDU ↑ 0.15 %
UVCTR ↑ 0.07 %
ACR ↑ 0.15 %
ER ↑ 0.30 %
ACC‑1 ↑ 0.21 %
ACC‑3 ↑ 0.23 %
ENR ↑ 4.62 %
CNR ↑ 4.85 %
Novelty recall raised the novelty exposure rate from 14.24 % (baseline) to 26.53 % in the experimental group, while other channels also benefited, indicating a virtuous feedback loop.
Conclusion
The work demonstrates that LLM‑driven dynamic user knowledge graphs and two‑hop reasoning effectively break the recommendation feedback loop, delivering measurable gains in diversity, novelty, and user engagement while remaining fully deployable in large‑scale industrial systems.
Future Work
Future directions include incorporating richer interaction signals (clicks, browsing, favorites) to mitigate search‑behavior sparsity, extending interest representations to ranking stages (coarse, fine, re‑ranking), and dynamically calibrating generated interests with real‑time feedback to avoid over‑divergence. The authors also plan to explore generative recall models built on the same LLM architecture.