From Finding to Generating Videos: How Kuaishou’s RaG Transforms Recommendation Systems

Kuaishou's new Recommendation-as-Generation (RaG) framework replaces traditional retrieve-and-rank with a generative pipeline that predicts user interests, generates personalized video content, and closes the loop with user feedback. Deployed in an advertising system that serves over 400 million daily active users, it delivers a 1.870% ad-revenue lift over a strong generative baseline.

Machine Heart

Over the past decade, recommendation systems have been dominated by a “retrieve‑and‑rank” paradigm: user profiles are matched against a pool of existing videos, which are then sorted and served. This approach reaches a natural limit when the desired content does not exist in the pool.

Kuaishou’s recent paper introduces Recommendation‑as‑Generation (RaG), which shifts the system from searching for existing videos to generating videos that match predicted user interests. The core idea is to first predict a user’s latent interest and then directly generate a personalized video aligned with that interest.

RaG has been deployed in Kuaishou's large-scale advertising system, which serves more than 400 million daily active users. Online A/B experiments show a +1.870% increase in ad revenue over a strong Generative Recommendation Model (GRM) baseline.

System Architecture

The traditional pipeline “user profile → interest modeling → retrieve existing video → rank & serve” is replaced by “user profile → interest semantic ID → video production instruction → personalized video generation → user‑feedback loop.” The architecture consists of five modules:

D-SIDs (Disentangled Semantic IDs): each video is encoded into two separate IDs, Content SIDs (what the video is about) and Creative SIDs (how the video is presented). Quantization uses a two-layer codebook with 8,192 entries per layer.

GRM (Generative Recommendation Model): predicts future interest as a sequence of D-SIDs instead of a fixed video ID.

Instruction Model (IM): converts D-SIDs and ad metadata into shot-level production instructions.

VGAs (Video Generation Agents): three agents generate the visual, audio, and effects tracks. They perform hierarchical planning, reasoning, and limited self-reflection (at most 2 rounds) to ensure cross-modal consistency.

SCRL (Synergistic Cross-Domain Reward Learning): a closed-loop optimization that jointly considers a user-feedback reward, an interest-alignment reward, and a video-quality reward. Group-decoupled normalization and PID-controlled Lagrangian multipliers balance the objectives.
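The article does not spell out how SCRL balances its three rewards, so the following is only a minimal sketch under stated assumptions. It assumes that "group-decoupled normalization" means each reward component is standardized separately within a group of candidate videos, and that each Lagrangian multiplier is driven by a PID controller on how far a constraint (for example, a minimum quality target) is from being met. The names `normalize_group`, `PIDMultiplier`, and `combined_advantages`, the gains, and the target value are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


def normalize_group(values: List[float]) -> List[float]:
    """Z-score one reward component within a group of candidates (assumed scheme)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


@dataclass
class PIDMultiplier:
    """Non-negative Lagrange multiplier updated by a PID controller.

    `violation` is (target - observed); a positive value means the
    constraint is currently not satisfied. Gains are illustrative only.
    """
    kp: float = 0.5
    ki: float = 0.1
    kd: float = 0.0
    integral: float = 0.0
    prev_error: float = 0.0
    value: float = 0.0

    def update(self, violation: float) -> float:
        # Clamp the integral at zero so satisfied constraints do not build up credit.
        self.integral = max(0.0, self.integral + violation)
        derivative = violation - self.prev_error
        self.prev_error = violation
        raw = self.kp * violation + self.ki * self.integral + self.kd * derivative
        self.value = max(0.0, raw)
        return self.value


def combined_advantages(
    feedback: List[float],
    alignment: List[float],
    quality: List[float],
    lam_alignment: float,
    lam_quality: float,
) -> List[float]:
    """Normalize each reward on its own, then mix them with the multipliers."""
    f = normalize_group(feedback)
    a = normalize_group(alignment)
    q = normalize_group(quality)
    return [fi + lam_alignment * ai + lam_quality * qi for fi, ai, qi in zip(f, a, q)]


# Toy usage: quality is below a hypothetical target, so its multiplier grows.
quality_controller = PIDMultiplier()
lam_q = quality_controller.update(0.8 - 0.6)
scores = combined_advantages(
    feedback=[1.0, 3.0], alignment=[0.5, 0.5], quality=[2.0, 4.0],
    lam_alignment=0.0, lam_quality=lam_q,
)
print(lam_q, scores)
```

Normalizing each component before mixing keeps a reward with a large numeric range from dominating the others, and the PID update raises a multiplier only while its constraint is violated.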

Key Challenges and Solutions

Unifying interest recommendation and video generation: Recommendation models handle discrete, heterogeneous signals, while video generators process continuous multimodal data. Without a unified semantic interface, the recommendation output cannot reliably drive generation. D-SIDs provide a disentangled representation that separates content semantics from creative style, enabling stable conditioning for the generator.

Industrial-scale personalized video production: High-quality video generation typically requires complex prompts, multiple human-in-the-loop iterations, and long inference times, which are infeasible for billions of ad requests. RaG adopts an "online interest modeling + near-line generation + latency-aware service" architecture. Predicted D-SIDs are cached. If both the content and creative SIDs hit the cache, the pre-generated video is served instantly. If only the content SIDs hit, a creative variant is generated asynchronously. If neither hits, a nearest-neighbor fallback is used while the missing SID is queued for generation.
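The three cache outcomes described above can be sketched as a small lookup routine. This is an illustration under assumptions, not Kuaishou's implementation: the `VideoCache` class, the `nearest_neighbor` callback, and the string video IDs are hypothetical. The article also does not say what is served on a content-only hit while the new variant renders, so the sketch assumes an existing video with the same content SID is returned in the meantime.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Tuple

SID = Tuple[int, ...]   # one semantic ID, e.g. a two-layer code
Key = Tuple[SID, SID]   # (content SID, creative SID)


@dataclass
class VideoCache:
    """Hypothetical cache of pre-generated videos keyed by D-SIDs."""
    by_pair: Dict[Key, str] = field(default_factory=dict)
    by_content: Dict[SID, List[str]] = field(default_factory=dict)
    generation_queue: Deque[Key] = field(default_factory=deque)

    def add(self, content: SID, creative: SID, video_id: str) -> None:
        self.by_pair[(content, creative)] = video_id
        self.by_content.setdefault(content, []).append(video_id)

    def serve(
        self,
        content: SID,
        creative: SID,
        nearest_neighbor: Callable[[SID, SID], str],
    ) -> Tuple[str, str]:
        """Return (outcome, video_id) without blocking on generation."""
        key = (content, creative)
        if key in self.by_pair:
            return "full_hit", self.by_pair[key]
        # Anything short of a full hit is queued for near-line generation.
        self.generation_queue.append(key)
        if content in self.by_content:
            # Assumption: reuse a video with the same content SID for now.
            return "content_hit", self.by_content[content][0]
        return "miss", nearest_neighbor(content, creative)


cache = VideoCache()
cache.add((1, 1), (2, 2), "video-a")
print(cache.serve((1, 1), (2, 2), lambda c, s: "fallback"))
print(cache.serve((1, 1), (0, 3), lambda c, s: "fallback"))
print(cache.serve((9, 9), (0, 0), lambda c, s: "fallback"))
```

Keeping generation off the request path is what makes the service latency-aware: every branch returns immediately, and the queue is drained by a separate near-line worker.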

Experimental Results

D-SIDs reduce semantic collision from 18.24% to 2.62%, improving both retrieval quality and quantization fidelity. The Instruction Model, at 8B parameters and trained on 1M samples, balances effectiveness and latency. VGAs outperform a fixed-pipeline baseline in both reasoning and reflection capabilities, according to offline metrics and online A/B tests.
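To make the collision figure more concrete, the sketch below quantizes toy embeddings with a two-layer codebook and measures how many items end up sharing a code. Several things here are assumptions rather than details from the paper: the second layer is assumed to quantize the residual left by the first (a common design for semantic IDs), "collision" is taken to mean the fraction of items whose full code is shared with at least one other item, and the tiny 2-D codebooks stand in for the 8,192-entry layers.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Code = Tuple[int, int]


def nearest(codebook: Sequence[Vector], v: Vector) -> int:
    """Index of the codebook entry closest to v (squared Euclidean distance)."""
    def sqdist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sqdist(codebook[i], v))


def quantize_two_layer(v: Vector, layer1: Sequence[Vector], layer2: Sequence[Vector]) -> Code:
    """Layer 1 picks a coarse entry; layer 2 quantizes the remaining residual."""
    i1 = nearest(layer1, v)
    residual = [x - c for x, c in zip(v, layer1[i1])]
    i2 = nearest(layer2, residual)
    return i1, i2


def collision_rate(codes: List[Code]) -> float:
    """Fraction of items whose code is shared with at least one other item."""
    if not codes:
        return 0.0
    counts = Counter(codes)
    collided = sum(n for n in counts.values() if n > 1)
    return collided / len(codes)


# Toy codebooks; the article reports 8,192 entries per layer.
LAYER1 = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)]
LAYER2 = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (-0.2, 0.0)]

embeddings = [(1.15, 1.0), (1.22, 0.98), (-1.0, 1.25), (0.05, -0.02)]
codes = [quantize_two_layer(e, LAYER1, LAYER2) for e in embeddings]
print(codes, collision_rate(codes))
```

In this toy run the first two embeddings land on the same code, so half of the items collide. A disentangled scheme would apply the same kind of quantization to content and creative embeddings separately, giving each video two codes instead of one.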

SCRL ablation studies confirm that each reward component contributes positively: removing the user-feedback reward collapses revenue gains, while omitting the interest-alignment or video-quality reward degrades relevance and visual quality, respectively.

Online Impact

In production, RaG delivers a +1.870% ad-revenue lift over the strong GRM baseline, which itself already provides a +3.526% improvement over traditional deep learning recommendation models (DLRMs). The system serves over 400 million daily users, demonstrating that generative recommendation can translate into measurable commercial value.

A concrete user case: a woman aged 25 to 34 who is interested in fitness and low-fat diets receives a personalized “beauty-endorsed protein powder” ad. The pipeline predicts the relevant content and creative SIDs, generates shot-level instructions, and assembles visual, audio, and effects tracks to produce a video that matches the user's latent interests.
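The final assembly step in this example, together with the bounded self-reflection attributed to the VGAs, might look roughly like the loop below. It is a sketch under assumptions: the `Tracks` type, the `generate` and `critique` callbacks, and the feedback strings are hypothetical stand-ins for the real agents. Only the limit of two reflection rounds comes from the article.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

MAX_REFLECTION_ROUNDS = 2  # the article states at most two self-reflection rounds


@dataclass
class Tracks:
    visual: str
    audio: str
    effects: str


def generate_with_reflection(
    instructions: List[str],
    generate: Callable[[List[str], List[str]], Tracks],
    critique: Callable[[Tracks], List[str]],
) -> Tuple[Tracks, int]:
    """Generate tracks, then revise while the critic reports issues.

    Returns the final tracks and the number of reflection rounds used.
    """
    tracks = generate(instructions, [])
    rounds = 0
    while rounds < MAX_REFLECTION_ROUNDS:
        issues = critique(tracks)
        if not issues:
            break  # tracks are consistent across modalities
        tracks = generate(instructions, issues)
        rounds += 1
    return tracks, rounds


# Stub agents for illustration only.
def toy_generate(instructions: List[str], issues: List[str]) -> Tracks:
    audio = "synced" if "audio out of sync" in issues else "offset"
    return Tracks(visual="shots", audio=audio, effects="captions")


def toy_critique(tracks: Tracks) -> List[str]:
    return [] if tracks.audio == "synced" else ["audio out of sync"]


final_tracks, used_rounds = generate_with_reflection(
    ["shot 1: product close-up"], toy_generate, toy_critique
)
print(final_tracks, used_rounds)
```

Capping the loop keeps generation cost predictable, which matters when videos are produced near-line for a large volume of requests.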

RaG thus expands the boundary of recommendation systems from “finding existing content” to “creating content that fits the user’s desire,” establishing a new paradigm for large‑scale, feedback‑driven generative recommendation.
