How OneRec Revolutionizes Short-Video Recommendations with End-to-End Generative AI
OneRec, an end-to-end generative recommendation system from Kuaishou, combines an encoder-decoder architecture, reward-based preference alignment, and reinforcement learning to improve short-video recommendation. It boosts user engagement and reduces operational costs while exhibiting scaling-law behavior comparable to large language models.
Overview
Kuaishou's recommendation model team recently introduced OneRec, an end-to-end generative recommendation system built on an encoder-decoder framework with a reward-driven preference-alignment method. Reinforcement learning further tunes the model so that it directly generates video recommendations matching user preferences. OneRec is deployed on both Kuaishou and Kuaishou Lite, where it handles 25% of online traffic and has increased app dwell time by 0.54% (main app) and 1.24% (Lite).
Key Contributions
Single‑stage encoder‑decoder generation framework: The encoder compresses the full‑lifecycle user behavior sequence for precise interest modeling, while a Mixture‑of‑Experts (MoE) decoder provides massive parameter scalability.
Reward‑based preference alignment: A multi‑dimensional reward system (preference, format, industrial) guides the model via reinforcement learning, enabling fine‑grained capture of user preferences.
First industrial‑grade end‑to‑end generative recommendation deployment: In a week‑long A/B test covering 5% of traffic, the pure generative model achieved performance comparable to the traditional cascade system, and with reward‑model selection it further improved dwell time (+0.54% / +1.24%) and 7‑day user lifetime (LT7) (+0.05% / +0.08%).
System Architecture
OneRec treats recommendation as a sequence generation task. The encoder processes user static features, short‑term and lifelong behavior sequences, and multimodal video signals (title, tags, ASR, visual embeddings). The decoder generates token sequences that map to video IDs. The architecture leverages flash‑attention and shared context computation to reduce redundancy.
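To make the "recommendation as sequence generation" framing concrete, here is a toy sketch (not Kuaishou's code): a decoder emits a fixed-length tuple of semantic-ID tokens, and a lookup table maps each complete tuple back to a concrete video ID. The catalog entries, the scoring stub, and all names are illustrative assumptions.

```python
# Hypothetical catalog: each video is indexed by a 3-level semantic ID.
catalog = {
    (2, 7, 1): "video_A",
    (2, 7, 3): "video_B",
    (5, 0, 9): "video_C",
}

def toy_decoder_step(prefix):
    """Stand-in for the decoder: returns token scores given the prefix.

    A real model would condition on the encoded user sequence; here we
    simply favor tokens that extend some valid catalog entry.
    """
    scores = {t: 0.0 for t in range(10)}
    for sid in catalog:
        if sid[: len(prefix)] == prefix:
            scores[sid[len(prefix)]] += 1.0
    return scores

def generate_video(num_levels=3):
    """Greedy decode: pick the best token at each semantic-ID level."""
    prefix = ()
    for _ in range(num_levels):
        scores = toy_decoder_step(prefix)
        prefix += (max(scores, key=scores.get),)
    return catalog.get(prefix)

print(generate_video())  # greedy decode resolves to a catalog video
```

The key point is that the final "ranking" step disappears: decoding a token sequence and resolving it against the item index is the recommendation.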
Semantic Tokenizer
A collaborative multimodal tokenizer fuses video titles, tags, speech‑to‑text (ASR), and visual features, then applies RQ‑Kmeans to produce three‑level semantic IDs for each video.
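The residual-quantization idea behind RQ-Kmeans can be sketched as follows. This toy uses hand-fixed codebooks instead of learned k-means centroids, and two-dimensional embeddings for readability; each level quantizes the residual left by the previous level, yielding a coarse-to-fine semantic ID.

```python
import numpy as np

# Illustrative codebooks (a real system learns these with k-means per level).
codebooks = [
    np.array([[0.0, 0.0], [10.0, 10.0]]),  # level 1: coarse clusters
    np.array([[0.0, 0.0], [1.0, 1.0]]),    # level 2: refines the residual
    np.array([[0.0, 0.0], [0.1, 0.1]]),    # level 3: finest refinement
]

def rq_encode(vec):
    """Return the multi-level semantic ID for an embedding vector."""
    residual = np.asarray(vec, dtype=float)
    ids = []
    for cb in codebooks:
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        ids.append(idx)
        residual = residual - cb[idx]  # next level quantizes what's left
    return tuple(ids)

print(rq_encode([11.1, 11.1]))
```

Because each level only encodes the remaining error, a small codebook at every level still yields a large combined ID space, which is why the decoder can address a huge video catalog with short token sequences.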
Reinforcement Learning Preference Alignment
The system defines three reward types: Preference Reward (aligns with user preference), Format Reward (ensures valid token formats), and Industrial Reward (covers business‑specific goals). An improved ECPO algorithm stabilizes training by clipping gradients for negative‑advantage samples.
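A rough sketch of this setup is below: a weighted combination of the three reward signals, and a PPO-style surrogate in which the importance ratio is additionally capped for negative-advantage samples. The weights and the clipping rule are illustrative assumptions standing in for the stabilization idea described for ECPO, not the published objective.

```python
def total_reward(pref, fmt, ind, w=(1.0, 1.0, 1.0)):
    """Weighted sum of preference, format, and industrial rewards.

    The equal weights are placeholders, not the production configuration.
    """
    return w[0] * pref + w[1] * fmt + w[2] * ind

def clipped_objective(ratio, advantage, eps=0.2):
    """PPO-like surrogate with an extra cap for negative advantages.

    Capping the ratio on negative-advantage samples bounds how large a
    penalizing gradient any single sample can contribute, which is the
    stabilization effect attributed to ECPO here (exact formula may differ).
    """
    if advantage < 0:
        ratio = min(ratio, 1.0 + eps)  # prevent runaway negative-sample gradients
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)

# With ratio 2.0 and advantage -1.0, plain PPO would contribute -2.0;
# the extra cap bounds the surrogate at -(1 + eps).
print(clipped_objective(2.0, -1.0))
```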
Scaling Laws
Experiments show that increasing model parameters from 0.015B to 2.633B consistently reduces training loss, indicating that recommendation models follow the same scaling behavior observed in large language models.
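This kind of scaling claim is usually checked by fitting a power law, loss ≈ a · N^(-b), as a straight line in log-log space. The sketch below uses synthetic data points (not OneRec's measurements) at the parameter counts mentioned above and recovers the exponent it was generated with.

```python
import numpy as np

# Synthetic loss-vs-parameters data following loss = a * N^(-b) exactly.
params = np.array([0.015e9, 0.121e9, 0.935e9, 2.633e9])
loss = 5.0 * params ** -0.05  # made-up a=5.0, b=0.05 for illustration

# In log space the power law is linear: log(loss) = -b*log(N) + log(a).
b, log_a = np.polyfit(np.log(params), np.log(loss), 1)
print(f"fitted exponent: {b:.3f}")
```

A consistent negative slope across model sizes is what "follows the same scaling behavior as LLMs" means operationally: more parameters buy a predictable reduction in loss.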
Performance Optimizations
OneRec reduces the number of operators from >15,000 to ~1,200, achieving an MFU (Model FLOPs Utilization) of 23.7% (training) and 28.6% (inference), a 3‑5× improvement over traditional models. Optimizations include request‑level batching, flash‑attention, GPU‑only embedding training (the SKAI system), mixed‑precision BFloat16, and kernel fusion.
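MFU itself is a simple ratio: the model FLOPs actually achieved per second divided by the hardware's theoretical peak. The helper below is illustrative; the 312 TFLOPs figure is the A100 BF16 dense peak used here as a placeholder, not the hardware Kuaishou reports against.

```python
def model_flops_utilization(achieved_tflops, peak_tflops):
    """MFU = achieved model TFLOPs / hardware peak TFLOPs."""
    return achieved_tflops / peak_tflops

# On a 312 TFLOPs-peak accelerator, reaching the 23.7% training MFU cited
# above would require roughly 74 achieved TFLOPs.
print(f"{model_flops_utilization(73.9, 312.0):.1%}")
```

Low MFU is typical of traditional recommendation models dominated by thousands of small, memory-bound operators, which is why collapsing the graph to ~1,200 operators matters.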
Inference Optimizations
Computation reuse: encoder computed once per request; decoder cross‑attention keys/values shared across beams; KV cache for history.
Operator‑level fusion for MoE, attention, and beam search using Float16.
Dynamic batching to maximize GPU utilization.
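The computation-reuse item above can be sketched as follows: run the encoder once per request, derive the cross-attention keys/values from its output once, and let every beam hypothesis read the same cached values. The classes and string stand-ins are illustrative; the real system does this with fused GPU kernels and tensors.

```python
class EncoderCache:
    """Per-request cache: one encoder pass, K/V shared across all beams."""

    def __init__(self, user_features):
        # Encoder forward pass happens exactly once per request.
        self.encoder_output = self._encode(user_features)
        self.encode_calls = 1
        # Cross-attention K/V derived once from the encoder output.
        self.cross_kv = ("K:" + self.encoder_output, "V:" + self.encoder_output)

    def _encode(self, feats):
        return "enc(" + ",".join(feats) + ")"

def decode_beams(cache, num_beams=4):
    """Every beam reuses cache.cross_kv instead of recomputing it."""
    return [f"beam{i}<-{cache.cross_kv[0]}" for i in range(num_beams)]

cache = EncoderCache(["short_term", "lifelong", "static"])
beams = decode_beams(cache)
print(cache.encode_calls, len(beams))
```

Since beam search multiplies decoder work by the beam width, hoisting the encoder and cross-attention K/V out of the per-beam loop removes the largest source of redundant computation in generative serving.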
Online Experiment Results
In a week‑long A/B test covering 5% of traffic, OneRec with reward‑model selection increased main‑app dwell time by 0.54% and Lite dwell time by 1.24%, while LT7 grew by 0.05% (main) and 0.08% (Lite). All interaction metrics (likes, follows, comments) showed positive lifts, confirming the system’s ability to avoid the “trade‑off” effect of multi‑objective traditional pipelines. The model now serves 25% of QPS in short‑video recommendation.
In the local‑life services scenario, OneRec boosted GMV by 21.01%, order volume by 17.89%, and new‑user acquisition by 23.02% after full traffic rollout.
Conclusion and Future Directions
OneRec demonstrates that an end‑to‑end generative architecture, combined with deep system optimizations, can surpass traditional cascade recommendation pipelines in both effectiveness and efficiency. Remaining challenges include improving inference scalability, integrating multimodal user behavior with LLM/VLM paradigms, and designing a more comprehensive reward system.
Recruitment
The Kuaishou recommendation model team is hiring for positions such as Recommendation Large‑Model Algorithm Engineer, Recommendation Algorithm Engineer, and related internship roles. Interested candidates can submit resumes to the provided email addresses.