
Kuaishou’s Practices for Large‑Scale Model Data Processing and Storage

This article describes Kuaishou’s real‑time, massive‑scale model data processing pipeline, covering model scenarios, recommendation workflow complexity, large‑scale data storage, streaming joins, feature computation, NVM‑based storage solutions, strong consistency mechanisms, and the future outlook for AI recommendation systems.

DataFunSummit

The presentation, originally shared by Wang Jing in November 2023, introduces Kuaishou’s large‑scale model data processing practice, emphasizing the need to handle both massive data volume and real‑time processing.

1. Model Scenario: Kuaishou serves billions of daily active users with real‑time recommendation of short videos, requiring ultra‑large models (up to 1.9 trillion parameters) and handling over 30 TB of state per second during peak traffic.

2. Recommendation Business Complexity: The recommendation pipeline consists of recall, coarse ranking, fine ranking, re‑ranking, and final result selection, with both large‑scale (hundreds of billions of samples) and mid‑scale (tens of billions) business types. Large‑scale tasks favor online streaming iteration, while mid‑scale tasks use batch iteration to accumulate longer historical data.

3. Model Data Scale: Kuaishou’s models far exceed GPT‑3 in parameter count, using SIM long‑sequence models that require storing billions of user interest sequences, leading to petabyte‑level storage demands.

4. Language Model Evolution: The evolution from RNN to Transformer to encoder‑decoder and decoder‑only architectures is outlined, showing how recommendation models inherit advances from language models.

5. Large‑Scale Model Data Processing: Real‑time requirements demand sub‑second latency from user behavior capture to model update. Kuaishou adopts Flink for streaming but mitigates state‑join bottlenecks by offloading state to high‑performance storage and using stateless hash joins.
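The stateless‑join idea can be sketched as follows. This is an illustrative minimal example, not Kuaishou’s actual API: the operator holds no join state itself, and an in‑memory dict stands in for the remote high‑performance store.

```python
# Sketch of a "stateless" streaming join: instead of holding join state inside
# the stream operator, each event does a point lookup/write against an external
# low-latency store. All names here are illustrative, not Kuaishou's APIs.

class ExternalStateStore:
    """Stand-in for a remote low-latency KV store (e.g. an NVM-backed table)."""
    def __init__(self):
        self._table = {}

    def put(self, key, value):
        self._table[key] = value

    def get(self, key):
        return self._table.get(key)

def join_impression_with_action(store, event):
    """Join a user action (e.g. a click) with its earlier impression record.

    The impression side is written to the external store keyed by
    (user_id, item_id), so the action side only needs a point lookup;
    the operator itself stays stateless and trivially scalable.
    """
    if event["type"] == "impression":
        store.put((event["user_id"], event["item_id"]), event)
        return None  # nothing to emit yet
    impression = store.get((event["user_id"], event["item_id"]))
    if impression is None:
        return None  # late or orphaned action; real systems must handle this
    return {**impression, "label": 1, "action": event["type"]}

store = ExternalStateStore()
join_impression_with_action(store, {"type": "impression", "user_id": 1, "item_id": 7})
sample = join_impression_with_action(store, {"type": "click", "user_id": 1, "item_id": 7})
```

Because the join state lives in the external store rather than in Flink operator state, operators can restart or rescale without migrating large state snapshots.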

6. Complex Feature Computation: Feature processing combines scalar CPU calculations with vectorized GPU operations via a DSL written in Python that invokes high‑performance C++/CUDA operators within a Flink‑based runtime.
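A Python DSL of this shape might look like the hypothetical sketch below: per‑record scalar transforms run on CPU, while batched transforms would, in a production runtime, dispatch to C++/CUDA kernels. All operator names and the registration scheme are assumptions for illustration.

```python
# Illustrative feature-DSL sketch: scalar ops run per record on CPU; vector
# ops run per batch (stand-ins for dispatches to C++/CUDA operators).
# All names here are hypothetical, not Kuaishou's DSL.

SCALAR_OPS = {}
VECTOR_OPS = {}

def scalar_op(fn):
    """Register a per-record CPU transform."""
    SCALAR_OPS[fn.__name__] = fn
    return fn

def vector_op(fn):
    """Register a batched transform (would back onto a native kernel)."""
    VECTOR_OPS[fn.__name__] = fn
    return fn

@scalar_op
def bucketize_age(record):
    # Simple scalar feature: clamp age into 8 decade buckets.
    record["age_bucket"] = min(record["age"] // 10, 7)
    return record

@vector_op
def normalize_watch_time(batch):
    # Batch-level feature: normalize watch time by the batch maximum.
    max_t = max(r["watch_time"] for r in batch) or 1
    for r in batch:
        r["watch_time_norm"] = r["watch_time"] / max_t
    return batch

def run_pipeline(batch, plan):
    """Execute a feature plan: scalar ops map over records, vector ops take the batch."""
    for name in plan:
        if name in SCALAR_OPS:
            batch = [SCALAR_OPS[name](r) for r in batch]
        else:
            batch = VECTOR_OPS[name](batch)
    return batch
```

The split lets the runtime route each op to the right executor: scalar ops can be fused into the streaming path, while vector ops amortize kernel-launch overhead across a batch.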

7. Storage Characteristics: Online storage must provide ultra‑low latency (≤10 ms) and high throughput for millions of QPS, leading to a predominantly in‑memory architecture.

8. NVM Table Storage Solution: A three‑layer heterogeneous storage design (NVM, memory pool, API) enables zero‑copy access and LRU/LFU eviction, and leverages Intel NVM for near‑memory speed with SSD‑like capacity, achieving 120× faster recovery and saturating network bandwidth limits.
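The tiering behavior behind such a table can be sketched with a minimal LRU policy, assuming (hypothetically) that evicted entries are demoted from the memory tier to the NVM tier rather than dropped; a plain dict stands in for NVM here.

```python
from collections import OrderedDict

# Minimal tiered-LRU sketch: the hot tier (memory) evicts least-recently-used
# entries toward a colder tier (a plain dict standing in for NVM). This is an
# illustration of the eviction idea, not Kuaishou's storage engine.

class TieredLRUTable:
    def __init__(self, hot_capacity):
        self.hot = OrderedDict()   # in-memory tier, ordered by recency
        self.cold = {}             # stand-in for the NVM tier
        self.hot_capacity = hot_capacity

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)         # refresh recency on hit
            return self.hot[key]
        if key in self.cold:                   # promote from NVM on access
            value = self.cold.pop(key)
            self.put(key, value)
            return value
        return None

    def put(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)
        while len(self.hot) > self.hot_capacity:
            old_key, old_value = self.hot.popitem(last=False)  # LRU entry
            self.cold[old_key] = old_value     # demote instead of dropping
```

Because demoted entries survive in the NVM tier, a restart only needs to rebuild the small hot tier, which is one intuition behind the much faster recovery the talk reports.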

9. Strong Consistency: To support advertising and e‑commerce recommendations, Kuaishou employs Raft‑based replication with a balanced binary‑tree distribution topology to guarantee sub‑10‑second state propagation across thousands of nodes.
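The scaling argument for a tree topology is easy to check: with a balanced binary fan‑out, an update reaches all replicas in about log2(n) forwarding hops rather than requiring the leader to contact every node. A small sketch (illustrative, not Kuaishou’s protocol):

```python
import math

# Sketch of tree-structured replication fan-out: the leader forwards each
# update to two children, which forward onward, so an update reaches all
# replicas in O(log2 n) hops. Illustrative only, not Kuaishou's protocol.

def propagation_hops(num_nodes):
    """Depth of a balanced binary fan-out tree rooted at the leader."""
    return math.ceil(math.log2(num_nodes)) if num_nodes > 1 else 0

def fanout_schedule(nodes):
    """Return (sender, receiver) pairs for one update, using the implicit
    binary-heap layout: node i forwards to nodes 2i+1 and 2i+2."""
    sends = []
    for i in range(len(nodes)):
        for child in (2 * i + 1, 2 * i + 2):
            if child < len(nodes):
                sends.append((nodes[i], nodes[child]))
    return sends
```

For 4,096 nodes this is 12 hops, so even with generous per‑hop latency the total propagation time stays well inside a sub‑10‑second bound.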

10. Outlook: Future challenges include the exponential growth of model parameters and compute, especially for generative, token‑based recommendation. Kuaishou plans to adopt new hardware (CXL, NVM, Grace) and engineering optimizations (state diffs, offloading) to sustain scalability.

Overall, the talk outlines how Kuaishou integrates AI, big‑data, and cloud‑native techniques to build a resilient, high‑performance recommendation infrastructure.

Tags: big data, AI, recommendation systems, large-scale models, real-time data processing, Kuaishou, NVM storage
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
