How KGraph Enables Billion‑Scale Graph Processing for Social and E‑Commerce Recommendations
KGraph, developed by Kuaishou since late 2019, is a self‑built graph platform that supports massive social, e‑commerce, and security workloads, offering a distributed KV storage, high‑performance RPC framework, and advanced graph modeling to achieve tens of millions of QPS and low latency for real‑time recommendation and offline graph analytics.
Background
Kuaishou launched KGraph at the end of 2019 to address the growing need for efficient storage and retrieval of massive relationship data generated by its short‑video and social platform. With over 188 billion mutual‑follow relationships and many more user‑video, live‑commerce, and other interaction edges, traditional in‑memory graphs and relational databases could not meet the scale or performance requirements.
KGraph Architecture
Rapidly Evolving Graph Database Landscape
Graph databases store data as nodes and edges, enabling natural modeling for social networks, risk control, knowledge graphs, etc. Popular query languages include Gremlin, Cypher, and nGQL, but no single standard exists.
Directed Heterogeneous Property Graph Model
KGraph adopts a directed heterogeneous property graph. Each edge is stored twice (out‑edge at the source node and in‑edge at the target node). Nodes and edges carry type identifiers and unique IDs, allowing queries by (type+ID). Attributes can be attached to both nodes (e.g., age, gender) and edges (e.g., watch time).
Overall Framework
The storage layer is a self‑developed distributed key‑value system composed of three roles:
DBServer – single‑process multi‑Shard service providing read/write, supporting multi‑Region and multi‑AZ deployment.
Master – manages processes, schedules Shards, handles node joins and failures, enabling linear scaling.
KNS – routing management module.
On top of the KV store sits an SDK that defines the graph model and basic read/write APIs. The GraphServer (KGraphServer) acts as a proxy, exposing language‑specific interfaces and performing simple multi‑hop filtering. KGraphServer also integrates with big‑data platforms for offline data import and batch graph computation.
Key Problem Analysis
Addressing Social Recommendation Pain Points
KGraph was designed to overcome two major issues of prior solutions: the single‑machine memory limit of in‑memory graphs and the poor performance of relational databases for multi‑degree joins. Its distributed KV architecture breaks the memory ceiling, aiming for tens of millions of QPS per node.
DBServer Design
The persistence engine primarily uses NVMe SSDs (≈1 M IOPS) but to reach the desired ten‑million‑level QPS, KGraph adopts Persistent Memory (PMem). Two usage modes exist; KGraph uses the App Direct mode, mounting XFS on PMem and treating it as a disk. For high‑QPS workloads PMem is preferred, while SSDs serve lower‑QPS cases.
Current bottleneck: XFS introduces locking overhead during metadata updates, limiting throughput. Ongoing work explores a PMem‑native hash engine built with PMDK, storing index structures in DRAM and persisting data directly to PMem, complemented by an LRU cache and a custom log‑based binlog (logdb).
High‑Performance RPC Framework – KRPC
KRPC targets sub‑microsecond RPC latency. With a 40‑core server, the goal is 100 k operations per core, requiring ~1 µs per RPC. The stack includes a lock‑free network layer, a message layer that parses incoming packets, a logic layer handling RPC semantics, and a protocol layer supporting KRPC, Redis, and gRPC.
Super‑Node Handling
Super‑nodes (e.g., official accounts with billions of followers) demand special encoding. KGraph supports three edge‑encoding schemes:
One KV per edge – simple, supports super‑nodes but requires many I/Os for out‑edge scans.
All out‑edges in a single KV – fast scans but suffers from write amplification and cannot handle super‑nodes.
B‑Tree‑managed multiple KVs – a compromise that limits per‑KV edge count (e.g., 2 000 edges) and splits values when thresholds are exceeded, enabling efficient reads and controlled write amplification.
Benchmarks show the B‑Tree scheme achieves two orders of magnitude higher write throughput for 200 k‑degree nodes while keeping read performance comparable to the original scheme.
Performance Metrics
On a high‑performance network and storage stack, KGraph reaches a single‑node peak of 20 M QPS with millisecond‑level intra‑datacenter RPC latency. Real‑world deployments have observed >10 M QPS and ~600 µs average latency. Multi‑node deployments scale linearly across regions and AZs.
Graph‑level multi‑hop queries demonstrate dramatic speedups over relational joins: a 2‑hop query on a 1 M‑node graph (average out‑degree 50) completes in 1.5 ms, 3‑hop in 40 ms, 4‑hop in 960 ms, and 5‑hop in 8 s, whereas relational databases become unusable beyond 3‑hop.
Application Scenarios
Real‑Time Social Recommendation
KGraph powers Kuaishou’s social recommendation pipeline, handling billions of users and potentially trillions of edges. It supports real‑time queries for “people you may know” and multi‑degree friend discovery, delivering tens of millions of QPS with low latency.
E‑Commerce Recall
In e‑commerce, KGraph stores user, shop, product, live‑stream, and video entities along with complex relationships (e.g., influencer networks). It enables real‑time recall of relevant ads, product recommendations, and live‑stream promotions, achieving PB‑scale storage and million‑level QPS.
Offline Graph Computation with Spark
KGraph integrates with Nebula’s query engine and Spark, allowing batch algorithms such as k‑hop subgraph extraction or shortest‑path computation. Results can be written back to KGraph or exported to Hive for downstream analytics.
Summary and Outlook
KGraph’s main advantage lies in its performance: PMem‑backed storage and the KRPC framework deliver single‑node tens of millions of QPS with sub‑millisecond latency, freeing up resources for higher‑level recommendation optimizations. Since its launch, KGraph has been stable in social and e‑commerce recommendation, and its user and cluster size continue to grow.
Future directions include developing a native graph query engine to replace Nebula for lower latency, supporting additional storage back‑ends, and expanding into graph‑learning and knowledge‑graph use cases such as risk control and knowledge discovery.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
