
How REDtao Scaled Xiaohongshu’s Social Graph to Trillions of Edges

Xiaohongshu built the REDtao graph storage system to handle a trillion‑scale social graph, replacing MySQL with a three‑layer architecture, custom graph APIs, high‑availability caches, cross‑cloud multi‑active deployment, and cloud‑native operators, achieving over 90% cache hit rate and dramatic cost savings.

dbaplus Community

Background

Xiaohongshu is a youth-focused lifestyle platform whose social graph spans billions of users, notes, and products and the relationships among them. The existing MySQL-based storage struggled with the read-heavy workload: CPU utilization reached 55% at only a few million requests per second, and scaling further was costly.

In early 2021 the team launched a from‑scratch project to build REDtao, a graph storage system inspired by Facebook’s TAO, to provide a unified graph query API, high performance, and lower operational cost.

Graph Model and API

Relationships are stored as <FromId, AssocType, ToId> → Value(JSON). For example, a "follow" edge from user A to user B is represented as a triple with a JSON payload.

<FromId: A_ID, AssocType: follow, ToId: B_ID> → Value (JSON fields)

Twenty‑five graph‑semantic APIs were exposed, covering CRUD operations and anti‑fraud filters. Typical usage examples include:

getAssocs("followed", userAId, offset, limit, onlyNormalUsers, orderDesc): fetch the normal users following A, one page at a time.

getAssocCount("followed", userAId, onlyNormalUsers): count A's followers while excluding cheating accounts.
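To make the triple model and the two calls above concrete, here is a minimal in-memory sketch. It is not REDtao's implementation: the class name AssocStore, the add_assoc helper, and the "ts" and "status" payload fields are assumptions introduced only for illustration.

```python
import json


class AssocStore:
    """Toy in-memory store for <FromId, AssocType, ToId> -> Value(JSON)."""

    def __init__(self):
        # (from_id, assoc_type) -> {to_id: JSON string}
        self._edges = {}

    def add_assoc(self, assoc_type, from_id, to_id, value):
        bucket = self._edges.setdefault((from_id, assoc_type), {})
        bucket[to_id] = json.dumps(value)

    def _select(self, assoc_type, from_id, only_normal_users):
        for to_id, raw in self._edges.get((from_id, assoc_type), {}).items():
            value = json.loads(raw)
            # "status" is a hypothetical payload field acting as the anti-fraud flag.
            if only_normal_users and value.get("status") != "normal":
                continue
            yield to_id, value

    def get_assocs(self, assoc_type, from_id, offset, limit,
                   only_normal_users=True, order_desc=True):
        rows = list(self._select(assoc_type, from_id, only_normal_users))
        # "ts" is a hypothetical creation timestamp stored in the payload.
        rows.sort(key=lambda row: row[1].get("ts", 0), reverse=order_desc)
        return [to_id for to_id, _ in rows[offset:offset + limit]]

    def get_assoc_count(self, assoc_type, from_id, only_normal_users=True):
        return sum(1 for _ in self._select(assoc_type, from_id, only_normal_users))


store = AssocStore()
store.add_assoc("followed", "A", "B", {"ts": 1, "status": "normal"})
store.add_assoc("followed", "A", "C", {"ts": 2, "status": "cheating"})
store.add_assoc("followed", "A", "D", {"ts": 3, "status": "normal"})

first_page = store.get_assocs("followed", "A", offset=0, limit=10)
normal_followers = store.get_assoc_count("followed", "A")
```

Here first_page lists D before B (newest first, with C filtered out) and normal_followers is 2.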

Architecture Design

REDtao follows a three‑layer design: an access layer (SDK), a distributed cache layer, and a persistent MySQL layer. The cache layer is an independent cluster, decoupled from storage, allowing independent scaling and plug‑and‑play replacement of the MySQL backend.

Figure: Social graph scale illustration

Read flow: The client sends a request to a router, which hashes the edge triple to a follower node. The follower checks the local cache; on miss it forwards to the leader, which may query MySQL if the cache also misses.

Write flow: Writes follow the same routing to a follower, then to the leader, which writes to MySQL, invalidates the corresponding cache key, and propagates the invalidation to all followers.
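The read and write flows described above can be sketched with plain dictionaries standing in for the caches and for MySQL. This is an illustration under assumptions, not the production code: the Router, Follower, and Leader class names are hypothetical, a simple modulo hash replaces whatever hashing scheme REDtao uses, and a single leader serves all followers.

```python
import hashlib


class Leader:
    def __init__(self, db):
        self.db = db          # dict standing in for MySQL
        self.cache = {}
        self.followers = []

    def read(self, key):
        if key not in self.cache:          # leader miss: fall through to storage
            self.cache[key] = self.db.get(key)
        return self.cache[key]

    def write(self, key, value):
        self.db[key] = value               # 1. persist to storage
        self.cache.pop(key, None)          # 2. invalidate the leader's cache key
        for follower in self.followers:    # 3. propagate the invalidation
            follower.invalidate(key)


class Follower:
    def __init__(self, leader):
        self.leader = leader
        self.cache = {}
        leader.followers.append(self)

    def read(self, key):
        if key not in self.cache:          # follower miss: forward to the leader
            self.cache[key] = self.leader.read(key)
        return self.cache[key]

    def write(self, key, value):
        self.leader.write(key, value)

    def invalidate(self, key):
        self.cache.pop(key, None)


class Router:
    def __init__(self, followers):
        self.followers = followers

    def pick(self, key):
        # Stable hash of the edge triple; modulo placement is an assumption.
        digest = hashlib.md5(repr(key).encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.followers)
        return self.followers[index]


db = {}
leader = Leader(db)
router = Router([Follower(leader) for _ in range(3)])
edge = ("A_ID", "follow", "B_ID")
router.pick(edge).write(edge, {"ts": 1})
value = router.pick(edge).read(edge)   # follower miss -> leader miss -> storage
```

Because the write invalidates rather than updates the cached entry, the next read repopulates the caches from storage.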

High Availability

Both cache and storage layers are built as independent two‑tier clusters with leader/follower replication. Automatic fault detection, horizontal scaling, and cache‑only operation during storage failures ensure continuous service.

Rate‑limiting protects MySQL from cache‑miss storms, and a global version number per write prevents write‑conflict anomalies.
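The article does not say which rate-limiting algorithm REDtao uses. As one common option, the sketch below guards the storage path with a token bucket so that only a bounded number of cache misses per second reach the database. TokenBucket, read_through, and the injectable clock are illustrative names and assumptions.

```python
import time


class TokenBucket:
    """Allows at most `rate` storage reads per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def read_through(key, cache, db, bucket):
    """Serve from cache; on a miss, hit storage only if the bucket allows it."""
    if key in cache:
        return cache[key]
    if not bucket.allow():
        raise RuntimeError("storage rate limit exceeded")
    cache[key] = db[key]
    return cache[key]
```

Cache hits never consume a token, so the limiter only engages during a miss storm.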

Performance

REDtao's cache uses a three-level nested hash table (from-id → type → to-id) with local secondary indexes, plus a time-ordered list capped at the newest 1,000 edges per vertex, which keeps cache hit rates high and latency low.
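A rough sketch of that cache layout follows, assuming edges are inserted in time order. The class name VertexIndex and its methods are hypothetical, and the secondary indexes are omitted. The article does not say whether edges that fall off the time-ordered list are also evicted from the hash table; this sketch leaves them in place.

```python
from collections import deque

MAX_RECENT = 1000  # newest edges kept in the time-ordered list


class VertexIndex:
    def __init__(self, max_recent=MAX_RECENT):
        self.edges = {}    # from_id -> assoc_type -> to_id -> value
        self.recent = {}   # (from_id, assoc_type) -> deque of to_ids, newest first
        self.max_recent = max_recent

    def put(self, from_id, assoc_type, to_id, value):
        by_to = self.edges.setdefault(from_id, {}).setdefault(assoc_type, {})
        by_to[to_id] = value
        recent = self.recent.setdefault(
            (from_id, assoc_type), deque(maxlen=self.max_recent))
        if to_id in recent:          # re-inserted edge moves to the front
            recent.remove(to_id)
        recent.appendleft(to_id)     # a full deque drops its oldest entry

    def get(self, from_id, assoc_type, to_id):
        # Point lookup: three hash probes, no scan.
        return self.edges.get(from_id, {}).get(assoc_type, {}).get(to_id)

    def newest(self, from_id, assoc_type, limit):
        recent = self.recent.get((from_id, assoc_type), ())
        return list(recent)[:limit]
```

Point lookups go through the nested hash tables, while "latest N" queries read the bounded list without sorting.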

In production, a 16-core VM handles 1.5 million queries per second at only 22.5% CPU usage: a single node serves about 30k RPC requests per second, and each RPC aggregates about 50 queries (30,000 × 50 = 1.5 million).

Figure: QPS and CPU utilization chart

Ease of Use

All 25 APIs abstract away SQL, providing a consistent programming model. A unified access URL hides the underlying cluster topology; the SDK routes requests based on edge type to the appropriate REDtao cluster.
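Routing by edge type can be pictured as a lookup table inside the SDK. The sketch below is only an assumption about the shape of that logic: RedtaoClient, cluster_for, and the cluster names are invented for illustration, and how the real SDK obtains its routing table is not described in the article.

```python
class RedtaoClient:
    def __init__(self, clusters):
        # assoc_type -> cluster name; callers never see this mapping.
        self._clusters = dict(clusters)

    def cluster_for(self, assoc_type):
        try:
            return self._clusters[assoc_type]
        except KeyError:
            raise ValueError(
                f"no REDtao cluster configured for edge type {assoc_type!r}"
            ) from None


client = RedtaoClient({
    "follow": "redtao-social",      # hypothetical cluster names
    "followed": "redtao-social",
    "like": "redtao-interaction",
})
target = client.cluster_for("like")
```

Moving an edge type to another cluster then only requires changing the table, not the calling code.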

Figure: Unified access URL diagram

Data Consistency

Writes generate a globally increasing version; cache updates compare versions to avoid stale overwrites. For strong‑consistency reads, clients can flag requests to be routed to the MySQL master.
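The version check can be sketched as a compare-before-apply rule on the cache. The names below (VersionedCache, apply) are assumptions, and a local counter stands in for the global version generator.

```python
import itertools

next_version = itertools.count(1)   # stands in for the global version source


class VersionedCache:
    def __init__(self):
        self._data = {}   # key -> (version, value)

    def apply(self, key, value, version):
        """Apply an update only if it is newer than what the cache holds."""
        current = self._data.get(key)
        if current is not None and current[0] >= version:
            return False                  # stale or duplicate update: ignore
        self._data[key] = (version, value)
        return True

    def get(self, key):
        entry = self._data.get(key)
        return None if entry is None else entry[1]


cache = VersionedCache()
v1, v2 = next(next_version), next(next_version)
cache.apply("edge", "new", v2)          # the newer write arrives first
accepted = cache.apply("edge", "old", v1)   # the delayed older write is rejected
```

After both calls the cache still holds "new" and accepted is False, so out-of-order delivery cannot roll the value back.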

Cross‑Cloud Multi‑Active

REDtao replicates MySQL binlogs across clouds for persistence and uses a DTS‑based subscription to invalidate caches, ensuring eventual consistency while allowing reads from any region.
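The following toy model shows why this design is eventually consistent rather than strongly consistent. A queue stands in for the asynchronous binlog replication and change subscription; Region, ChangeStream, and the region names are hypothetical, and the sketch assumes a write can originate in any region.

```python
from collections import deque


class Region:
    def __init__(self, name):
        self.name = name
        self.db = {}      # stands in for this region's MySQL copy
        self.cache = {}   # stands in for this region's REDtao cache

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.db.get(key)
        return self.cache[key]


class ChangeStream:
    def __init__(self, regions):
        self.regions = regions
        self.pending = deque()   # changes not yet delivered to other regions

    def write(self, origin, key, value):
        origin.db[key] = value
        origin.cache.pop(key, None)
        self.pending.append((origin.name, key, value))

    def drain(self):
        """Deliver queued changes: replay into storage, then invalidate caches."""
        while self.pending:
            source, key, value = self.pending.popleft()
            for region in self.regions:
                if region.name != source:
                    region.db[key] = value
                    region.cache.pop(key, None)


cloud_a, cloud_b = Region("cloud-a"), Region("cloud-b")
stream = ChangeStream([cloud_a, cloud_b])
stream.write(cloud_a, "edge", 1)
before = cloud_b.read("edge")   # change not yet delivered: stale read
stream.drain()
after = cloud_b.read("edge")    # cache was invalidated, so this reloads
```

Until the change is delivered, the remote region can serve the old value; once delivery invalidates its cache, the next read picks up the new one.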

Figure: Cross-cloud multi-active architecture

Cloud‑Native Features

REDtao runs on Kubernetes with a custom Operator that creates a DuplicateSet resource to control shard placement, supports rolling upgrades, and automatically replaces failed pods.

Figure: Kubernetes Operator diagram

Seamless Migration from Legacy MySQL

Migration was performed in stages, with low-priority services moved first. A Tao Proxy SDK handled dual-write, dual-read, and data validation. After DTS-based incremental sync, the SDK switched to reading only from REDtao, and final consistency checks were run against the binlogs.
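The dual-write/dual-read phase might look roughly like the sketch below. It is a simplified illustration, not the Tao Proxy SDK itself: the TaoProxy class, its read_from switch, and the mismatch list are assumptions, and dictionaries stand in for the legacy MySQL store and REDtao.

```python
class TaoProxy:
    def __init__(self, legacy, redtao, read_from="legacy"):
        self.legacy = legacy        # dict standing in for the legacy MySQL store
        self.redtao = redtao        # dict standing in for REDtao
        self.read_from = read_from  # which backend's answer callers receive
        self.mismatches = []        # keys whose values differed on read

    def write(self, key, value):
        # Dual-write: both backends receive every write.
        self.legacy[key] = value
        self.redtao[key] = value

    def read(self, key):
        # Dual-read: query both backends and record any disagreement.
        old = self.legacy.get(key)
        new = self.redtao.get(key)
        if old != new:
            self.mismatches.append(key)
        return new if self.read_from == "redtao" else old


legacy = {"pre-existing": "x"}      # data written before the proxy was introduced
proxy = TaoProxy(legacy, redtao={})
proxy.write("edge", {"ts": 1})
proxy.read("edge")                  # both backends agree
proxy.read("pre-existing")          # not yet synced to REDtao: recorded as a mismatch
```

Once the mismatch list stays empty after the incremental sync, flipping read_from to "redtao" moves reads over without changing calling code.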

The migration completed in early 2022 without downtime, moving trillions of edges and achieving a 21.3% cost reduction.

Results and Benefits

Post‑deployment, cache hit rate exceeds 90%, MySQL QPS drops by over 70%, and CPU usage falls dramatically. Overall infrastructure cost grew only ~15% while handling a 2.5× increase in request volume.

Figure: Cost and performance improvement chart

Conclusion and Outlook

REDtao demonstrates that a purpose‑built graph storage system can replace a massive MySQL deployment, delivering high performance, high availability, and cloud‑native operability. Future work includes merging REDgraph with REDtao into a unified graph database to serve broader internal scenarios.
