How REDtao Powers Xiaohongshu’s Trillion‑Edge Social Graph: Architecture, Performance, and Lessons
This article details the design and implementation of REDtao, Xiaohongshu’s self‑built graph storage system. It puts a three‑layer architecture with a distributed cache and cross‑cloud multi‑active support in front of MySQL, and delivers trillion‑edge scale, 1.5 M QPS on a single 16‑core VM, a cache hit rate above 90%, and significant cost reductions.
Xiaohongshu’s social platform stores a trillion‑edge graph of users, notes, and products, generating massive read traffic that strained a MySQL‑based solution (55% CPU at 1 M QPS). To handle rapid growth and reduce cost, the team built REDtao, a custom graph storage system inspired by Facebook’s TAO.
Design Inspiration and Data Model
The team surveyed industry solutions (Facebook TAO, Pinterest, ByteGraph, LinkedIn KV‑based services) and chose a TAO‑like design because their existing MySQL + Memcache stack matched its assumptions, minimizing risk.
REDtao stores each edge as a <FromId, AssocType, ToId> -> Value(JSON) triple. For example, a "follow" relationship from user A to user B is represented as:
<FromId: A_ID, AssocType: follow, ToId: B_ID> -> {"timestamp":..., "properties":...}
Graph Semantic API
Twenty‑five high‑level APIs were wrapped for business teams, abstracting CRUD operations and standardising usage. Typical calls include:
getAssocs("follow", userA_ID, offset, limit, onlyNormalUsers, orderDesc) – fetch all normal users following A.
getAssocCount("follow", userA_ID, onlyNormalUsers) – count A’s followers while filtering out cheating accounts.
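To make the shape of these two calls concrete, here is a minimal in-memory sketch. It is not REDtao's implementation: the class name InMemoryAssocStore, the add_assoc helper, and the "normal" flag stored inside the edge value are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    from_id: int
    assoc_type: str
    to_id: int
    value: dict = field(default_factory=dict)  # stands in for the JSON value


class InMemoryAssocStore:
    """Toy stand-in for the graph API; hypothetical, not REDtao's real code."""

    def __init__(self):
        # (from_id, assoc_type) -> {to_id: Edge}
        self._edges = {}

    def add_assoc(self, assoc_type, from_id, to_id, value=None):
        bucket = self._edges.setdefault((from_id, assoc_type), {})
        bucket[to_id] = Edge(from_id, assoc_type, to_id, value or {})

    def _matching(self, assoc_type, from_id, only_normal_users):
        edges = list(self._edges.get((from_id, assoc_type), {}).values())
        if only_normal_users:
            # Assumption: abnormal accounts are marked with "normal": False.
            edges = [e for e in edges if e.value.get("normal", True)]
        return edges

    def get_assocs(self, assoc_type, from_id, offset=0, limit=10,
                   only_normal_users=False, order_desc=True):
        edges = self._matching(assoc_type, from_id, only_normal_users)
        edges.sort(key=lambda e: e.value.get("timestamp", 0), reverse=order_desc)
        return edges[offset:offset + limit]

    def get_assoc_count(self, assoc_type, from_id, only_normal_users=False):
        return len(self._matching(assoc_type, from_id, only_normal_users))


store = InMemoryAssocStore()
store.add_assoc("follow", 1, 2, {"timestamp": 100, "normal": True})
store.add_assoc("follow", 1, 3, {"timestamp": 200, "normal": False})
store.add_assoc("follow", 1, 4, {"timestamp": 300, "normal": True})

newest_first = store.get_assocs("follow", 1, only_normal_users=True)
normal_count = store.get_assoc_count("follow", 1, only_normal_users=True)
```

Keying the buckets by (from_id, assoc_type) mirrors the triple layout described above: listing and counting for one source node and one edge type touch a single bucket.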
Three‑Layer Architecture
REDtao is split into an access layer, a distributed cache layer, and a persistent MySQL layer. Clients call the REDtao SDK, which sends RPCs to a router. The router hashes the edge key to a follower node; the follower checks its local cache and forwards the request to the leader on a miss; the leader queries MySQL only when its own cache also misses.
Write flow mirrors the read flow: the router forwards writes to a follower, which forwards to the leader; the leader writes to MySQL, then invalidates the corresponding cache keys on all followers to guarantee eventual consistency.
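The read and write paths can be sketched in a few dozen lines. This is a simplified single-process model under stated assumptions: FakeMySQL is a dictionary standing in for the database, the router uses CRC32 modulo the follower count (the source does not say which hash is used), and the leader refreshing its own cache entry on write is an assumption as well.

```python
import zlib


class FakeMySQL:
    """Dict-backed stand-in for the persistent layer."""

    def __init__(self):
        self.rows = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)

    def put(self, key, value):
        self.rows[key] = value


class Leader:
    def __init__(self, db):
        self.db = db
        self.cache = {}
        self.followers = []

    def read(self, key):
        if key not in self.cache:          # leader miss: go to MySQL
            self.cache[key] = self.db.get(key)
        return self.cache[key]

    def write(self, key, value):
        self.db.put(key, value)            # persist first
        self.cache[key] = value            # assumption: leader refreshes itself
        for follower in self.followers:    # then invalidate every follower
            follower.invalidate(key)


class Follower:
    def __init__(self, leader):
        self.leader = leader
        self.cache = {}
        leader.followers.append(self)

    def read(self, key):
        if key not in self.cache:          # follower miss: ask the leader
            self.cache[key] = self.leader.read(key)
        return self.cache[key]

    def write(self, key, value):
        self.leader.write(key, value)      # followers never write MySQL directly

    def invalidate(self, key):
        self.cache.pop(key, None)


class Router:
    def __init__(self, followers):
        self.followers = followers

    def pick(self, key):
        # crc32 is stable across processes, unlike the built-in hash() for str.
        return self.followers[zlib.crc32(key.encode()) % len(self.followers)]

    def read(self, key):
        return self.pick(key).read(key)

    def write(self, key, value):
        self.pick(key).write(key, value)


db = FakeMySQL()
leader = Leader(db)
router = Router([Follower(leader) for _ in range(3)])

key = "1:follow:2"
db.put(key, {"timestamp": 100})
first = router.read(key)    # misses follower and leader, reads MySQL once
second = router.read(key)   # served from the follower cache
router.write(key, {"timestamp": 200})
after_write = router.read(key)
```

In this model the second read never reaches MySQL, and the read after the write sees the new value because the follower's entry was invalidated before the write returned.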
High Availability
The cache layer is a two‑level distributed cluster where each shard has one leader and multiple followers. Reads can be scaled by adding followers, while writes go through a single leader to simplify consistency. Automatic failure detection swaps faulty replicas within seconds.
MySQL clusters use a self‑developed middleware for sharding and horizontal scaling, with multiple replicas for HA.
Rate‑Limiting and Queue Protection
To prevent cache stampedes, each MySQL node caps the number of concurrent requests; excess requests are queued or rejected. A per‑edge REDtaoQueue serialises writes and point lookups on the same edge, limiting abusive or bot‑driven traffic.
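Both protections can be illustrated with standard threading primitives. The sketch below is an assumption about the general technique, not the actual REDtaoQueue: ConcurrencyLimiter rejects work once a fixed number of requests are in flight, and PerEdgeQueue (a hypothetical name) serialises operations that target the same edge key.

```python
import threading


class ConcurrencyLimiter:
    """Caps in-flight requests to a backend; excess requests are rejected."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def run(self, query):
        # Non-blocking acquire: fail fast instead of piling up on MySQL.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("rejected: too many concurrent requests")
        try:
            return query()
        finally:
            self._slots.release()


class PerEdgeQueue:
    """Serialises operations on the same edge key with one lock per key."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def run(self, edge_key, op):
        with self._guard:
            # Locks are never evicted here; a real system would need cleanup.
            lock = self._locks.setdefault(edge_key, threading.Lock())
        with lock:
            return op()


limiter = ConcurrencyLimiter(max_in_flight=1)
queue = PerEdgeQueue()
result = queue.run(("1", "follow", "2"), lambda: limiter.run(lambda: "row"))
```

Rejecting rather than blocking is one of the two options the text mentions; swapping the non-blocking acquire for a blocking one with a timeout would give the queued variant.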
Performance Results
In production, a single 16‑core VM handles 1.5 M queries per second (≈30 K QPS per RPC) with only 22.5% CPU usage. Cache hit rate exceeds 90%, cutting MySQL QPS by over 70% and reducing CPU load dramatically. After shrinking MySQL replicas, overall cost dropped 21.3%.
Ease of Use
All business services now consume the unified REDtao API via a single URL, hiding the underlying cluster topology. The SDK automatically routes requests based on edge type, and configuration changes are propagated through a central config service.
Consistency Guarantees
Each write gets a globally increasing version; stale updates are rejected.
Read‑after‑write consistency is achieved by routing all requests for the same FromId to the same cache node.
Leader failures trigger asynchronous invalidation broadcasts; if a broadcast is lost, all followers clear the affected keys.
Strong‑consistency reads are supported by marking requests with a special flag that forces routing to the MySQL primary.
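The version check and the FromId‑based routing can be sketched as follows. The names VersionedCache and node_for are hypothetical, a local counter stands in for whatever produces the globally increasing version, and CRC32 is an assumed hash choice.

```python
import itertools
import zlib


class VersionedCache:
    """Keeps the newest version of each key and rejects stale updates."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def apply(self, key, version, value):
        current = self._data.get(key)
        if current is not None and version <= current[0]:
            return False  # stale or duplicate update
        self._data[key] = (version, value)
        return True

    def get(self, key):
        entry = self._data.get(key)
        return None if entry is None else entry[1]


def node_for(from_id, node_count):
    """Route every request for one FromId to the same cache node."""
    return zlib.crc32(str(from_id).encode()) % node_count


versions = itertools.count(1)   # stand-in for a global version source
v_old, v_new = next(versions), next(versions)

cache = VersionedCache()
accepted_new = cache.apply("1:follow:2", v_new, {"state": "followed"})
accepted_old = cache.apply("1:follow:2", v_old, {"state": "unfollowed"})
```

Here the older update arrives after the newer one and is dropped, so the cache never moves backwards; because node_for is deterministic, a read that follows a write for the same FromId lands on the node that handled the write.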
Cross‑Cloud Multi‑Active
Persistent data is replicated across clouds using MySQL’s native binlog replication; the cache layer receives invalidate messages via a DTS subscription that converts binlog events into cache‑clear commands. A delayed subscription ensures the slowest binlog source is used, avoiding conflicts.
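The binlog‑to‑invalidation step might look like the sketch below. The event shape (a dict with "type" and "row" fields) and the cache key formats are assumptions for illustration; the source does not describe the DTS message format, and the delayed‑subscription logic is not modelled here.

```python
def invalidations_from_binlog(events):
    """Yield the cache keys to clear for each row-change event."""
    for event in events:
        if event["type"] not in {"insert", "update", "delete"}:
            continue  # ignore DDL, heartbeats, and other event kinds
        row = event["row"]
        yield f'{row["from_id"]}:{row["assoc_type"]}:{row["to_id"]}'
        # Adding or removing an edge also makes the cached count stale.
        if event["type"] != "update":
            yield f'count:{row["from_id"]}:{row["assoc_type"]}'


events = [
    {"type": "insert", "row": {"from_id": 1, "assoc_type": "follow", "to_id": 2}},
    {"type": "heartbeat"},
    {"type": "update", "row": {"from_id": 1, "assoc_type": "follow", "to_id": 2}},
]
keys_to_clear = list(invalidations_from_binlog(events))
```

Driving invalidation from the replicated binlog means a remote cloud clears its cache only after the corresponding row change has arrived in its own MySQL replica.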
Cloud‑Native Deployment
REDtao runs on Kubernetes with a custom Operator that creates a DuplicateSet resource to manage shard‑aware pod placement. The Operator handles scaling, rolling upgrades, and automatic pod replacement, enabling seamless multi‑AZ and multi‑region deployments.
Migration Strategy
Migration was performed in phases: low‑priority services were moved to a REDtao cluster first, using a dual‑write proxy SDK that wrote to both MySQL and REDtao while validating data via binlog diff checks. After confirming zero diffs, the proxy was switched to read‑only from REDtao. The entire migration completed by early 2022 with no downtime.
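A dual‑write proxy with a guarded cut‑over can be sketched like this. Two dictionaries stand in for the old MySQL path and the new REDtao cluster, and the diff compares the stores directly rather than through binlogs, which is a simplification of the check described above. DualWriteProxy and its method names are hypothetical.

```python
class DualWriteProxy:
    """Writes to both stores; reads move to the new one only after a clean diff."""

    def __init__(self, old_store, new_store):
        self.old = old_store
        self.new = new_store
        self.read_from_new = False

    def write(self, key, value):
        self.old[key] = value
        self.new[key] = value

    def read(self, key):
        source = self.new if self.read_from_new else self.old
        return source.get(key)

    def diff(self):
        keys = self.old.keys() | self.new.keys()
        return sorted(k for k in keys if self.old.get(k) != self.new.get(k))

    def cut_over(self):
        mismatched = self.diff()
        if mismatched:
            raise RuntimeError(f"stores differ on {len(mismatched)} key(s)")
        self.read_from_new = True


old_store = {"1:follow:2": {"timestamp": 100}}   # data that predates dual writes
new_store = {}
proxy = DualWriteProxy(old_store, new_store)
proxy.write("1:follow:3", {"timestamp": 200})
pending = proxy.diff()                           # the pre-existing row is missing

new_store["1:follow:2"] = {"timestamp": 100}     # backfill the historical row
proxy.cut_over()
```

The example shows why dual writes alone are not enough: rows written before the proxy was introduced must be backfilled before the diff reaches zero and reads can switch.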
Business Impact
Post‑migration, read traffic (over 90% of total) enjoys >90% cache hits, MySQL QPS is reduced by >70%, and cost growth for a 2.5× request increase is only 14.7% versus a 100% increase if the old MySQL architecture were retained. The system now supports three internal graph products: REDtao (one‑hop), REDgraph (multi‑hop), and REDkv (key‑value for recommendation).
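The cost figures are easier to compare per request. The short calculation below uses only the numbers quoted above and normalises the pre‑growth cost to 1.0, which is an assumption made for convenience.

```python
baseline_cost = 1.0        # normalised cost before the traffic increase
request_growth = 2.5       # requests grew 2.5x

redtao_cost = baseline_cost * (1 + 0.147)   # +14.7% with REDtao
mysql_cost = baseline_cost * (1 + 1.00)     # +100% projected for the old stack

redtao_per_request = redtao_cost / request_growth
mysql_per_request = mysql_cost / request_growth
saving_vs_mysql = 1 - redtao_cost / mysql_cost

print(f"REDtao cost per request vs. baseline: {redtao_per_request:.3f}")
print(f"Old stack cost per request vs. baseline: {mysql_per_request:.3f}")
print(f"Total cost saved vs. old stack: {saving_vs_mysql:.2%}")
```

On these numbers, the cost per request falls to a little under half of the baseline with REDtao, and the total bill is roughly 43% lower than the projected cost of keeping the old architecture.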
Overall, REDtao demonstrates how a tailored graph storage engine can achieve trillion‑scale performance, high availability, and cloud‑native operability while dramatically lowering operational costs.
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
