How REDgraph Supercharges Query Performance for Massive Social Networks
This article explains how Xiaohongshu built the REDgraph graph database to handle queries over ultra‑large social networks, compares graph databases with traditional relational databases, walks through a Gremlin example, and highlights the scalability and efficiency benefits of storing relationships as first‑class citizens.
Introduction
Xiaohongshu is a community‑centric product with a massive social network, and its team faced challenges both in building applications for social scenarios and in implementing distributed parallel queries. To address them, the team developed REDgraph, a graph database designed for ultra‑large‑scale social networks, which substantially improved query efficiency and performance.
Presentation Outline
The original talk has five parts: background, analysis of problems in the original architecture, the distributed parallel query implementation, summary and outlook, and a Q&A session. This excerpt covers the background.
Background: Graph Database Overview
Graph databases belong to the NoSQL family, alongside key‑value (KV), wide‑column, document, and time‑series stores. Moving from KV stores toward graph databases, both the degree of association between data items and the complexity of typical queries increase. KV, wide‑column, and document stores focus on the richness of data within a single record, whereas graph databases specialize in the relationships between records, which makes them well suited to scenarios that require deep link traversal or multi‑dimensional relationship mining.
Comparison with Relational Databases
Consider a typical social‑network schema with user, friend, like, and note tables. In a relational database, retrieving the notes liked by a user's friends requires a lengthy SQL statement with multiple JOIN operations, which consume significant CPU, memory, and I/O. Carefully designed indexes help, but the query cost stays high and the indexes themselves add maintenance overhead.
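To make the JOIN chain concrete, here is a minimal sketch using Python's built‑in sqlite3 module. The schema, column names, and sample rows are hypothetical, and the tables are pluralized (for example `likes`) because LIKE is a reserved SQL keyword. The point is only that one conceptual two‑hop question ("notes liked by Tom's friends") becomes three JOINs across four tables.

```python
import sqlite3

# Hypothetical schema modeled on the article's user/friend/like/note tables.
# Tables are pluralized because LIKE is a reserved SQL keyword.
SCHEMA = """
CREATE TABLE users   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE notes   (id INTEGER PRIMARY KEY, content TEXT);
CREATE TABLE friends (user_id INTEGER, friend_id INTEGER);  -- user_id -> friend_id
CREATE TABLE likes   (user_id INTEGER, note_id INTEGER);    -- user_id liked note_id
"""

# "Notes liked by <name>'s friends": four tables, three JOINs.
QUERY = """
SELECT n.content
FROM users u
JOIN friends f ON f.user_id = u.id
JOIN likes   l ON l.user_id = f.friend_id
JOIN notes   n ON n.id      = l.note_id
WHERE u.name = ?
"""


def notes_liked_by_friends(conn, name):
    """Run the multi-JOIN query and return the matching note contents."""
    return [row[0] for row in conn.execute(QUERY, (name,))]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1, "Tom"), (2, "Amy"), (3, "Bob")])
    conn.executemany("INSERT INTO notes VALUES (?, ?)",
                     [(10, "Hiking in Guilin"), (11, "Best hotpot in Chengdu")])
    conn.executemany("INSERT INTO friends VALUES (?, ?)", [(1, 2), (1, 3)])
    conn.executemany("INSERT INTO likes VALUES (?, ?)", [(2, 10), (3, 11)])
    print(sorted(notes_liked_by_friends(conn, "Tom")))
```

Each additional hop (for example "notes liked by friends of friends") adds another self‑JOIN on `friends`, which is where the cost and query complexity grow.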
A graph database simplifies this. By modeling two vertex types (users and notes) and two edge types (friend and like), the data forms an explicit network. A Gremlin query can retrieve the same information in four lines:
g.V().has('name','Tom')
.out('friend')
.out('like')
.values('content')
This query is concise, readable, and directly mirrors the underlying graph topology.
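To show what each Gremlin step does, the sketch below implements a toy in‑memory property graph in Python with `has`, `out`, and `values` methods that loosely mirror the Gremlin steps of the same names. `TinyGraph` and its methods are hypothetical illustrations, not REDgraph's or Apache TinkerPop's API; they only demonstrate that each `out` step follows stored edges from the current vertices rather than joining tables.

```python
class TinyGraph:
    """Toy in-memory property graph (hypothetical; not REDgraph's or
    Apache TinkerPop's API). Out-edges are kept per vertex and per label."""

    def __init__(self):
        self.props = {}      # vertex id -> {property name: value}
        self.out_edges = {}  # vertex id -> {edge label: [target vertex ids]}

    def add_vertex(self, vid, **props):
        self.props[vid] = props

    def add_edge(self, src, label, dst):
        self.out_edges.setdefault(src, {}).setdefault(label, []).append(dst)

    def has(self, key, value):
        # Like g.V().has(key, value). This toy scans every vertex; a real
        # database would use an index to find the starting vertices.
        return [v for v, p in self.props.items() if p.get(key) == value]

    def out(self, vids, label):
        # Like .out(label): follow edges with this label from each current vertex.
        return [dst for v in vids for dst in self.out_edges.get(v, {}).get(label, [])]

    def values(self, vids, key):
        # Like .values(key): emit the property from each vertex that has it.
        return [self.props.get(v, {})[key] for v in vids if key in self.props.get(v, {})]


if __name__ == "__main__":
    g = TinyGraph()
    g.add_vertex("u1", name="Tom")
    g.add_vertex("u2", name="Amy")
    g.add_vertex("n1", content="Hiking in Guilin")
    g.add_edge("u1", "friend", "u2")
    g.add_edge("u2", "like", "n1")

    # Equivalent of g.V().has('name','Tom').out('friend').out('like').values('content')
    tom = g.has("name", "Tom")
    print(g.values(g.out(g.out(tom, "friend"), "like"), "content"))  # ['Hiking in Guilin']
```

Storing out‑edges keyed by label on each vertex is a design choice for this sketch: it makes `out(label)` a direct lookup, which is the property the article attributes to treating edges as first‑class citizens.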
Key Advantages of Graph Databases
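Graph databases store vertices and edges as first‑class citizens: adjacent edges are typically stored alongside (or keyed by) their source vertex, so following a relationship is a local lookup rather than a table‑wide join. As a result, the cost of a traversal depends mainly on how many vertices and edges it actually visits, not on the total size of the dataset, so query latency stays stable as data volume grows instead of degrading the way multi‑table joins tend to.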
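A small experiment can illustrate why traversal cost tracks the visited neighborhood rather than the dataset size. The sketch below, written under the assumption of a plain per‑vertex adjacency map (a hypothetical layout, not REDgraph's actual storage format), runs the "friends' liked notes" traversal and counts how many edges it examines, then repeats it after adding many unrelated users.

```python
def friends_likes_with_cost(adj, start):
    """Two-hop traversal start -friend-> user -like-> note over a per-vertex
    adjacency map. Returns (notes found, number of edges examined)."""
    touched = 0
    notes = []
    for friend in adj.get(start, {}).get("friend", []):
        touched += 1  # examined one 'friend' edge
        for note in adj.get(friend, {}).get("like", []):
            touched += 1  # examined one 'like' edge
            notes.append(note)
    return notes, touched


def build_graph(extra_users):
    """Tom's small neighborhood plus `extra_users` users who never connect to it
    (hypothetical sample data)."""
    adj = {
        "tom": {"friend": ["amy", "bob"]},
        "amy": {"like": ["n1"]},
        "bob": {"like": ["n2", "n3"]},
    }
    for i in range(extra_users):
        # Unrelated users: they befriend each other and like notes,
        # but nobody in Tom's neighborhood points at them.
        adj[f"user{i}"] = {"friend": [f"user{(i + 1) % extra_users}"], "like": ["n1"]}
    return adj


if __name__ == "__main__":
    for size in (0, 100_000):
        notes, touched = friends_likes_with_cost(build_graph(size), "tom")
        print(size, sorted(notes), touched)
```

In both runs the traversal examines the same five edges, because it only follows pointers out of Tom's neighborhood. Real systems still pay for locating the start vertex and can slow down on very high‑degree vertices, so "stable latency" holds for queries with bounded fan‑out rather than unconditionally.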
Source Note
Excerpted from the e‑book AI for Data: Intelligent Data Processing and Analysis Practice.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
