How REDgraph Supercharges Query Performance for Massive Social Networks

This article explains how Xiaohongshu built the REDgraph graph database to tackle ultra‑large social network queries, compares graph databases with traditional relational databases, showcases a Gremlin example, and highlights the scalability and efficiency benefits of storing relationships as first‑class citizens.

DataFunSummit

Nov 5, 2025

Introduction

Xiaohongshu is a community-centric product with massive social network data. Serving social-scenario applications over that data and implementing distributed parallel queries both proved challenging. To address this, the team built REDgraph, a graph database designed for ultra-large-scale social networks, with the goal of improving query efficiency and performance.

Presentation Outline

The talk is divided into five parts: background introduction, analysis of the original architecture problems, distributed parallel query implementation, summary and outlook, and a Q&A session.

Background: Graph Database Overview

Graph databases belong to the NoSQL family, alongside key-value (KV), wide-column, document, and time-series stores. Across these types, the degree of association between data items and the complexity of queries both increase from KV to graph. KV, wide-column, and document stores focus on the richness of a single record, whereas graph databases specialize in relationships, which makes them well suited to deep-link traversal and multi-dimensional relationship mining.

Comparison with Relational Databases

Consider a typical social-network schema with user, friend, like, and note tables. Retrieving the notes liked by a user's friends in a relational database requires a lengthy SQL statement with multiple JOIN operations, which consumes significant CPU, memory, and I/O resources. Even with carefully designed indexes, the cost remains high and the maintenance burden is heavy.
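To make the relational side concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows are assumptions invented for illustration and are not taken from Xiaohongshu's schema. The query needs three JOINs to walk from a user to friends, to likes, and finally to notes.

```python
import sqlite3

# Hypothetical schema and data: all names here are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE notes   (id INTEGER PRIMARY KEY, content TEXT);
CREATE TABLE friends (user_id INTEGER, friend_id INTEGER);
CREATE TABLE likes   (user_id INTEGER, note_id INTEGER);

INSERT INTO users   VALUES (1, 'Tom'), (2, 'Ann'), (3, 'Bob');
INSERT INTO notes   VALUES (10, 'hiking tips'), (11, 'coffee guide'), (12, 'city walk');
INSERT INTO friends VALUES (1, 2), (1, 3);
INSERT INTO likes   VALUES (2, 10), (3, 11), (3, 10), (1, 12);
""")

# One JOIN per relationship hop: user -> friends -> likes -> notes.
FRIENDS_LIKED_NOTES = """
SELECT DISTINCT n.content
FROM users   AS u
JOIN friends AS f ON f.user_id = u.id
JOIN likes   AS l ON l.user_id = f.friend_id
JOIN notes   AS n ON n.id = l.note_id
WHERE u.name = ?
ORDER BY n.content
"""


def notes_liked_by_friends(connection, name):
    """Return the distinct contents of notes liked by the named user's friends."""
    rows = connection.execute(FRIENDS_LIKED_NOTES, (name,)).fetchall()
    return [content for (content,) in rows]


result = notes_liked_by_friends(conn, "Tom")
print(result)
```

Each additional hop (for example, friends of friends) adds another JOIN to the statement, which is where the statement length and the resource cost described above come from.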

Using a graph database simplifies the process. By modeling two vertex types (users and notes) and two edge types (friendship and likes), the data forms a clear network structure. A Gremlin query can retrieve the desired information in just four lines:

g.V().has('name','Tom')
  .out('friend')
  .out('like')
  .values('content')

This query is concise, readable, and directly reflects the underlying graph topology.
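To show what each step of the traversal does, the following is a toy in-memory sketch in Python. The TinyGraph class, its method names, and the sample data are hypothetical and written only for this illustration; it is neither the Gremlin API nor REDgraph's storage model. Each method mirrors one Gremlin step: has filters vertices by property, out follows outgoing edges with a given label, and values reads a property.

```python
from collections import defaultdict


class TinyGraph:
    """A toy property graph for illustration only (hypothetical, not REDgraph)."""

    def __init__(self):
        self.props = {}                     # vertex id -> property dict
        self.out_edges = defaultdict(list)  # (vertex id, edge label) -> target ids

    def add_vertex(self, vid, **props):
        self.props[vid] = props

    def add_edge(self, src, label, dst):
        self.out_edges[(src, label)].append(dst)

    def has(self, key, value):
        """Like has(key, value): keep vertices whose property matches."""
        return [v for v, p in self.props.items() if p.get(key) == value]

    def out(self, vertices, label):
        """Like out(label): follow outgoing edges with this label."""
        return [dst for v in vertices for dst in self.out_edges.get((v, label), [])]

    def values(self, vertices, key):
        """Like values(key): read one property from each vertex that has it."""
        return [self.props[v][key] for v in vertices if key in self.props[v]]


g = TinyGraph()
g.add_vertex(1, name="Tom")
g.add_vertex(2, name="Ann")
g.add_vertex(3, name="Bob")
g.add_vertex(10, content="hiking tips")
g.add_vertex(11, content="coffee guide")
g.add_vertex(12, content="city walk")
g.add_edge(1, "friend", 2)
g.add_edge(1, "friend", 3)
g.add_edge(2, "like", 10)
g.add_edge(3, "like", 11)
g.add_edge(3, "like", 10)
g.add_edge(1, "like", 12)

start = g.has("name", "Tom")         # g.V().has('name','Tom')
friends = g.out(start, "friend")     # .out('friend')
liked = g.out(friends, "like")       # .out('like')
contents = g.values(liked, "content")  # .values('content')
print(contents)
```

As in the Gremlin query, a note liked by two friends appears twice in the result; in Gremlin, adding a dedup() step would remove the repeats.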

Key Advantages of Graph Databases

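Graph databases store vertices and edges as first-class citizens, which makes adjacency lookups and relationship traversals efficient. The cost of a traversal depends mainly on the number of edges it visits rather than on the total size of the dataset, so query latency tends to stay stable as data volume grows, avoiding the performance degradation typical of multi-way relational joins.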
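The scaling argument can be sketched by counting how many edge records a one-hop lookup has to examine. This is a simplified illustration under stated assumptions: the function names and the synthetic data are hypothetical, and the scan models the worst case of an edge table with no index. An indexed table would fall between the two, and a real graph database's behavior depends on its storage engine.

```python
def build(num_users, friends_per_user=3):
    """Create synthetic friendship data in two layouts (illustrative only)."""
    edge_rows = [
        (u, (u + k) % num_users)
        for u in range(num_users)
        for k in range(1, friends_per_user + 1)
    ]
    adjacency = {}
    for src, dst in edge_rows:
        adjacency.setdefault(src, []).append(dst)
    return edge_rows, adjacency


def neighbors_by_scan(edge_rows, src):
    """Unindexed edge table: every row is examined to find one user's friends."""
    touched = 0
    found = []
    for row_src, row_dst in edge_rows:
        touched += 1
        if row_src == src:
            found.append(row_dst)
    return found, touched


def neighbors_by_adjacency(adjacency, src):
    """Adjacency list: only the source vertex's own edges are examined."""
    found = list(adjacency.get(src, []))
    return found, len(found)


costs = {}
for n in (1_000, 10_000):
    rows, adj = build(n)
    scan_found, scan_cost = neighbors_by_scan(rows, 0)
    adj_found, adj_cost = neighbors_by_adjacency(adj, 0)
    assert scan_found == adj_found  # both layouts return the same friends
    costs[n] = (scan_cost, adj_cost)
    print(n, scan_cost, adj_cost)
```

In this sketch the scan examines ten times as many rows when the user count grows tenfold, while the adjacency lookup examines the same three edges in both cases.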

Source Note

Excerpted from the e-book AI for Data: Intelligent Data Processing and Analysis Practice.
