Databases 30 min read

How REDgraph Cut Multi‑Hop Query Latency by 50% with Distributed Parallel Execution

Xiaohongshu's REDgraph graph database faced high latency for multi‑hop queries, so the storage team redesigned the query framework using MPP‑inspired distributed parallel execution, edge‑partitioning, operator forwarding, and caching, achieving over 50% latency reduction and making three‑hop queries viable for online services.

dbaplus Community
dbaplus Community
dbaplus Community
How REDgraph Cut Multi‑Hop Query Latency by 50% with Distributed Parallel Execution

Background

Xiaohongshu relies on a graph database to model users, posts, products and their complex relationships. While one‑hop queries are fast, multi‑hop queries (especially three‑hop) often exceed the required P99 latency, limiting features such as real‑time social recommendation and fraud detection.

Problem Analysis

In social recommendation, a user may need to discover friends through two intermediate hops; in risk control, the system must trace multi‑degree interactions to flag fraudulent behavior. Traditional SQL joins or KV‑based approaches generate massive data transfers and cannot meet online latency constraints.

REDgraph Architecture

The system consists of three services:

Meta Service : manages schema, permissions, shard locations and background jobs.

Graph Service : stateless query engine handling parsing, validation, optimization, scheduling and execution.

Storage Service : a three‑layer storage stack – a graph‑semantic API that translates graph operations to KV requests, a Raft‑based consensus layer for strong consistency, and RocksDB as the underlying engine.

REDgraph adopts an edge‑partitioning scheme: each vertex’s ID is hashed to a shard, and each edge is stored twice, once with the source vertex and once with the destination vertex, ensuring locality for neighbor look‑ups.

Optimization Approaches

Version 1.0 introduced shortcuts for common patterns, limited fan‑out on high‑degree vertices, pushed filters/samples/limits down to storage, and enabled read‑only nodes and aggressive GC. These optimizations were effective for one‑ and two‑hop queries but lacked generality for deeper traversals.

New Distributed Parallel Framework draws on MPP concepts and restructures the execution plan:

Introduce FORWARD operators to route partial results to the appropriate storage shard.

Add CONVERGE and MERGE operators so that each shard can independently continue the plan and finally aggregate results at the driver node.

Define stages – groups of operators that can run locally on a shard without cross‑node synchronization, eliminating global barriers.

Implement a NeighborCache on storage nodes to deduplicate start‑vertex IDs between stages, reducing repeated neighbor look‑ups.

Use a lightweight Acker on the driver to track stage progress via XOR‑based checksums, ensuring correct completion detection without a central barrier.

Load‑balancing is achieved by selecting the least‑loaded replica among three storage copies and by expanding the cluster when hotspot nodes appear. Memory checks prevent cache over‑allocation and avoid OOM.

Implementation Details

The original execution flow (START → GetNeighbors → Project → InnerJoin → …) is replaced by a pipeline where the driver sends the whole plan to storage nodes, each node executes its local stage, forwards results with FORWARD, and finally the driver aggregates with MERGE. This removes the “wait‑for‑all‑responses” barrier and cuts unnecessary data shuffling.

To handle hotspots, the system caches neighbor lists for a few seconds; cache entries are invalidated on updates and are skipped when node memory usage is high.

Performance Evaluation

Tests used the LDBC SNB SF100 dataset (≈300 M vertices, 1.8 B edges). Results:

One‑ and two‑hop latency unchanged (no negative impact).

Three‑hop latency reduced by 50‑60 %; max‑degree queries stay under 50 ms, limited queries under 200 ms.

Four‑hop queries remain in the second‑to‑tens‑of‑seconds range, but still see 50‑70 % improvement.

These numbers meet the online service P99 targets for social recommendation and risk control.

Conclusion

The distributed parallel query framework dramatically improves multi‑hop traversal performance in REDgraph, achieving >50 % latency reduction while preserving compatibility with existing business logic. The approach is framework‑level and can be applied to other storage products such as REDtable, offering a path toward high‑performance OLTP‑style graph queries.

REDgraph usage scenarios
REDgraph usage scenarios
Distributed execution flow
Distributed execution flow
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

optimizationREDgraphDistributed Querymulti-hopgraph-database
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.