How Alibaba’s DGS Enables Real‑Time GNN Inference on Massive Dynamic Graphs

The Dynamic Graph Sampling (DGS) service, built on GraphLearn, delivers sub‑20 ms latency for real‑time GNN inference on large, constantly evolving graphs by separating storage from computation, using event‑driven pre‑sampling, lazy multi‑hop concatenation, and a publish‑subscribe architecture that scales linearly across distributed workers.

Alibaba Cloud Big Data AI Platform

Background

GraphLearn, co‑developed by Alibaba Cloud PAI and DAMO Academy, is a large‑scale graph neural network (GNN) training framework and the learning engine of the GraphScope platform. Its newly open‑sourced Dynamic Graph Sampling (DGS) service provides real‑time online inference for GNNs on dynamic graphs, handling high‑throughput graph updates while guaranteeing low latency and high concurrency. The related paper won the EuroSys 2023 best poster award.

Motivation and Requirements

GNN models capture high‑order neighbor information via graph structures. In industrial settings such as recommendation and fraud detection, graph topology and attributes evolve over time, so the model must sample and represent dynamic neighborhoods in real time. To meet user‑experience demands, inference must complete within milliseconds (P99 latency under 20 ms) even when the graph size and queries per second (QPS) exceed a single machine’s capacity, and the system must scale linearly in a distributed environment.

Challenges

Sampling at request time must traverse all of a vertex’s neighbors, so latency fluctuates as neighborhoods grow and change.

Uneven graph partitioning leads to load imbalance across shards.

Multi‑hop sampling and attribute collection incur significant network and I/O overhead on distributed graphs.

Key Design Principles

Storage‑Compute Separation & Query‑Aware Cache

DGS separates graph storage from sampling computation. Sampling (random, top‑k timestamp, or probability‑based) is performed on the compute side, while frequently accessed query data are cached to improve spatial locality.
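The three sampling strategies named above can be illustrated with a minimal sketch. The edge layout `(neighbor_id, timestamp)` and the function names are assumptions made for this example; they are not the GraphLearn or DGS API.

```python
import heapq
import random
from typing import List, Sequence, Tuple

# Hypothetical edge record: (neighbor_id, timestamp).
Edge = Tuple[int, float]


def sample_random(edges: Sequence[Edge], k: int, rng: random.Random) -> List[Edge]:
    """Uniformly pick up to k distinct edges."""
    return rng.sample(list(edges), min(k, len(edges)))


def sample_topk_timestamp(edges: Sequence[Edge], k: int) -> List[Edge]:
    """Keep the k most recent edges, newest first."""
    return heapq.nlargest(k, edges, key=lambda e: e[1])


def sample_by_weight(edges: Sequence[Edge], weights: Sequence[float],
                     k: int, rng: random.Random) -> List[Edge]:
    """Draw k edges with replacement, with probability proportional to weight."""
    if not edges:
        return []
    return rng.choices(list(edges), weights=list(weights), k=k)


edges = [(1, 10.0), (2, 30.0), (3, 20.0)]
print(sample_topk_timestamp(edges, 2))  # [(2, 30.0), (3, 20.0)]
```

All three functions scan the full neighbor list, which is the request-time cost that pre-sampling is meant to avoid.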

Event‑Driven Pre‑Sampling

Sampling for each vertex is triggered by graph‑update events rather than by inference requests. Using a weighted reservoir sampling algorithm, DGS performs stream sampling at update time, which reduces request‑time work to a simple point lookup. The request‑time cost per vertex is therefore O(K), where K is the reservoir size, regardless of how many neighbors the vertex has.
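As a rough illustration of update-time stream sampling, the sketch below uses the Efraimidis-Spirakis A-Res scheme, one well-known weighted reservoir algorithm. The source does not say which variant DGS uses, so the algorithm choice, the class name `WeightedReservoir`, and its methods are assumptions.

```python
import heapq
import random
from typing import List, Tuple


class WeightedReservoir:
    """Per-vertex reservoir holding at most k sampled neighbors (A-Res scheme)."""

    def __init__(self, k: int, seed: int = 0) -> None:
        self.k = k
        self._rng = random.Random(seed)
        # Min-heap of (key, neighbor_id); the smallest key is evicted first.
        self._heap: List[Tuple[float, int]] = []

    def on_edge_update(self, neighbor_id: int, weight: float) -> None:
        """Handle one edge event; weight must be positive."""
        u = 1.0 - self._rng.random()      # uniform in (0, 1]
        key = u ** (1.0 / weight)         # larger weights tend to get larger keys
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (key, neighbor_id))
        elif key > self._heap[0][0]:
            heapq.heapreplace(self._heap, (key, neighbor_id))

    def lookup(self) -> List[int]:
        """Request-time read: return the stored sample without touching the graph."""
        return [neighbor_id for _, neighbor_id in self._heap]


reservoir = WeightedReservoir(k=3, seed=42)
for neighbor in range(100):
    reservoir.on_edge_update(neighbor, weight=1.0 + neighbor % 5)
print(len(reservoir.lookup()))  # 3
```

In this sketch each update costs O(log K) for the heap operation, and a lookup reads at most K entries no matter how many edges the vertex has received.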

Multi‑Hop Decomposition & Lazy Concatenation

Fixed‑pattern queries (e.g., a two‑hop user‑item GraphSAGE pattern) are decomposed into per‑hop sub‑queries. Each hop is pre‑sampled and stored; the final multi‑hop result is assembled lazily only when an inference request arrives, avoiding continuous updates of already‑joined samples.
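The decomposition and lazy assembly described above can be sketched as follows. The two dictionaries stand in for the per-hop pre-sampled stores, and the vertex IDs and the function `assemble_two_hop` are hypothetical names chosen for this example.

```python
from typing import Dict, List

# Hypothetical pre-sampled one-hop stores, one per hop of the query.
hop1_store: Dict[str, List[str]] = {"u1": ["i1", "i2"]}               # user -> items
hop2_store: Dict[str, List[str]] = {"i1": ["i3", "i4"], "i2": ["i5"]}  # item -> items


def assemble_two_hop(seed: str,
                     hop1: Dict[str, List[str]],
                     hop2: Dict[str, List[str]]) -> dict:
    """Join the two per-hop samples only when a request for `seed` arrives."""
    first_hop = list(hop1.get(seed, []))
    second_hop = {v: list(hop2.get(v, [])) for v in first_hop}
    return {"seed": seed, "hop1": first_hop, "hop2": second_hop}


print(assemble_two_hop("u1", hop1_store, hop2_store))

# A graph update touches a single one-hop entry; no joined result is rewritten.
hop2_store["i1"] = ["i9", "i4"]
print(assemble_two_hop("u1", hop1_store, hop2_store)["hop2"]["i1"])  # ['i9', 'i4']
```

Because the join happens at read time, an update to one item's sample is a single write, even if many users have that item in their first hop.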

Figure 1: Example two‑hop sampling query expressed in Graph Sampling Language (GSL).

Figure 2: Decomposition of the two‑hop query.

Figure 3: Event‑driven update of the pre‑sampled reservoir.

Publish‑Subscribe Mechanism & Read‑Write Isolation

To avoid costly cross‑shard communication during multi‑hop assembly, DGS routes request IDs to specific serving workers that subscribe to updates of the relevant vertices. Updates to one‑hop samples trigger messages that refresh the subscription tables of the affected serving workers. Additionally, read tasks (sampling queries) and write tasks (graph updates) are scheduled on separate machines, giving priority to read latency while controlling write staleness.
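A minimal in-process sketch of the subscription idea follows. The classes `ServingWorker` and `SubscriptionTable` and their methods are assumptions for illustration; a real deployment would exchange these messages over the network, and this sketch omits unsubscribing from neighbors that drop out of a sample.

```python
from collections import defaultdict
from typing import Dict, List


class ServingWorker:
    """Holds a local copy of the samples for the vertices it subscribes to."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: Dict[str, List[str]] = {}

    def receive(self, vertex: str, samples: List[str]) -> None:
        self.cache[vertex] = list(samples)


class SubscriptionTable:
    """Maps each vertex to the serving workers that need its samples."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, Dict[str, ServingWorker]] = defaultdict(dict)

    def subscribe(self, vertex: str, worker: ServingWorker) -> None:
        self._subscribers[vertex][worker.name] = worker

    def publish(self, vertex: str, samples: List[str]) -> int:
        """Push fresh samples to subscribers only; return how many were notified."""
        targets = list(self._subscribers.get(vertex, {}).values())
        for worker in targets:
            worker.receive(vertex, samples)
        return len(targets)

    def publish_hop1(self, seed: str, samples: List[str]) -> int:
        """Publish a seed's one-hop sample and subscribe its workers to the new neighbors."""
        targets = list(self._subscribers.get(seed, {}).values())
        for worker in targets:
            worker.receive(seed, samples)
            for neighbor in samples:
                self.subscribe(neighbor, worker)
        return len(targets)


table = SubscriptionTable()
w0, w1 = ServingWorker("serving-0"), ServingWorker("serving-1")
table.subscribe("u1", w0)                 # requests for u1 are routed to serving-0
table.publish_hop1("u1", ["i1", "i2"])    # serving-0 now also follows i1 and i2
table.publish("i1", ["i3", "i4"])         # only serving-0 is notified
print(w0.cache, w1.cache)
```

After these calls, serving-0 holds everything needed to assemble the two-hop result for `u1` locally, while serving-1 receives no traffic for vertices it does not serve.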

System Architecture

The core components are Sampling Workers and Serving Workers. Graph updates are partitioned by vertex key and sent to the appropriate Sampling Worker, which performs one‑hop pre‑sampling and forwards results to the corresponding Serving Worker. Serving Workers cache K‑hop results locally, enabling fast inference without remote reads. Both worker types can elastically scale.
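To make the routing step concrete, here is a sketch of partitioning edge updates by source vertex key. The source does not describe the actual partitioning function, so the CRC32-modulo scheme, the `(src, dst, timestamp)` update layout, and the function names are assumptions.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical update record: (src_vertex, dst_vertex, timestamp).
EdgeUpdate = Tuple[str, str, float]


def owner_of(vertex_key: str, num_workers: int) -> int:
    """Map a vertex key to a Sampling Worker index with a process-independent hash."""
    return zlib.crc32(vertex_key.encode("utf-8")) % num_workers


def route_updates(updates: List[EdgeUpdate], num_workers: int) -> Dict[int, List[EdgeUpdate]]:
    """Group updates by the worker that owns each update's source vertex."""
    batches: Dict[int, List[EdgeUpdate]] = defaultdict(list)
    for update in updates:
        batches[owner_of(update[0], num_workers)].append(update)
    return dict(batches)


updates = [("u1", "i1", 1.0), ("u2", "i1", 2.0), ("u1", "i2", 3.0)]
for worker, batch in sorted(route_updates(updates, num_workers=4).items()):
    print(worker, batch)
```

`zlib.crc32` is used instead of Python's built-in `hash` because string hashing is randomized per process, which would route the same vertex differently on different machines. Plain modulo hashing also reshuffles most keys when the worker count changes, so an elastically scaling system would likely need a more stable scheme.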

Figure 4: DGS core architecture.

Performance Evaluation

Experiments on Alibaba’s e‑commerce dataset show that DGS keeps the P99 latency of two‑hop random sampling queries under 20 ms. A single Serving Worker handles roughly 20,000 QPS, and throughput scales linearly as workers are added. Graph update throughput reaches 109 MB/s and also scales linearly.
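If the reported linear scaling holds, a back-of-the-envelope capacity estimate follows directly from the per-worker figure. The helper below is a hypothetical planning aid, not part of DGS, and it assumes perfectly linear scaling with no headroom.

```python
def serving_workers_needed(target_qps: int, per_worker_qps: int = 20_000) -> int:
    """Smallest worker count whose combined capacity covers target_qps."""
    if target_qps <= 0:
        return 0
    return -(-target_qps // per_worker_qps)  # ceiling division on integers


print(serving_workers_needed(150_000))  # 8
```

A target of 150,000 QPS at 20,000 QPS per worker gives 7.5, which rounds up to 8 Serving Workers. Real deployments would add headroom for traffic spikes and failover.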

Figure 5: Experimental configuration and performance results.

Conclusion

DGS provides a complete solution for real‑time GNN inference on massive dynamic graphs. It combines storage‑compute separation, event‑driven pre‑sampling, lazy multi‑hop assembly, and a publish‑subscribe model to achieve sub‑20 ms P99 latency and linear scalability. The service includes additional modules for high availability, data loading, and model integration, and is available as open source with tutorials and documentation.
