
GeaFlow: Ant Group’s Real‑Time Streaming Graph Computing Engine and Its Applications

This article introduces GeaFlow, Ant Group’s self‑developed real‑time streaming graph engine, covering its basic concepts, technical architecture, dynamic and fusion computing capabilities, integrated DSL, and several large‑scale business use cases such as fraud detection and incremental community mining.

DataFunSummit

GeaFlow is Ant Group’s internally built real‑time graph computing system that supports streaming graph fusion, dynamic temporal graph calculations, and graph simulation.

The article opens with a basic introduction to graph theory, emphasizing the advantages of graph structures over traditional table‑based models for representing complex relationships in finance and social networks.
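To make the contrast concrete, here is a minimal Python sketch. It is illustrative only and is not GeaFlow code; the account names and the `k_hop` helper are made up for the example. In a table, each extra hop of a relationship typically costs another self-join, while in an adjacency-list graph the same question is a short traversal.

```python
from collections import defaultdict

# Hypothetical transfer records as they would sit in a table: (payer, payee).
transfers = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e")]

# Build an adjacency list: the graph view of the same rows.
graph = defaultdict(set)
for src, dst in transfers:
    graph[src].add(dst)

def k_hop(graph, start, k):
    """Return the vertices reachable from `start` by a walk of exactly k hops."""
    frontier = {start}
    for _ in range(k):
        # One hop = follow every out-edge of the current frontier.
        frontier = {nbr for v in frontier for nbr in graph.get(v, ())}
    return frontier

print(sorted(k_hop(graph, "a", 2)))  # ['c']
```

Answering the same two-hop question in SQL would need the transfer table joined to itself once, and once more for every additional hop.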

GeaFlow’s technical architecture consists of a distributed execution engine (Ray) at the bottom, a unified graph store, a task‑based dynamic graph computation framework, a unified execution plan that merges SQL and Gremlin into a single DAG, and a hybrid DSL that allows users to write business logic using both languages.

Dynamic computation enables the system to handle evolving graphs by launching sub‑DAGs on‑the‑fly, avoiding the high latency and storage overhead of static DAG approaches used in traditional batch systems.
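The idea of launching follow-up work at runtime can be sketched in a few lines of Python. This is a toy scheduler under stated assumptions, not GeaFlow's framework: `run_dynamic`, `make_update_task`, and the "hub" rule are hypothetical names chosen for illustration. Each task may return further tasks, so the shape of the computation is decided while the graph changes instead of being fixed in advance.

```python
from collections import deque

def run_dynamic(initial_tasks):
    """Run (name, fn) tasks from a queue; each fn may return follow-up tasks."""
    queue = deque(initial_tasks)
    log = []
    while queue:
        name, fn = queue.popleft()
        log.append(name)
        queue.extend(fn() or [])  # sub-tasks are decided at runtime
    return log

def make_update_task(edge, graph):
    """Hypothetical task: apply one edge update, expand only if needed."""
    src, dst = edge

    def task():
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
        # Launch a follow-up task only when `src` reaches two out-neighbours.
        if len(graph[src]) == 2:
            return [(f"expand:{src}", lambda: None)]
        return []

    return (f"update:{src}->{dst}", task)

graph = {}
edges = [("a", "b"), ("a", "c"), ("d", "a")]
log = run_dynamic([make_update_task(e, graph) for e in edges])
print(log)
```

Only the second update causes an `expand:a` task to be scheduled; the other updates finish without any extra work, which is the saving a fully static plan cannot express.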

Fusion computing combines streaming and graph processing, allowing scenarios such as real‑time anti‑money‑laundering checks where preliminary table‑based statistics trigger sub‑graph matching only when certain conditions are met.
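A minimal sketch of this trigger pattern, assuming a hypothetical rule: once an account's running outgoing total crosses a threshold, check whether money can flow back to it within a few hops. The function names, the threshold, and the cycle rule are illustrative assumptions, not GeaFlow's actual anti-money-laundering logic.

```python
from collections import defaultdict, deque

def has_cycle_through(graph, start, max_hops):
    """BFS check: is there a directed cycle through `start` of length <= max_hops?"""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        v, depth = queue.popleft()
        if depth == max_hops:
            continue
        for nbr in graph.get(v, ()):
            if nbr == start:
                return True
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, depth + 1))
    return False

def process_stream(transfers, amount_threshold=10_000, max_hops=4):
    totals = defaultdict(float)  # table-style statistic: total sent per account
    graph = defaultdict(set)     # transfer graph, updated as events arrive
    alerts = []
    for src, dst, amount in transfers:
        totals[src] += amount
        graph[src].add(dst)
        # Cheap table-side check first; graph matching runs only when it fires.
        if totals[src] >= amount_threshold and has_cycle_through(graph, src, max_hops):
            alerts.append(src)
    return alerts

events = [("a", "b", 6000), ("b", "c", 6000), ("c", "a", 6000), ("a", "b", 5000)]
print(process_stream(events))  # ['a']
```

The first three events never reach the graph check because no running total has crossed the threshold; only the fourth event pays for the traversal.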

Distributed Gremlin compiles Gremlin scripts into distributed graph tasks that run on the underlying execution engine, providing efficient graph traversal and computation at scale.
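One way to picture this is to take a traversal such as Gremlin's `g.V('a').out().out()` and evaluate each `out()` step as one round of message passing across graph partitions. The Python below is a single-process simulation under that assumption; the partitioner and function names are hypothetical, and the source does not describe GeaFlow's actual compilation strategy at this level of detail.

```python
from collections import defaultdict

def partition_of(vertex, num_partitions):
    """Toy deterministic partitioner (sum of character codes)."""
    return sum(map(ord, vertex)) % num_partitions

def build_partitions(edges, num_partitions):
    """Each partition stores the out-edges of the vertices it owns."""
    parts = [defaultdict(list) for _ in range(num_partitions)]
    for src, dst in edges:
        parts[partition_of(src, num_partitions)][src].append(dst)
    return parts

def run_out_steps(parts, start_vertices, num_steps):
    """Evaluate `num_steps` consecutive out() steps, one round each."""
    n = len(parts)
    # Route the starting traversers to the partitions that own them.
    inboxes = [[] for _ in range(n)]
    for v in start_vertices:
        inboxes[partition_of(v, n)].append(v)
    for _ in range(num_steps):
        outboxes = [[] for _ in range(n)]
        for pid, inbox in enumerate(inboxes):  # would run in parallel
            for v in inbox:
                for nbr in parts[pid].get(v, ()):
                    outboxes[partition_of(nbr, n)].append(nbr)
        inboxes = outboxes  # the shuffle between steps
    return sorted(v for inbox in inboxes for v in inbox)

edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
parts = build_partitions(edges, num_partitions=3)
print(run_out_steps(parts, ["a"], 2))  # ['d', 'd']
```

The result contains `d` twice because Gremlin traversers are not deduplicated by default: one arrives via `b` and one via `c`.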

The integrated DSL offers a one‑stop development experience: users can construct end‑to‑end pipelines with SQL, trigger sub‑graph matching or shortest‑path analysis on incoming streams, and write results back to tables, bridging the gap between table‑centric big‑data workflows and graph‑centric analytics.
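The table-to-graph-to-table flow can be imitated with Python's standard `sqlite3` module and a small breadth-first search. This is only an analogy for the pipeline shape: the schema, the `run_pipeline` function, and the "hops to a watched account" rule are assumptions for illustration and do not reproduce GeaFlow's DSL syntax.

```python
import sqlite3
from collections import deque

def shortest_path_len(adj, src, dst):
    """Unweighted shortest path length by BFS; None if unreachable."""
    if src == dst:
        return 0
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for nbr in adj.get(v, ()):
            if nbr not in dist:
                dist[nbr] = dist[v] + 1
                if nbr == dst:
                    return dist[nbr]
                queue.append(nbr)
    return None

def run_pipeline(transfers, watched, min_amount):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transfers (src TEXT, dst TEXT, amount REAL)")
    conn.execute("CREATE TABLE results (src TEXT, hops INTEGER)")
    conn.executemany("INSERT INTO transfers VALUES (?, ?, ?)", transfers)
    # Table step: SQL selects the rows that should feed the graph.
    rows = conn.execute(
        "SELECT src, dst FROM transfers WHERE amount >= ?", (min_amount,)
    ).fetchall()
    # Graph step: build adjacency, then measure distance to the watched account.
    adj = {}
    for src, dst in rows:
        adj.setdefault(src, []).append(dst)
    for src in sorted(adj):
        hops = shortest_path_len(adj, src, watched)
        if hops is not None:
            conn.execute("INSERT INTO results VALUES (?, ?)", (src, hops))
    # Back to a table: downstream SQL can query the graph result.
    out = conn.execute("SELECT src, hops FROM results ORDER BY src").fetchall()
    conn.close()
    return out

rows_in = [("a", "b", 500.0), ("b", "x", 900.0), ("c", "x", 50.0), ("a", "c", 700.0)]
print(run_pipeline(rows_in, watched="x", min_amount=100))  # [('a', 2), ('b', 1)]
```

Here the glue between the SQL and graph steps is hand-written Python; the point of an integrated DSL is that this glue is expressed in one script and planned as a single job.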

The application section presents several real‑world use cases: real‑time detection of fraud rings (gangs), incremental community mining, and temporal graph calculations. These run with second‑level latency on graphs with billions of vertices and edges, and the article reports a dramatic improvement in development efficiency.
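As a simplified stand-in for incremental community mining, the sketch below treats a community as a connected component and maintains it with a union-find structure as edges arrive. The source does not say which algorithm GeaFlow uses, so this is a swapped-in technique chosen for brevity; the class name and example edges are hypothetical, and the structure handles edge insertions only.

```python
class IncrementalCommunities:
    """Union-find over a growing graph: community = connected component."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, v):
        # Register unseen vertices as singleton communities.
        if v not in self.parent:
            self.parent[v] = v
            self.size[v] = 1
        # Path halving keeps later lookups cheap.
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]
            v = self.parent[v]
        return v

    def add_edge(self, u, v):
        """Fold one new edge in; return the size of the resulting community."""
        ru, rv = self.find(u), self.find(v)
        if ru != rv:
            # Union by size: attach the smaller tree under the larger one.
            if self.size[ru] < self.size[rv]:
                ru, rv = rv, ru
            self.parent[rv] = ru
            self.size[ru] += self.size[rv]
        return self.size[ru]

communities = IncrementalCommunities()
stream = [("a", "b"), ("c", "d"), ("b", "c"), ("e", "f")]
sizes = [communities.add_edge(u, v) for u, v in stream]
print(sizes)  # [2, 2, 4, 2]
```

Each new edge touches only the two communities it connects, so nothing is recomputed from scratch; the third edge merges two pairs into a community of four, which a detection rule could then inspect.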

In summary, GeaFlow provides a complete real‑time graph computing stack that covers DSL development, distributed Gremlin, streaming‑graph fusion, temporal graph computation, simulation, and exploration. It supports over 300 internal scenarios ranging from risk control to knowledge graphs and graph‑based AI.
