Streaming Graph Processing in Ant Group: Real-Time Data Architecture and Applications
This article presents Ant Group's real‑time data framework and its streaming graph processing engine. It covers the architecture, unified batch‑stream capabilities, and three applications (traffic attribution, real‑time OLAP, and user‑behavior intent analysis), and closes with future directions.
In the big data domain, streaming graph processing combines graph computation with stream processing. As events arrive, the engine incrementally updates a graph of vertices and edges and analyzes the relationships among them, so results are refreshed continuously rather than recomputed in periodic batches.
Ant Group has a mature real‑time data system; this talk covers its overall real‑time data architecture, key technologies, and applications of streaming graph processing in traffic attribution, real‑time OLAP, and user‑behavior intent analysis.
Agenda:
Ant Group real‑time data overview
Streaming graph processing in traffic attribution
Exploration of streaming graph processing in real‑time OLAP
Exploration of streaming graph processing in user‑behavior intent analysis
Future outlook
Ant Real‑Time Data Overview
The framework consists of three layers: foundational technologies (compute, storage, messaging), real‑time core capabilities (architecture & development paradigm, data assets, solutions), and business solutions (marketing, risk control, etc.). The streaming graph engine resides in the foundational layer.
Unified Batch‑Stream Processing
This paradigm allows a single codebase to process both real‑time streams and batch data, so the same business logic does not have to be implemented and maintained twice. Engines such as Apache Flink and Apache Spark support this capability.
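As a minimal sketch of the idea (plain Python, not Flink, Spark, or TuGraph‑Analytics code), the function below holds the business logic once and accepts either a bounded list or a generator. The event shape and the names count_page_views and final_counts are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[str, str]  # (user_id, page): hypothetical event shape


def count_page_views(events: Iterable[Event]) -> Iterator[Tuple[str, int]]:
    """Shared logic: yield (page, running_count) after every event."""
    counts: Counter = Counter()
    for _user, page in events:
        counts[page] += 1
        yield page, counts[page]


def final_counts(events: Iterable[Event]) -> Dict[str, int]:
    """Batch view: drain the same logic and keep the last count per page."""
    result: Dict[str, int] = {}
    for page, n in count_page_views(events):
        result[page] = n
    return result


# Batch input: a bounded list.
batch = [("u1", "home"), ("u2", "home"), ("u1", "pay")]


def stream() -> Iterator[Event]:
    """Stream input: a generator that could be unbounded in practice."""
    yield from batch


batch_result = final_counts(batch)
stream_updates = list(count_page_views(stream()))
```

The batch caller reads only the final state, while the streaming caller consumes every incremental update; neither path duplicates the counting logic.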
Ant Streaming Graph System (TuGraph‑Analytics) Architecture
The system comprises container resources (Kubernetes, Ray), the streaming graph engine (GraphView API, Unified Graph Engine, Graph State), and data applications (traffic attribution, real‑time OLAP, intent analysis). It aims to provide an integrated solution for efficient real‑time data processing.
Application: Real‑Time Traffic Attribution
The traffic conversion funnel moves users from public domains to private domains and then to transaction conversion, which is where commercial value is realized. The attribution model classifies nodes as path start, cut point, effective or ineffective conversion node, and path end, and uses these roles to produce trimmed conversion chains.
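The talk summary does not spell out the trimming rules, so the sketch below is one possible reading rather than Ant Group's model. The role labels and the rules listed in the docstring are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical role labels, named after the node types in the text.
START, CUT, EFFECTIVE, INEFFECTIVE, END = (
    "start", "cut", "effective", "ineffective", "end",
)

Node = Tuple[str, str]  # (page_name, role)


def trim_chain(path: List[Node]) -> List[str]:
    """Return the trimmed conversion chain for one user path.

    Assumed rules (not taken from the talk):
      * a START or CUT node resets the chain, dropping earlier history;
      * a START node is kept as the first element, a CUT node is not;
      * INEFFECTIVE nodes are skipped;
      * the chain is closed at the first END node.
    """
    chain: List[str] = []
    for name, role in path:
        if role == START:
            chain = [name]
        elif role == CUT:
            chain = []
        elif role == EFFECTIVE:
            chain.append(name)
        elif role == END:
            chain.append(name)
            return chain
        # INEFFECTIVE nodes fall through and are ignored.
    return []  # no END node seen, so there is no conversion to attribute


path = [
    ("old_banner", EFFECTIVE),
    ("feed_ad", START),
    ("loading", INEFFECTIVE),
    ("product_page", EFFECTIVE),
    ("pay_success", END),
]
trimmed = trim_chain(path)
```

Here the stale old_banner visit and the loading page are removed, leaving feed_ad, product_page, pay_success as the chain that receives credit.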
The technical stack includes real‑time data collection (client and server), streaming graph construction, and attribution path calculation, with results output to downstream MQ and OLAP.
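The flow described here (collect events, grow the graph, compute a path when a conversion arrives, emit the result) can be sketched in a few lines. This is an illustrative in-memory model, not the TuGraph‑Analytics API: BehaviorGraph, process, the event tuple layout, and the "pay_success" conversion page are all hypothetical, and a plain list stands in for the downstream MQ or OLAP sink.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

EdgeEvent = Tuple[str, str, str, int]  # (user, src_page, dst_page, timestamp)


class BehaviorGraph:
    """Per-user graph that grows one edge at a time as events arrive."""

    def __init__(self) -> None:
        # user -> destination page -> (source page, timestamp) of the latest inbound edge
        self._pred: Dict[str, Dict[str, Tuple[str, int]]] = defaultdict(dict)

    def add_edge(self, user: str, src: str, dst: str, ts: int) -> None:
        current = self._pred[user].get(dst)
        if current is None or ts >= current[1]:
            self._pred[user][dst] = (src, ts)

    def path_to(self, user: str, target: str) -> List[str]:
        """Walk predecessor edges back from target; stop at a root or a cycle."""
        preds = self._pred.get(user, {})
        path = [target]
        seen = {target}
        node = target
        while node in preds:
            node = preds[node][0]
            if node in seen:
                break
            seen.add(node)
            path.append(node)
        path.reverse()
        return path


def process(events: Iterable[EdgeEvent],
            conversion_page: str = "pay_success") -> List[Tuple[str, List[str]]]:
    """Update the graph per event and emit a path whenever a conversion is seen."""
    graph = BehaviorGraph()
    sink: List[Tuple[str, List[str]]] = []  # stand-in for the downstream sink
    for user, src, dst, ts in events:
        graph.add_edge(user, src, dst, ts)
        if dst == conversion_page:
            sink.append((user, graph.path_to(user, dst)))
    return sink


events = [
    ("u1", "feed_ad", "product_page", 1),
    ("u2", "search", "product_page", 2),
    ("u1", "product_page", "pay_success", 3),
]
attributed = process(events)
```

Keeping only the latest inbound edge per page is a simplification chosen to keep the walk-back deterministic; a production model would likely retain more history.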
Application: Real‑Time OLAP
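Three computation modes are discussed: pre‑computation (metrics are aggregated before they are queried), pre‑widening with post‑aggregation (records are joined into a wide table up front and aggregated at query time), and post‑aggregation (detail data is kept as is and aggregated at query time). Post‑aggregation, also called post‑computation, enables flexible, fault‑tolerant analytics without upfront processing, improving feature development efficiency.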
In marketing scenarios, post‑computation reduces wasted pre‑processing and supports on‑demand analysis, offering higher flexibility and lower latency for ad‑hoc queries.
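The trade-off between pre‑computation and post‑aggregation can be illustrated with a small in-memory sketch. The classes PreComputed and PostAggregated, and the event fields channel, city, and amount, are hypothetical and only meant to show where the grouping decision is made.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, object]  # hypothetical raw event with channel, city, amount


class PreComputed:
    """Pre-computation: the grouping key is fixed before any data arrives."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.totals: Dict[object, float] = defaultdict(float)

    def ingest(self, event: Event) -> None:
        self.totals[event[self.key]] += float(event["amount"])


class PostAggregated:
    """Post-aggregation: keep detail rows and choose the grouping at query time."""

    def __init__(self) -> None:
        self.rows: List[Event] = []

    def ingest(self, event: Event) -> None:
        self.rows.append(event)

    def query(self, key: str,
              where: Callable[[Event], bool] = lambda e: True) -> Dict[object, float]:
        totals: Dict[object, float] = defaultdict(float)
        for e in self.rows:
            if where(e):
                totals[e[key]] += float(e["amount"])
        return dict(totals)


events = [
    {"channel": "feed", "city": "HZ", "amount": 10},
    {"channel": "search", "city": "HZ", "amount": 5},
    {"channel": "feed", "city": "SH", "amount": 7},
]

pre = PreComputed(key="channel")
post = PostAggregated()
for ev in events:
    pre.ingest(ev)
    post.ingest(ev)

by_channel = dict(pre.totals)                 # the only question `pre` can answer
by_city = post.query("city")                  # a new dimension, no reprocessing
hz_by_channel = post.query("channel", where=lambda e: e["city"] == "HZ")
```

The pre-computed store answers its one question cheaply but needs a new job for every new metric, whereas the post-aggregated store pays the aggregation cost at query time in exchange for answering questions that were not planned in advance.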
Application: Real‑Time User‑Behavior Intent Analysis
By constructing real‑time graphs of user actions in financial services, the system assigns intent scores to nodes, identifying likely product interests and enabling targeted marketing.
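The summary does not describe how intent scores are computed, so the following is only a sketch of one simple possibility: each action contributes a weight to its product node, and older actions decay exponentially. The action weights, the half-life, and the function name intent_scores are assumptions, not values from the talk.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical weights per action type and a hypothetical decay half-life.
ACTION_WEIGHT = {"view": 1.0, "search": 2.0, "add_to_watchlist": 3.0}
HALF_LIFE_S = 600.0

Action = Tuple[float, str, str]  # (timestamp_seconds, action, product)


def intent_scores(events: Iterable[Action], now: float) -> Dict[str, float]:
    """Sum time-decayed action weights per product node."""
    scores: Dict[str, float] = defaultdict(float)
    for ts, action, product in events:
        age = max(0.0, now - ts)
        decay = 0.5 ** (age / HALF_LIFE_S)  # weight halves every HALF_LIFE_S seconds
        scores[product] += ACTION_WEIGHT.get(action, 0.0) * decay
    return dict(scores)


events = [
    (0.0, "view", "fund_A"),
    (600.0, "search", "fund_B"),
    (1200.0, "view", "fund_A"),
]
scores = intent_scores(events, now=1200.0)
top_product = max(scores, key=scores.get)
```

Because every score is a plain sum of per-action contributions, the result can be traced back to the specific actions that produced it, which is the kind of white‑box behavior the next paragraph contrasts with traditional recommendation algorithms.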
Compared to traditional recommendation algorithms, this approach offers higher timeliness, white‑box transparency, noise reduction, and efficient post‑computation.
Future Outlook
Promote real‑time OLAP in marketing scenarios
Expand real‑time intent analysis to finance and content domains
Explore real‑time attribution for ad‑link diagnostics
Contribute to open‑source streaming graph projects
Thank you for attending.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.