Angel Graph: A Large-Scale Graph Computing Platform by Tencent
This article introduces Tencent's Angel Graph platform, detailing its evolution from early versions to a mature large‑scale graph computing system, its architecture combining Angel PS with Spark and PyTorch, data and model partitioning strategies, communication and computation optimizations, stability features, usability, and real‑world applications.
Angel Graph is Tencent's large‑scale graph computing platform. Angel was originally open‑sourced in 2017 as version 1.0, targeting high‑dimensional sparse models, and has evolved through versions 1.5 and 3.0–3.2 to support a comprehensive set of graph algorithms.
From late 2018, the growing demand for high‑performance, highly‑available graph processing led to the creation of Angel Graph, which leverages Angel's parameter‑server (PS) architecture together with Spark for data preprocessing and distributed execution.
Framework Overview
The system consists of Angel PS, which stores and provides random access to graph models, and Spark, which handles ETL and distributed computation. In the Spark‑on‑Angel workflow, workers pull neighbor lists from PS, compute intersections (e.g., common neighbors), and push updates back, enabling efficient graph mining.
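The pull–compute–push loop described above can be sketched in a few lines. This is an illustrative toy, not Angel's API: the parameter server is stood in for by a plain dict, and `pull_neighbors` plays the role of the PS pull RPC.

```python
# Toy parameter server: vertex id -> adjacency list, as PS would store it.
ps = {
    1: {2, 3, 4},
    2: {1, 3},
    3: {1, 2, 4},
}

def pull_neighbors(ps, vertices):
    """Worker pulls the neighbor lists of a batch of vertices from the PS."""
    return {v: ps.get(v, set()) for v in vertices}

def common_neighbors(ps, u, v):
    """Compute |N(u) ∩ N(v)| on the worker, as in common-neighbor mining."""
    adj = pull_neighbors(ps, [u, v])
    return len(adj[u] & adj[v])

print(common_neighbors(ps, 1, 2))  # vertices 1 and 2 share only vertex 3 -> 1
```

In the real system the worker would then push the computed metric (or a model update) back to the PS, and the batch of vertices would be one Spark partition rather than a pair.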
PyTorch on Angel
To support graph neural networks, PyTorch is integrated on Angel. The PS stores model parameters while a single‑node PyTorch runtime performs the forward and backward passes; users only need to define their PyTorch models without dealing with the underlying distributed machinery.
Data and Model Partitioning
Data partitioning on the Spark side supports edge, vertex, hybrid, and block cuts. Model partitioning on the PS side offers both range and hash schemes to balance load and memory usage.
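The trade-off between the two model-partitioning schemes can be made concrete with a small sketch (illustrative only, not Angel's partitioner code): range cuts keep contiguous keys together, which preserves locality, while hash cuts spread skewed key distributions evenly.

```python
def range_partition(key, max_key, num_parts):
    """Assign contiguous key ranges to partitions (good locality)."""
    size = max_key // num_parts + 1
    return key // size

def hash_partition(key, num_parts):
    """Spread keys uniformly regardless of their distribution."""
    return hash(key) % num_parts

# A skewed key set: most ids are tiny, one is huge.
keys = [0, 1, 2, 3, 1_000_000]
print([range_partition(k, 1_000_000, 4) for k in keys])  # [0, 0, 0, 0, 3]
print([hash_partition(k, 4) for k in keys])              # [0, 1, 2, 3, 0]
```

With range partitioning the four small ids all land on partition 0 (a hot spot if real ids cluster), while hash partitioning balances them at the cost of losing key locality; the platform offers both so the scheme can match the id distribution.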
Stability
System‑level fault tolerance is achieved via PS checkpointing (similar to RDD cache). Algorithm‑level checkpoint strategies are chosen based on error tolerance: global checkpoints for strict graph mining, incremental checkpoints for representation learning, and hybrid approaches for GNNs.
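The difference between the two checkpoint styles named above can be sketched as follows; this is a conceptual toy (dicts standing in for PS model partitions), not Angel's checkpoint implementation.

```python
def global_checkpoint(model):
    """Snapshot the entire model: strict consistency, expensive to write."""
    return dict(model)

def incremental_checkpoint(base, updates):
    """Restore from a base snapshot plus only the keys changed since then."""
    ckpt = dict(base)
    ckpt.update(updates)  # replay the delta instead of rewriting everything
    return ckpt

model = {"v1": 0.1, "v2": 0.2, "v3": 0.3}
base = global_checkpoint(model)     # full snapshot (graph-mining style)
model["v2"] = 0.25                  # training mutates one embedding
delta = {"v2": model["v2"]}         # incremental record (repr-learning style)
restored = incremental_checkpoint(base, delta)
assert restored == model
```

A hybrid GNN strategy would combine the two: occasional global snapshots with incremental deltas in between.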
Usability
The platform integrates tightly with the big‑data ecosystem, providing a rich library of graph algorithms (mining, representation learning, GNNs) and abstracted operators that cover data loading, preprocessing, and common graph computations.
Communication Optimizations
To reduce the number of connections, a square‑partitioner and kernel merging technique limit each data partition to communicate with only a few PS partitions. Additionally, custom PS functions move sampling logic to the PS, shrinking data transfer from edge‑level to vertex‑level size.
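The payoff of moving sampling onto the PS can be seen in a short sketch. The function names here are illustrative, not Angel's real PS-function API: the point is only that a naive pull ships the whole adjacency list, while a server-side sample ships at most `k` neighbor ids.

```python
import random

def pull_full(ps, v):
    """Naive pull: the entire neighbor list crosses the network."""
    return list(ps.get(v, []))

def sample_on_ps(ps, v, k, rng=random):
    """PS-side sampling: at most k neighbor ids cross the network."""
    nbrs = list(ps.get(v, []))
    return nbrs if len(nbrs) <= k else rng.sample(nbrs, k)

ps = {7: list(range(10_000))}        # one high-degree vertex
print(len(pull_full(ps, 7)))         # 10000 ids transferred
print(len(sample_on_ps(ps, 7, 25)))  # only 25 ids transferred
```

For neighbor-sampling algorithms (e.g. GraphSAGE-style GNN training) this turns edge-proportional traffic into vertex-proportional traffic, which is the shrinkage the section describes.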
Computation Optimizations
Super‑vertex scattering distributes the workload of high‑degree vertices across workers, while dynamic adjacency‑list compression avoids costly shuffle operations when building neighbor lists for trillion‑edge graphs.
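Super-vertex scattering can be illustrated with a minimal sketch (again assumed names, not Angel internals): the adjacency list of one high-degree vertex is sliced into per-worker chunks, each worker computes a partial result on its chunk, and the partials are merged.

```python
def scatter(neighbors, num_workers):
    """Split one adjacency list into round-robin per-worker chunks."""
    return [neighbors[i::num_workers] for i in range(num_workers)]

def degree_via_scatter(neighbors, num_workers):
    """Each worker counts its chunk; the partial counts are summed."""
    partials = [len(chunk) for chunk in scatter(neighbors, num_workers)]
    return sum(partials)

hub = list(range(1_000_003))  # a super vertex with ~1e6 neighbors
assert degree_via_scatter(hub, 8) == len(hub)
```

Degree counting is the simplest mergeable computation; the same scatter-then-merge pattern applies to heavier per-neighbor work, which is what keeps a single hub vertex from stalling one worker.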
Applications
Internally, Angel Graph powers recommendation, risk control, social, and gaming services at Tencent, handling homogeneous/heterogeneous, directed/undirected graphs with both attribute‑rich and attribute‑free edges, and supporting supervised and unsupervised tasks.
Overall, Angel Graph offers a high‑performance, highly‑available, and easy‑to‑use platform for large‑scale graph computation.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.