Graph Attention Multi‑Layer Perceptron (GAMLP) and Node‑Dependent Local Smoothing (NDLS) for Scalable and Flexible Graph Neural Networks
This talk covers the motivation, design, theoretical analysis, and experimental results of two works from Tencent Angel Graph: Graph Attention Multi‑Layer Perceptron (GAMLP) and Node‑Dependent Local Smoothing (NDLS). Both address GNN scalability and flexibility through node‑wise adaptive propagation, attention‑based feature fusion, and a lightweight training pipeline.
In many real‑world scenarios data is naturally represented as graphs (social networks, bio‑medicine, material science, etc.). Traditional Graph Neural Networks (GNNs) suffer from poor scalability on large graphs and limited flexibility when node features are sparse or heterogeneous.
Motivation: Training GNNs on large graphs on a single machine leads to excessive memory consumption and long runtimes; distributed training reduces memory pressure but incurs heavy communication overhead, which grows with the number of workers.
Low scalability of GNNs: Single‑machine training on graphs with billions of nodes is infeasible; distributed GraphSAGE experiments on Reddit show a large gap between the observed speed‑up and the ideal linear speed‑up, caused by communication costs.
Low flexibility of GNNs: Existing GNNs tie the depth of message passing to the depth of the MLP transformation (e.g., GCN), limiting how many layers can be stacked. Because the propagation depth is uniform across nodes, some nodes over‑smooth while others under‑smooth, and real‑world graphs often have highly imbalanced receptive fields.
Related work: Sampling‑based methods (graph‑level, layer‑level, and node‑level sampling) and model‑decoupling approaches such as SGC, SIGN, GBP, and S2GC aim to improve scalability. However, SGC uses only the final hop, SIGN concatenates multi‑hop features without adapting the depth per node, and DAGNN introduces node‑adaptive message passing but requires costly embedding pulls in distributed settings.
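The decoupling idea these methods share can be sketched in a few lines: propagation is precomputed once as a matrix operation, after which an ordinary MLP is trained on the smoothed features. A minimal dense‑numpy sketch (function names are illustrative, not from any of the cited papers; real systems use sparse matrices):

```python
import numpy as np

def sym_norm_adj(adj):
    """Symmetrically normalized adjacency with self-loops:
    D^{-1/2} (A + I) D^{-1/2}, the standard GCN/SGC propagation operator."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def precompute_hops(adj, features, num_hops):
    """One-time propagation: returns [X, A_hat X, ..., A_hat^K X].
    Done once before training, so the model itself is graph-free."""
    a_hat = sym_norm_adj(adj)
    hops = [features]
    for _ in range(num_hops):
        hops.append(a_hat @ hops[-1])
    return hops
```

In this picture, SGC would train on `hops[-1]` alone, while SIGN concatenates all of `hops` before the MLP; neither adapts the propagation depth per node, which is the gap NDLS and GAMLP target.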
Node‑Dependent Local Smoothing (NDLS):
NDLS defines a Local‑Smoothing Iteration (LSI) that measures the distance between a node's K‑step propagated feature and its infinite‑step (stationary) feature. The smallest K for which this distance falls below a threshold is taken as that node's propagation depth. This yields node‑wise adaptive depths, mitigating both over‑smoothing and under‑smoothing.
The NDLS framework consists of three components: node‑dependent feature propagation, feature training (any ML model such as MLP), and node‑dependent label propagation. After propagation, features from all K hops are averaged.
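A toy version of this pipeline can be sketched as follows: approximate each node's infinite‑step feature by propagating many steps, pick the smallest K whose K‑step feature lies within ε of it, then average that node's propagated features over hops 0..K. The function names, the dense matrices, and the stationary‑state approximation are simplifications for illustration, not the paper's exact algorithm:

```python
import numpy as np

def ndls_depths(a_hat, x, max_hops, eps, inf_steps=256):
    """Per-node smoothing depth: smallest K with ||(A_hat^K X)_v - (A_hat^inf X)_v|| < eps.
    The stationary feature is approximated by propagating inf_steps times."""
    x_inf = np.linalg.matrix_power(a_hat, inf_steps) @ x
    depths = np.full(x.shape[0], max_hops, dtype=int)
    assigned = np.zeros(x.shape[0], dtype=bool)
    cur = x.copy()
    for k in range(max_hops + 1):
        close = np.linalg.norm(cur - x_inf, axis=1) < eps
        newly = close & ~assigned          # first hop at which node is "smooth enough"
        depths[newly] = k
        assigned |= newly
        cur = a_hat @ cur
    return depths

def ndls_features(a_hat, x, depths):
    """Average each node's propagated features over hops 0..depth(v)."""
    max_k = int(depths.max())
    hops = [x]
    for _ in range(max_k):
        hops.append(a_hat @ hops[-1])
    stacked = np.stack(hops)               # (K+1, N, F)
    out = np.zeros_like(x)
    for v, k in enumerate(depths):
        out[v] = stacked[: k + 1, v].mean(axis=0)
    return out
```

The per‑node features produced this way feed any downstream model (an MLP in the talk), and the same depth logic applies to label propagation.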
Graph Attention Multi‑Layer Perceptron (GAMLP):
GAMLP builds on NDLS and adds two attention mechanisms, Recursive Attention and JK Attention, that learn the importance of each propagation depth per node. Feature propagation is node‑wise; label propagation uses a Last Residual Connection to avoid label leakage. The final representation is a weighted sum of propagated features and labels, balanced by a hyper‑parameter β, and is fed into an MLP trained with cross‑entropy loss.
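A simplified numpy sketch of a recursive‑attention combination over precomputed hops (the scoring vector `w` and the re‑normalization scheme are assumptions for illustration; in GAMLP the attention parameters are learned jointly with the MLP):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def recursive_attention_combine(hops, w):
    """Recursive attention (simplified): each hop is scored against the
    running combination of earlier hops, so deeper hops are weighted by
    how much new information they add per node.
    hops: list of (N, F) propagated features; w: (2F,) scoring vector."""
    n, f = hops[0].shape
    combined = np.zeros((n, f))
    scores = []
    for h in hops:
        s = np.concatenate([h, combined], axis=1) @ w      # (N,) score for this hop
        scores.append(s)
        alpha = softmax(np.stack(scores, axis=1))          # per-node weights over hops so far
        combined = sum(a[:, None] * hk
                       for a, hk in zip(alpha.T, hops[: len(scores)]))
    return combined, alpha                                 # (N, F), (N, K)
```

JK Attention differs in the reference vector used for scoring (a jumping‑knowledge‑style summary of all hops rather than the running combination); the ablations below compare the two.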
Experiments:
In transductive settings, GAMLP outperforms all baselines on seven datasets, including >1 % gain on large OGB benchmarks. In inductive settings, GAMLP achieves >3 % improvement over state‑of‑the‑art methods, demonstrating strong generalisation to unseen nodes.
Ablation studies confirm the effectiveness of Last Residual Connection for label propagation and the superiority of JK Attention over simpler reference vectors. Efficiency tests show GAMLP’s training time is comparable to SGC/SIGN while delivering higher accuracy, and it scales well to propagation depths of 100 without over‑smoothing.
Conclusions:
GAMLP provides deep propagation, high scalability, and computational efficiency. It can be applied to sparse graphs (e.g., recommendation systems with cold‑start users) and ultra‑large graphs containing billions of nodes, thanks to its one‑time preprocessing and decoupled training pipeline.
For more details, see the accompanying slides and figures (images retained from the original presentation).
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.