How GATNE Advances Heterogeneous Graph Embedding with Edge Types and Node Features
This article introduces GATNE, a graph embedding framework that jointly models multiple node types, multiple edge types, and rich node attributes. GATNE combines base and edge embeddings, self‑attention, and inductive learning, and it outperforms baseline methods on several real‑world datasets.
Background
Traditional graph embedding methods focus on homogeneous graphs, but many real‑world graphs are heterogeneous, containing multiple node types and multiple edge types. Prior work handled only one kind of heterogeneity at a time: metapath2vec models multiple node types, while MNE models multiple edge types.
Problem Setting
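In Alibaba's e‑commerce setting, the user‑item graph has heterogeneous nodes (users and items), heterogeneous edges (click, collect, add‑to‑cart, and purchase), and rich node attributes (e.g., a user's gender and age; an item's price and category). Graphs are grouped into six categories according to their node types, edge types, and whether nodes carry attributes: HON (homogeneous network), AHON (attributed homogeneous network), HEN (heterogeneous network), AHEN (attributed heterogeneous network), MHEN (multiplex heterogeneous network), and AMHEN (attributed multiplex heterogeneous network). The Alibaba graph is an AMHEN, the most general case and the one GATNE targets.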
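To make the problem setting concrete, here is a minimal sketch of how an attributed multiplex heterogeneous network might be held in memory, assuming plain Python dictionaries. The class name `AMHEN`, its methods, and the toy user and item records are hypothetical illustrations. They are not GATNE's or Alibaba's data format. Edges are stored separately for each edge type, because GATNE aggregates neighbors one edge type at a time.

```python
from dataclasses import dataclass, field


@dataclass
class AMHEN:
    """Hypothetical in-memory container for an attributed multiplex heterogeneous network."""
    node_type: dict = field(default_factory=dict)  # node id -> node type ("user", "item", ...)
    attrs: dict = field(default_factory=dict)      # node id -> attribute dict
    adj: dict = field(default_factory=dict)        # edge type -> {node id -> set of neighbour ids}

    def add_node(self, node, ntype, **attributes):
        self.node_type[node] = ntype
        self.attrs[node] = attributes

    def add_edge(self, u, v, etype):
        # Edges are kept per edge type, so the same user-item pair can appear
        # under several types (e.g. both "click" and "purchase").
        layer = self.adj.setdefault(etype, {})
        layer.setdefault(u, set()).add(v)
        layer.setdefault(v, set()).add(u)

    def neighbors(self, node, etype):
        # Neighbours of `node` restricted to one edge type (sorted for determinism).
        return sorted(self.adj.get(etype, {}).get(node, set()))

    def edge_types(self):
        return sorted(self.adj)


# Toy data with made-up attributes.
g = AMHEN()
g.add_node("u1", "user", gender="F", age=30)
g.add_node("i1", "item", price=99.0, category="shoes")
g.add_node("i2", "item", price=15.5, category="books")
g.add_edge("u1", "i1", "click")
g.add_edge("u1", "i1", "purchase")
g.add_edge("u1", "i2", "click")
```

Here `g.neighbors("u1", "click")` returns `["i1", "i2"]`, and the purchase layer links `u1` only to `i1`.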
Model Overview
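For each node and each edge type, GATNE learns an overall embedding built from two parts. The first is a base embedding, which is shared across all edge types. The second is an edge embedding specific to that edge type. A node's edge embedding for edge type r is computed in the style of GraphSAGE: the model aggregates the type‑r edge embeddings of the node's neighbors under edge type r, using either a mean or a max‑pooling aggregator. In the transductive variant, GATNE‑T, the base embeddings and initial edge embeddings are learned directly as parameters for the nodes seen during training.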
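Below is a minimal sketch of one aggregation layer for a single edge type. It uses plain Python lists so that it runs without dependencies. It assumes the mean aggregator computes `act(W · mean of neighbours' previous edge embeddings)` and the max‑pooling aggregator takes an element‑wise max of `act(W · u_j)` over neighbours, with bias terms omitted. The function name, the `tanh` default, and the rule for isolated nodes are illustrative choices. They are not taken from the GATNE code.

```python
import math


def mat_vec(W, x):
    # Dense matrix-vector product on plain Python lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def aggregate_edge_embeddings(neighbors, prev_emb, W, mode="mean", act=math.tanh):
    """One aggregation layer for a single edge type r (GraphSAGE-style sketch).

    neighbors: node -> list of neighbours under edge type r
    prev_emb:  node -> edge embedding from the previous layer
    W:         layer weight matrix (hypothetical; learned in practice)
    """
    new_emb = {}
    for node, nbrs in neighbors.items():
        if not nbrs:
            # Assumption: a node with no type-r neighbours keeps its previous embedding.
            new_emb[node] = list(prev_emb[node])
            continue
        vecs = [prev_emb[j] for j in nbrs]
        if mode == "mean":
            # Mean aggregator: act(W * mean of neighbour embeddings).
            pooled = [sum(col) / len(vecs) for col in zip(*vecs)]
            new_emb[node] = [act(v) for v in mat_vec(W, pooled)]
        elif mode == "max":
            # Max-pooling aggregator: element-wise max of act(W * u_j); bias omitted.
            transformed = [[act(v) for v in mat_vec(W, u)] for u in vecs]
            new_emb[node] = [max(col) for col in zip(*transformed)]
        else:
            raise ValueError(f"unknown mode: {mode}")
    return new_emb


click_nbrs = {"u1": ["i1", "i2"], "i1": ["u1"], "i2": ["u1"]}
u0 = {"u1": [1.0, 0.0], "i1": [0.0, 1.0], "i2": [1.0, 1.0]}  # layer-0 edge embeddings for "click"
identity = [[1.0, 0.0], [0.0, 1.0]]
layer1 = aggregate_edge_embeddings(click_nbrs, u0, identity)
```

Stacking K such layers, each with its own weight matrix, gives the final type‑r edge embedding for every node. The aggregation is repeated independently for each edge type.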
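To capture interactions among a node's edge‑type embeddings, GATNE applies self‑attention in the style of Lin et al. For each target edge type r, the model computes an attention weight for each of the node's edge embeddings and takes their weighted combination. It then projects that combination with an edge‑type‑specific matrix and adds the result to the base embedding. The sum is the node's overall embedding for type r.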
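The attention step can be sketched as follows. It assumes attention weights a_{i,r} = softmax(w_r^T tanh(W_r U_i)) and an overall embedding v_{i,r} = b_i + α_r M_r^T U_i a_{i,r}. Here the columns of U_i are node i's edge embeddings for every edge type, and W_r, w_r, M_r, α_r are per‑edge‑type parameters. This is our reading of the paper's formulation. The function names and the plain‑list representation are illustrative.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mat_vec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attention_weights(U, W_r, w_r):
    """One score per edge type j: w_r . tanh(W_r u_j), then softmax over edge types."""
    scores = [sum(wk * math.tanh(h) for wk, h in zip(w_r, mat_vec(W_r, u))) for u in U]
    return softmax(scores)


def overall_embedding(b, U, W_r, w_r, M_r, alpha_r):
    """v_{i,r} = b_i + alpha_r * M_r^T U_i a_{i,r}.

    b:   base embedding of node i (length d)
    U:   node i's edge embeddings, one per edge type (each length s)
    M_r: s x d projection for target edge type r
    """
    a = attention_weights(U, W_r, w_r)
    s, d = len(U[0]), len(b)
    mixed = [sum(a_j * u[k] for a_j, u in zip(a, U)) for k in range(s)]         # U_i a_{i,r}
    projected = [sum(M_r[k][c] * mixed[k] for k in range(s)) for c in range(d)]  # M_r^T (...)
    return [bc + alpha_r * pc for bc, pc in zip(b, projected)], a


U = [[1.0, 0.0], [0.0, 1.0]]  # edge embeddings for, say, "click" and "purchase"
I2 = [[1.0, 0.0], [0.0, 1.0]]
v, a = overall_embedding([1.0, 1.0], U, I2, [0.0, 0.0], I2, alpha_r=2.0)
```

With w_r set to zero, every edge type gets equal weight, so v is b plus α_r times the projected average edge embedding. A trained w_r instead shifts weight toward the edge types most useful for predicting type‑r relations.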
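The inductive variant, GATNE‑I, removes the need for per‑node parameters. It computes the base embedding and the initial edge embeddings from node attributes through node‑type‑specific transformations (e.g., a linear layer or a multilayer perceptron), and it adds a transformed attribute term to the overall embedding. Because every embedding is a function of the node's features, GATNE‑I can embed nodes that were not seen during training.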
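Below is a minimal sketch of the inductive initialization. It assumes each transformation is a plain linear map, keyed by node type for the base embedding and by node type and edge type for the initial edge embedding. The class and key names are hypothetical, an MLP could replace `linear`, and the extra attribute term of the overall embedding is left out. The sketch shows that a node never seen in training still receives embeddings, because they depend only on its features.

```python
def linear(W, x):
    # Plain matrix-vector product; stands in for any feature transformation (e.g. an MLP).
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class InductiveInit:
    """Hypothetical GATNE-I-style initializer: embeddings are functions of node features.

    h[ntype]          -> matrix mapping features to the base embedding
    g[(ntype, etype)] -> matrix mapping features to the initial edge embedding
    """

    def __init__(self, h, g):
        self.h = h
        self.g = g

    def base(self, ntype, x):
        return linear(self.h[ntype], x)

    def initial_edge(self, ntype, etype, x):
        return linear(self.g[(ntype, etype)], x)


init = InductiveInit(
    h={"item": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]},
    g={("item", "click"): [[1.0, -1.0]]},
)
new_item_features = [2.0, 3.0]  # a previously unseen item with two made-up numeric features
```

The resulting vectors then go through the same per‑edge‑type aggregation and attention as in GATNE‑T. Only their source changes, from stored parameters to functions of the features.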
Training Procedure
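Training generates random walks on each edge type's subgraph, guided by meta‑path schemes (for example, user-item-user), and optimizes a heterogeneous skip‑gram objective. For a node v and its context C taken from a walk window, the model minimizes the negative log‑likelihood -log P(C | v) = -Σ_{u ∈ C} log P(u | v). Here P(u | v) is a heterogeneous softmax, normalized only over nodes of the same type as u, and it is approximated with negative sampling.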
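Below is a compact sketch of the walk‑to‑loss pipeline under two simplifying assumptions. First, the walk moves uniformly among neighbours whose type matches the next slot of a symmetric meta‑path scheme. Second, negatives are drawn uniformly from nodes of the same type as the context node; the paper's noise distribution may differ. All function names are hypothetical. The loss uses the standard negative‑sampling form -log σ(c_u · v) - Σ_k log σ(-c_k · v), where v is the center node's embedding and the c vectors are context embeddings.

```python
import math
import random


def metapath_walk(adj_r, node_type, start, scheme, length, rng):
    """Walk on one edge type's graph following a symmetric meta-path scheme,
    e.g. ["user", "item", "user"]; the next hop is uniform among neighbours
    whose type matches the next slot in the scheme."""
    if scheme[0] != scheme[-1] or node_type[start] != scheme[0]:
        raise ValueError("scheme must be symmetric and begin with the start node's type")
    period = len(scheme) - 1
    walk = [start]
    while len(walk) < length:
        want = scheme[len(walk) % period]
        candidates = [v for v in adj_r.get(walk[-1], []) if node_type[v] == want]
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk


def skipgram_pairs(walk, window):
    # (center, context) pairs within +/- window positions.
    return [(walk[i], walk[j])
            for i in range(len(walk))
            for j in range(max(0, i - window), min(len(walk), i + window + 1))
            if j != i]


def sample_negatives(node_type, ctx, k, rng):
    # Heterogeneous negatives: only nodes of the same type as the context node.
    pool = [v for v, t in node_type.items() if t == node_type[ctx] and v != ctx]
    if not pool:
        raise ValueError("no same-type nodes to sample from")
    return [rng.choice(pool) for _ in range(k)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def neg_sampling_loss(v_center, c_context, c_negatives):
    """-log s(c_u . v) - sum_k log s(-c_k . v)."""
    loss = -log_sigmoid(dot(c_context, v_center))
    for c_k in c_negatives:
        loss -= log_sigmoid(-dot(c_k, v_center))
    return loss


adj_click = {"u1": ["i1", "i2"], "u2": ["i1"], "i1": ["u1", "u2"], "i2": ["u1"]}
types = {"u1": "user", "u2": "user", "i1": "item", "i2": "item"}
rng = random.Random(0)
walk = metapath_walk(adj_click, types, "u1", ["user", "item", "user"], 7, rng)
pairs = skipgram_pairs(walk, window=2)
```

To train, one runs walks from each start node for each edge type and sends every (center, context) pair through the loss. In GATNE, the center vector would be the node's overall embedding for the walk's edge type.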
Experiments
Experiments on three public datasets (Amazon, YouTube, Twitter) and an Alibaba dataset (user‑item interactions) show that GATNE‑T and GATNE‑I achieve the best performance compared with baselines such as DeepWalk, MVE, and MNE. GATNE‑I excels when node features are rich, while GATNE‑T performs slightly better on datasets with weak features.
Distributed implementations of the baselines and GATNE on PAI TensorFlow demonstrate that GATNE‑I converges faster and scales efficiently with increasing worker counts.
Results
The result tables (omitted here) report link‑prediction performance (ROC‑AUC, PR‑AUC, and F1). GATNE‑I improves substantially over the baselines on the large Alibaba dataset, and both GATNE variants consistently outperform the baselines on the public datasets.
Scalability
Training time decreases markedly as the number of workers grows, confirming the model’s scalability for large‑scale graph data.
References
Y. Dong, N. V. Chawla, A. Swami. metapath2vec: Scalable representation learning for heterogeneous networks. KDD’17.
H. Zhang et al. Scalable Multiplex Network Embedding. IJCAI’18.
W. Hamilton, Z. Ying, J. Leskovec. Inductive representation learning on large graphs. NIPS’17.
Z. Lin et al. A structured self‑attentive sentence embedding. ICLR’17.
Alibaba Cloud Developer