How GATNE Advances Heterogeneous Graph Embedding with Edge Types and Node Features

This article introduces GATNE, a graph embedding framework that jointly models heterogeneous nodes, multiple edge types, and rich node attributes using base and edge embeddings, self‑attention, and inductive learning, and demonstrates its superior performance on several real‑world datasets.

Alibaba Cloud Developer

Aug 9, 2019

Background

Traditional graph embedding methods focus on homogeneous graphs, but many real‑world graphs are heterogeneous, containing different node types and multiple edge types. Prior work addressed only one of these at a time: metapath2vec handles multiple node types, while MNE handles multiple edge types. Neither models both together with node attributes.

Problem Setting

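In Alibaba e‑commerce, the user‑item graph includes heterogeneous nodes (users, items) and heterogeneous edges (click, collect, add‑to‑cart, purchase), along with rich node attributes (e.g., user gender and age; item price and category). Graphs are classified into six categories according to whether they have multiple node types, multiple edge types, and node attributes: homogeneous networks (HON), attributed homogeneous networks (AHON), heterogeneous networks (HEN), attributed heterogeneous networks (AHEN), multiplex heterogeneous networks (MHEN), and attributed multiplex heterogeneous networks (AMHEN). The Alibaba graph belongs to the last and most general category, AMHEN.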
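To make the problem setting concrete, the following is a minimal sketch of how such a graph could be held in memory: typed nodes with attribute dictionaries, and one adjacency view per edge type. The class name `MultiplexGraph` and its methods are hypothetical, chosen for illustration only, and edges are treated as undirected for simplicity.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MultiplexGraph:
    """Hypothetical container for an attributed multiplex heterogeneous graph."""
    node_type: dict = field(default_factory=dict)   # node id -> node type
    node_attrs: dict = field(default_factory=dict)  # node id -> attribute dict
    # edge type -> node id -> set of neighbor ids (one "view" per edge type)
    adj: dict = field(
        default_factory=lambda: defaultdict(lambda: defaultdict(set))
    )

    def add_node(self, node, ntype, **attrs):
        self.node_type[node] = ntype
        self.node_attrs[node] = attrs

    def add_edge(self, u, v, etype):
        # Assumption: edges are stored as undirected.
        self.adj[etype][u].add(v)
        self.adj[etype][v].add(u)

    def neighbors(self, node, etype):
        # .get avoids creating empty entries for unknown edge types or nodes.
        return sorted(self.adj.get(etype, {}).get(node, ()))


g = MultiplexGraph()
g.add_node("u1", "user", gender="F", age=30)
g.add_node("i1", "item", price=9.9, category="books")
g.add_edge("u1", "i1", "click")
g.add_edge("u1", "i1", "purchase")
```

The same pair of nodes can be connected under several edge types at once, which is what makes the graph multiplex.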

Model Overview

GATNE learns a separate embedding for each node under each edge type. Each such embedding combines a base embedding, shared across all edge types, with an edge embedding specific to one edge type. The edge embedding is computed in the style of GraphSAGE: it aggregates the edge embeddings of the node's neighbors under that edge type, using a mean or max‑pooling aggregator.
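The neighbor aggregation can be sketched in NumPy as follows. This is a simplified illustration, not the authors' implementation: it uses a plain mean with no learned weight matrix or nonlinearity, and it assumes that a node with no neighbors under an edge type keeps its own edge embedding. The function name, array shapes, and the `neighbors` layout are assumptions.

```python
import numpy as np


def aggregate_edge_embeddings(edge_emb, neighbors, num_layers=1):
    """Mean-aggregate edge embeddings over neighbors, per edge type.

    edge_emb:  array of shape (num_nodes, num_edge_types, s)
    neighbors: neighbors[r][i] is the list of neighbor indices of node i
               under edge type r
    """
    current = edge_emb.copy()
    num_nodes, num_types, _ = edge_emb.shape
    for _ in range(num_layers):
        updated = current.copy()
        for r in range(num_types):
            for i in range(num_nodes):
                nbrs = neighbors[r][i]
                if nbrs:  # isolated nodes keep their previous embedding
                    updated[i, r] = current[nbrs, r].mean(axis=0)
        current = updated
    return current


# Toy example: 3 nodes, 2 edge types, edge-embedding size 2.
edge_emb = np.arange(12, dtype=float).reshape(3, 2, 2)
neighbors = {0: [[1, 2], [0], []], 1: [[], [2], [1]]}
aggregated = aggregate_edge_embeddings(edge_emb, neighbors)
```

Stacking more layers lets each edge embedding absorb information from neighbors further away in the same edge-type view.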

To capture interactions among different edge‑type embeddings, GATNE employs a self‑attention mechanism that assigns a weight to each edge type, producing a weighted combination of edge‑type representations.
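A minimal sketch of this attention step is shown below, following the structured self‑attention form of Lin et al. (scores from a `tanh` projection, normalized with a softmax over edge types). The parameter names `W_r` and `w_r`, the shapes, and the function name are assumptions made for illustration; in a trained model these parameters would be learned per edge type.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()


def attention_combine(U_i, W_r, w_r):
    """Combine a node's edge embeddings with attention weights.

    U_i: (s, m) edge embeddings of node i, one column per edge type
    W_r: (d_a, s) and w_r: (d_a,) attention parameters for edge type r
    Returns the weights over the m edge types and the combined (s,) vector.
    """
    scores = w_r @ np.tanh(W_r @ U_i)  # one score per edge type, shape (m,)
    a = softmax(scores)                # weights sum to 1
    return a, U_i @ a


rng = np.random.default_rng(0)
U_i = rng.normal(size=(4, 3))   # s = 4, m = 3 edge types
W_r = rng.normal(size=(5, 4))   # d_a = 5
w_r = rng.normal(size=5)
weights, combined = attention_combine(U_i, W_r, w_r)
```

The combined vector is then projected and added to the base embedding to form the node's final embedding for that edge type.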

For inductive learning (GATNE‑I), node features are transformed (e.g., via a linear layer or neural network) to generate base and edge embeddings, enabling embedding of unseen nodes.
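The sketch below illustrates why this makes the model inductive: every term is a function of the node's feature vector, so a node that never appeared during training still gets an embedding. It assumes linear transformations throughout, skips the neighbor aggregation step for brevity, and adds a direct feature projection term weighted by `beta`. All parameter names and shapes are assumptions for illustration.

```python
import numpy as np


def inductive_node_embedding(x_i, H, G, M_r, D, attn, alpha=1.0, beta=1.0):
    """Embedding of node i for one edge type r, computed from features only.

    x_i:  (f,)      raw node features
    H:    (f, d)    maps features to the base embedding
    G:    (m, f, s) maps features to an initial edge embedding per edge type
    M_r:  (s, d)    projects the attended edge embedding to d dimensions
    D:    (f, d)    direct feature projection
    attn: (m,)      attention weights over edge types
    """
    base = x_i @ H                          # (d,)
    U_i = np.einsum("f,mfs->ms", x_i, G)    # (m, s) initial edge embeddings
    attended = attn @ U_i                   # (s,)
    return base + alpha * (attended @ M_r) + beta * (x_i @ D)


rng = np.random.default_rng(0)
f, d, s, m = 6, 8, 4, 3
params = dict(
    H=rng.normal(size=(f, d)),
    G=rng.normal(size=(m, f, s)),
    M_r=rng.normal(size=(s, d)),
    D=rng.normal(size=(f, d)),
    attn=np.full(m, 1.0 / m),
)
unseen_node_features = rng.normal(size=f)
v = inductive_node_embedding(unseen_node_features, **params)
```

By contrast, a transductive variant stores the base and edge embeddings as lookup tables indexed by node id, so it cannot embed a node it has not seen.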

Training Procedure

Training uses meta‑path‑based random walks and a heterogeneous skip‑gram objective. For a node and its context C, the negative log‑likelihood is minimized using a heterogeneous softmax function and negative sampling.
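The two ingredients of training can be sketched in plain Python. The first function performs a random walk inside a single edge‑type view, restricting each step to neighbors whose node type matches the next entry of a meta‑path scheme. The second computes the negative‑sampling loss for one (node, context) pair. Function names, the cyclic representation of the scheme, and the use of plain lists as vectors are assumptions for illustration.

```python
import math
import random


def metapath_walk(adj, node_type, start, scheme, length, rng):
    """Random walk in one edge-type view, constrained by a meta-path scheme.

    adj:       node -> list of neighbors under a single edge type
    node_type: node -> node type
    scheme:    repeating cycle of node types, e.g. ["user", "item"];
               scheme[0] is assumed to match the type of `start`
    """
    walk = [start]
    pos = 0
    while len(walk) < length:
        next_type = scheme[(pos + 1) % len(scheme)]
        candidates = [n for n in adj.get(walk[-1], [])
                      if node_type[n] == next_type]
        if not candidates:  # dead end: stop the walk early
            break
        walk.append(rng.choice(candidates))
        pos += 1
    return walk


def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def negative_sampling_loss(v, c_pos, c_negs):
    """Loss for node embedding v, one true context, and sampled negatives."""
    dot = lambda a, b: sum(p * q for p, q in zip(a, b))
    loss = -log_sigmoid(dot(c_pos, v))
    for c in c_negs:
        loss -= log_sigmoid(-dot(c, v))
    return loss


adj = {"u1": ["i1", "i2"], "u2": ["i1"], "i1": ["u1", "u2"], "i2": ["u1"]}
node_type = {"u1": "user", "u2": "user", "i1": "item", "i2": "item"}
walk = metapath_walk(adj, node_type, "u1", ["user", "item"], 6,
                     random.Random(0))
```

Context pairs are then taken from a sliding window over each walk, and negatives for a context node are drawn from nodes of the same type, which is what distinguishes the heterogeneous softmax from the ordinary one.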

Experiments

Experiments on three public datasets (Amazon, YouTube, Twitter) and an Alibaba dataset (user‑item interactions) show that GATNE‑T and GATNE‑I achieve the best performance compared with baselines such as DeepWalk, MVE, and MNE. GATNE‑I excels when node features are rich, while GATNE‑T performs slightly better on datasets with weak features.

Distributed implementations of the baselines and GATNE on PAI TensorFlow demonstrate that GATNE‑I converges faster and scales efficiently with increasing worker counts.

Results

Tables (omitted) report that GATNE‑I improves metrics significantly over baselines on the large Alibaba dataset, and both GATNE variants consistently outperform baselines on the public datasets.

Scalability

Training time decreases markedly as the number of workers grows, confirming the model’s scalability for large‑scale graph data.

References

Y. Dong, N. V. Chawla, A. Swami. metapath2vec: Scalable representation learning for heterogeneous networks. KDD’17.

H. Zhang et al. Scalable Multiplex Network Embedding. IJCAI’18.

W. Hamilton, Z. Ying, J. Leskovec. Inductive representation learning on large graphs. NIPS’17.

Z. Lin et al. A structured self‑attentive sentence embedding. ICLR’17.
