Understanding Graph Neural Networks: Nodes, Edges, and Message Passing
This article explains the fundamentals of graph neural networks, covering graph concepts, node classification via neighborhood aggregation, message-passing mechanics, mathematical notation, a full DGL-PyTorch implementation on the Reddit dataset, and training results showing test accuracy of roughly 91%.
The article introduces the basic concepts of graphs used in graph neural networks (GNNs): nodes represent data samples and edges represent relationships such as city distances or paper citations.
GNN Application – Node Classification
One common GNN task is node classification, which aggregates information from a reference node’s neighbors and the edges connecting them.
Network Layers
Node layer: a recurrent network that updates each node's embedding.
Edge layer: a feed-forward network that transforms the messages sent along edges.
Message Passing Process
During each iteration, neighboring nodes transmit their embeddings through the edge network to the recurrent node network. The reference node’s embedding is updated by applying the recurrent function to its current embedding and adding the summed outputs from the edge network. Repeating this step yields new embeddings that combine the node’s original information with aggregated neighbor information, which can then be fed to subsequent layers or pooled into a graph‑level vector H.
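The iteration above can be sketched in plain Python. The toy graph, the `edge_net`, and the `node_update` functions below are illustrative stand-ins for the edge and node networks, not part of the article's DGL code:

```python
# Minimal sketch of one message-passing iteration on a toy 3-node graph.
graph = {                     # adjacency list: node -> neighbors
    "A": ["B", "C"],
    "B": ["A"],
    "C": ["A"],
}
emb = {"A": 1.0, "B": 2.0, "C": 3.0}   # scalar embeddings for simplicity

def edge_net(neighbor_emb):
    """Stand-in for the feed-forward edge network."""
    return 0.5 * neighbor_emb

def node_update(own_emb, message_sum):
    """Stand-in for the recurrent node update."""
    return own_emb + message_sum

def message_passing_step(graph, emb):
    new_emb = {}
    for v, neighbors in graph.items():
        # Neighbors send their embeddings through the edge network ...
        messages = sum(edge_net(emb[u]) for u in neighbors)
        # ... and the reference node combines the sum with its own state.
        new_emb[v] = node_update(emb[v], messages)
    return new_emb

emb = message_passing_step(graph, emb)
print(emb)  # {'A': 3.5, 'B': 2.5, 'C': 3.5}
```

Each new embedding mixes the node's own state with aggregated neighbor information; stacking several such steps lets information travel multiple hops.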
Mathematical Notation
x_v: the features of node v. x_co[v]: features of the edges incident to v. h_ne[v]: embeddings of v's neighboring nodes. x_ne[v]: features of v's neighboring nodes. f: a transition function mapping these inputs to a d-dimensional space. H and X denote the concatenations of all h and x values used during the iterative update.
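In this notation (which matches the original GNN formulation of Scarselli et al., an assumption about the material being summarized), the per-node update and its stacked form can be written as:

```latex
h_v = f\bigl(x_v,\; x_{co[v]},\; h_{ne[v]},\; x_{ne[v]}\bigr),
\qquad
H = F(H, X)
```

Iterating the second equation to a fixed point (or for a fixed number of steps) yields the final embeddings.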
Typical Sampling Process
Implementation with DGL and PyTorch
Set the DGL backend to PyTorch:
%env DGLBACKEND='pytorch'
import dgl
import torch
import torch.nn as nn
import torch.nn.functional as F
import dgl.data

Load the Reddit dataset, which contains posts from September 2014; nodes are posts and edges connect posts commented on by the same user. Posts from the first 20 days are used for training and the remaining days for testing (with 30% of those held out for validation).
dataset = dgl.data.RedditDataset()
print('Number of categories:', dataset.num_classes)
g = dataset[0]
print('\nNode features')
print(g.ndata.keys())
print('\nEdge features')
print(g.edata.keys())
print(f"\nTotal nodes: {g.num_nodes():,}")
print(f"Total edges: {g.num_edges():,}")

The output shows 41 categories, 232,965 nodes, and 114,615,892 edges.
Two‑Layer GCN Model
from dgl.nn import GraphConv

class GCN(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCN, self).__init__()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, num_classes)

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.relu(h)
        h = self.conv2(g, h)
        return h

model = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes)

Training Loop
def train(g, model):
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    best_val_acc = 0
    best_test_acc = 0
    features = g.ndata['feat']
    labels = g.ndata['label']
    train_mask = g.ndata['train_mask']
    val_mask = g.ndata['val_mask']
    test_mask = g.ndata['test_mask']
    for e in range(50):
        logits = model(g, features)
        pred = logits.argmax(1)
        loss = F.cross_entropy(logits[train_mask], labels[train_mask])
        train_acc = (pred[train_mask] == labels[train_mask]).float().mean()
        val_acc = (pred[val_mask] == labels[val_mask]).float().mean()
        test_acc = (pred[test_mask] == labels[test_mask]).float().mean()
        # Track the test accuracy at the epoch with the best validation accuracy.
        if best_val_acc < val_acc:
            best_val_acc = val_acc
            best_test_acc = test_acc
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if e % 5 == 0:
            print(f'epoch {e}, loss: {loss:.3f}, val acc: {val_acc:.3f} (best {best_val_acc:.3f}), test acc: {test_acc:.3f} (best {best_test_acc:.3f})')
USE_GPU = torch.cuda.is_available()  # train on the GPU when one is available
if USE_GPU:
    g = g.to('cuda')
    model = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes).to('cuda')
else:
    model = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes)
train(g, model)

Training logs show validation accuracy rising from 0.011 at epoch 0 to 0.912 at epoch 45, with corresponding test accuracy reaching 0.909.
Saving the Trained Graph
# Save graphs
dgl.save_graphs('graph.dgl', g) # Load later with dgl.load_graphs('graph.dgl')
print(g)

The final printed graph object confirms the dataset's dimensions and feature schemes.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Code DAO
We deliver AI algorithm tutorials and the latest news, curated by a team of researchers from Peking University, Shanghai Jiao Tong University, Central South University, and leading AI companies such as Huawei, Kuaishou, and SenseTime. Join us in the AI alchemy—making life better!
