Understanding Graph Neural Networks: Nodes, Edges, and Message Passing
This article explains the fundamentals of graph neural networks, covering graph concepts, node classification via neighborhood aggregation, message-passing mechanics, mathematical notation, a full DGL-PyTorch implementation on the Reddit dataset, and training results showing test accuracy of roughly 91%.
The article introduces the basic concepts of graphs used in graph neural networks (GNNs): nodes represent data samples and edges represent relationships such as city distances or paper citations.
GNN Application – Node Classification
One common GNN task is node classification, which aggregates information from a reference node’s neighbors and the edges connecting them.
Network Layers
Node layer: a recurrent network that updates each node's embedding.
Edge layer: a feed-forward network that transforms the messages sent along edges.
Message Passing Process
During each iteration, neighboring nodes transmit their embeddings through the edge network to the recurrent node network. The reference node’s embedding is updated by applying the recurrent function to its current embedding and adding the summed outputs from the edge network. Repeating this step yields new embeddings that combine the node’s original information with aggregated neighbor information, which can then be fed to subsequent layers or pooled into a graph‑level vector H.
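The iteration above can be sketched in plain Python. The toy graph, the `edge_net`, and the `node_update` functions below are illustrative stand-ins for the edge and node networks, not part of the article's DGL code:

```python
# Minimal sketch of one message-passing iteration on a toy 3-node graph.
graph = {                     # adjacency list: node -> neighbors
    "A": ["B", "C"],
    "B": ["A"],
    "C": ["A"],
}
emb = {"A": 1.0, "B": 2.0, "C": 3.0}   # scalar embeddings for simplicity

def edge_net(neighbor_emb):
    """Stand-in for the feed-forward edge network."""
    return 0.5 * neighbor_emb

def node_update(own_emb, message_sum):
    """Stand-in for the recurrent node update."""
    return own_emb + message_sum

def message_passing_step(graph, emb):
    new_emb = {}
    for v, neighbors in graph.items():
        # Neighbors send their embeddings through the edge network ...
        messages = sum(edge_net(emb[u]) for u in neighbors)
        # ... and the reference node combines the sum with its own state.
        new_emb[v] = node_update(emb[v], messages)
    return new_emb

emb = message_passing_step(graph, emb)
print(emb)  # {'A': 3.5, 'B': 2.5, 'C': 3.5}
```

Each new embedding mixes the node's own state with aggregated neighbor information; stacking several such steps lets information travel multiple hops.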
Mathematical Notation
x_v: the features of node v. x_co[v]: features of the edges incident to v. h_ne[v]: embeddings of v's neighboring nodes. x_ne[v]: features of v's neighboring nodes. f: a transition function mapping these inputs to a d-dimensional space. H and X denote the concatenations of all h and x values used during the iterative update.
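In this notation (which matches the original GNN formulation of Scarselli et al., an assumption about the material being summarized), the per-node update and its stacked form can be written as:

```latex
h_v = f\bigl(x_v,\; x_{co[v]},\; h_{ne[v]},\; x_{ne[v]}\bigr),
\qquad
H = F(H, X)
```

Iterating the second equation to a fixed point (or for a fixed number of steps) yields the final embeddings.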
Typical Sampling Process
Implementation with DGL and PyTorch
Set the DGL backend to PyTorch:
%env DGLBACKEND='pytorch'
import dgl
import torch
import torch.nn as nn
import torch.nn.functional as F
import dgl.data

Load the Reddit dataset, which contains posts from September 2014; nodes are posts and edges connect posts commented on by the same user. Posts from the first 20 days are used for training and the remaining days for testing (with 30% of those held out for validation).
dataset = dgl.data.RedditDataset()
print('Number of categories:', dataset.num_classes)
g = dataset[0]
print('\nNode features')
print(g.ndata.keys())
print('\nEdge features')
print(g.edata.keys())
print(f"\nTotal nodes: {g.num_nodes():,}")
print(f"Total edges: {g.num_edges():,}")

The output shows 41 categories, 232,965 nodes, and 114,615,892 edges.
Two‑Layer GCN Model
from dgl.nn import GraphConv

class GCN(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCN, self).__init__()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, num_classes)

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.relu(h)
        h = self.conv2(g, h)
        return h

model = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes)

Training Loop
def train(g, model):
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    best_val_acc = 0
    best_test_acc = 0
    features = g.ndata['feat']
    labels = g.ndata['label']
    train_mask = g.ndata['train_mask']
    val_mask = g.ndata['val_mask']
    test_mask = g.ndata['test_mask']
    for e in range(50):
        logits = model(g, features)
        pred = logits.argmax(1)
        loss = F.cross_entropy(logits[train_mask], labels[train_mask])
        train_acc = (pred[train_mask] == labels[train_mask]).float().mean()
        val_acc = (pred[val_mask] == labels[val_mask]).float().mean()
        test_acc = (pred[test_mask] == labels[test_mask]).float().mean()
        # Track the test accuracy at the epoch with the best validation accuracy.
        if best_val_acc < val_acc:
            best_val_acc = val_acc
            best_test_acc = test_acc
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if e % 5 == 0:
            print(f'epoch {e}, loss: {loss:.3f}, val acc: {val_acc:.3f} (best {best_val_acc:.3f}), test acc: {test_acc:.3f} (best {best_test_acc:.3f})')
USE_GPU = torch.cuda.is_available()  # train on the GPU when one is available
if USE_GPU:
    g = g.to('cuda')
    model = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes).to('cuda')
else:
    model = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes)
train(g, model)

Training logs show validation accuracy rising from 0.011 at epoch 0 to 0.912 at epoch 45, with corresponding test accuracy reaching 0.909.
Saving the Trained Graph
# Save graphs
dgl.save_graphs('graph.dgl', g) # Load later with dgl.load_graphs('graph.dgl')
print(g)

The final printed graph object confirms the dataset's dimensions and feature schemes.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Code DAO
We deliver AI algorithm tutorials and the latest news, curated by a team of researchers from Peking University, Shanghai Jiao Tong University, Central South University, and leading AI companies such as Huawei, Kuaishou, and SenseTime. Join us in the AI alchemy—making life better!
