20 Core PyTorch Concepts to Accelerate Your AI Projects
This article walks through twenty essential PyTorch concepts—from basic Tensor creation and manipulation, through autograd and neural‑network construction, to data loading, GPU acceleration, model saving, and practical training tricks—providing concrete code examples and clear explanations for developers eager to build and deploy AI models.
01 PyTorch Basics: Tensors
PyTorch’s core data structure is the Tensor, a GPU‑optimized NumPy‑like array. Tensors can be created from Python lists or with factory functions:
import torch
# From list
x = torch.tensor([1, 2, 3])
# Specific shapes
zeros_tensor = torch.zeros(3, 2) # 3×2 zeros
ones_tensor = torch.ones(3, 2) # 3×2 ones
random_tensor = torch.rand(2, 2) # 2×2 uniform
normal_tensor = torch.randn(2, 2) # 2×2 standard normal
Conversion between NumPy arrays and Tensors is seamless:
import numpy as np
x_numpy = np.array([0.1, 0.2, 0.3])
x_torch = torch.from_numpy(x_numpy) # shares memory with x_numpy
y_torch = torch.tensor([3, 4, 5])
y_numpy = y_torch.numpy()
02 Tensor Operations
Reshaping methods:
view – works only on contiguous memory (requires x.is_contiguous() to be True).
reshape – safe for non‑contiguous tensors.
x = torch.randn(2, 3, 4)
print(x.is_contiguous()) # True
# view works on contiguous tensor
y = x.view(6, 4)
# transpose makes tensor non‑contiguous
x_t = x.transpose(1, 2)
print(x_t.is_contiguous()) # False
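A quick self-contained check (re-creating the transposed tensor) makes the difference concrete: view fails here, while contiguous() copies the data into contiguous memory first, which is roughly what reshape does for you when needed.

```python
import torch

x_t = torch.randn(2, 3, 4).transpose(1, 2)  # shape (2, 4, 3), non-contiguous
try:
    x_t.view(6, 4)  # view requires contiguous memory
except RuntimeError:
    print("view failed on non-contiguous tensor")
# contiguous() copies into contiguous memory, after which view works
y = x_t.contiguous().view(6, 4)
print(y.shape)  # torch.Size([6, 4])
```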
# reshape works safely on non‑contiguous tensor
y = x_t.reshape(6, 4)
Dimension manipulation:
# Add a dimension
x = torch.randn(3, 4)
x_unsq = x.unsqueeze(0) # shape (1, 3, 4)
# Remove a dimension
x_sq = x_unsq.squeeze(0) # shape (3, 4)
# Transpose and custom ordering
y = x.transpose(0, 1) # swap dims 0 and 1 -> shape (4, 3)
t = torch.randn(2, 3, 4)
z = t.permute(2, 0, 1) # custom order -> shape (4, 2, 3)
03 Autograd – Automatic Differentiation
Setting requires_grad=True on a tensor builds a computation graph. Calling .backward() computes gradients; gradients accumulate by default, so optimizer.zero_grad() must be called before each backward pass.
# Scalar example
x = torch.tensor(2.0, requires_grad=True)
y = x**2 + 2*x + 2
y.backward()
print(x.grad) # 6.0
# Multivariate example
def g(w):
    return 2*w[0]*w[1] + w[1]*torch.cos(w[0])
w = torch.tensor([3.14, 1.0], requires_grad=True)
z = g(w)
z.backward()
print(w.grad) # approx [2.0, 5.28]
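Because gradients accumulate by default, calling backward a second time without zeroing adds to the existing gradient rather than replacing it. A minimal sketch:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
(x ** 2).backward()     # d(x^2)/dx at x=2 is 4
first = x.grad.item()   # 4.0
(x ** 2).backward()     # gradients are summed, not replaced
second = x.grad.item()  # 8.0
print(first, second)    # 4.0 8.0
x.grad.zero_()          # manual reset; optimizer.zero_grad() does this for every parameter
```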
# Gradient accumulation handling
optimizer.zero_grad()
loss.backward()
optimizer.step()
04 Building Neural Networks
Two primary ways to define models:
Subclass nn.Module – flexible, suitable for custom forward logic.
Use nn.Sequential – concise for simple feed‑forward stacks.
# Subclassing nn.Module
import torch.nn as nn
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden = nn.Linear(input_size, hidden_size)
        self.predict = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = torch.relu(self.hidden(x))
        return self.predict(x)
# Using nn.Sequential
model = nn.Sequential(
    nn.Linear(10, 50),
    nn.ReLU(),
    nn.Linear(50, 20),
    nn.ReLU(),
    nn.Linear(20, 1)
)
05 Core Network Components
Activation Functions
ReLU – max(0, x); simple, mitigates vanishing gradients; used in most hidden layers.
Sigmoid – 1/(1+e^{-x}); outputs in (0,1); typical for binary classification output.
Tanh – (e^{x}-e^{-x})/(e^{x}+e^{-x}); outputs in (-1,1); common in RNN hidden layers.
Leaky ReLU – max(αx, x); alleviates “dead neuron” problem; used as a ReLU replacement when needed.
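The four functions above can be compared directly on a small sample input; this sketch uses torch's built-in and functional forms:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, 0.0, 2.0])
print(torch.relu(x))     # tensor([0., 0., 2.]) -- negatives clipped to zero
print(torch.sigmoid(x))  # values squashed into (0, 1)
print(torch.tanh(x))     # values squashed into (-1, 1)
print(F.leaky_relu(x, negative_slope=0.01))  # small slope kept for negatives
```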
Loss Functions
# Regression
mse_loss = nn.MSELoss() # mean‑squared error
mae_loss = nn.L1Loss() # mean absolute error
# Classification
ce_loss = nn.CrossEntropyLoss() # multi‑class cross‑entropy (takes raw logits and integer class targets)
bce_loss = nn.BCELoss() # binary cross‑entropy (takes probabilities; use nn.BCEWithLogitsLoss for raw logits)
Optimizers
import torch.optim as optim
# Classic SGD with momentum
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# Adaptive methods
optimizer = optim.Adam(model.parameters(), lr=0.001) # most common
optimizer = optim.AdamW(model.parameters(), lr=0.001) # Adam with decoupled weight decay
optimizer = optim.RMSprop(model.parameters(), lr=0.001)
06 Training Loop
A full training pipeline combines model, loss, and optimizer, iterates over epochs, performs forward pass, loss computation, gradient zeroing, backward pass, and parameter update. Progress is printed every ten epochs.
# Setup
model = NeuralNet(input_size=10, hidden_size=20, output_size=1)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
for epoch in range(100):
    outputs = model(train_data)
    loss = criterion(outputs, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if epoch % 10 == 0:
        print(f'Epoch {epoch}, Loss: {loss.item():.4f}')
07 Data Handling – Dataset & DataLoader
Custom Dataset subclasses must implement __len__ and __getitem__. DataLoader wraps a dataset to provide batching, shuffling, and multi‑process loading.
from torch.utils.data import Dataset, DataLoader
class CustomDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

dataset = CustomDataset(data, labels)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)
for batch_data, batch_labels in dataloader:
    # training code ...
    pass
08 Special Layers & Applications
Convolutional Layers
# 2D convolution for images
conv2d = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=1, padding=1)
# 1D convolution for sequences
conv1d = nn.Conv1d(in_channels=2, out_channels=32, kernel_size=5)
Recurrent Layers
# LSTM
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, batch_first=True, dropout=0.2)
# GRU
gru = nn.GRU(input_size=10, hidden_size=20, num_layers=2, batch_first=True)
09 Regularization
Techniques to prevent overfitting:
Dropout – randomly zeroes activations during training.
Batch Normalization – normalizes across the batch dimension (e.g., nn.BatchNorm1d(256) for fully‑connected layers).
Layer Normalization – normalizes across the feature dimension (e.g., nn.LayerNorm(256) for RNN/Transformer layers).
# Dropout example
dropout = nn.Dropout(p=0.2)
x = torch.randn(32, 100)
x_dropped = dropout(x)
# BatchNorm example
batch_norm = nn.BatchNorm1d(256)
# LayerNorm example
layer_norm = nn.LayerNorm(256)
10 Model Modes – Train vs. Eval
model.train() enables dropout and lets batch norm update its running statistics; model.eval() disables dropout and uses the stored running statistics. During inference, also wrap the code in torch.no_grad() to skip gradient tracking and save memory.
model.train()
# training code ...
model.eval()
with torch.no_grad():
    predictions = model(test_data)
11 GPU Acceleration
Check GPU availability and move model and tensors to the selected device.
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU count: {torch.cuda.device_count()}")
if torch.cuda.is_available():
    print(f"Current GPU: {torch.cuda.get_device_name(0)}")
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
data = data.to(device)
labels = labels.to(device)
12 Model Saving & Loading
Two common approaches:
Full model – torch.save(model, 'full_model.pth'); load with torch.load. Simple but less flexible.
State dictionary – torch.save(model.state_dict(), 'model_weights.pth'); load by creating a model instance and calling load_state_dict. Recommended.
Checkpoints can store additional training state (epoch, optimizer state, loss).
# Full model
torch.save(model, 'full_model.pth')
model = torch.load('full_model.pth')
model.eval()
# State dict
torch.save(model.state_dict(), 'model_weights.pth')
model = MyNeuralNet()
model.load_state_dict(torch.load('model_weights.pth'))
model.eval()
# Checkpoint example
checkpoint = {
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss,
}
torch.save(checkpoint, 'checkpoint.pth')
13 Practical Tips – Mixed Precision & Profiling
Mixed‑precision training reduces memory usage and speeds up computation using torch.cuda.amp (recent PyTorch versions expose the same utilities under torch.amp).
from torch.cuda.amp import autocast, GradScaler
scaler = GradScaler()
for data, target in dataloader:
    optimizer.zero_grad()
    with autocast():
        output = model(data)
        loss = criterion(output, target)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
Profiling with torch.profiler helps locate bottlenecks.
from torch.profiler import profile, record_function, ProfilerActivity
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    with record_function("model_inference"):
        output = model(data)
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
