Understanding PyTorch Autograd: Tensors, Gradients, and Backpropagation
This article explains PyTorch's autograd system: tensor creation, the requires_grad flag, detaching tensors, disabling gradient tracking with no_grad, and the Function class that builds the computational graph. Code examples demonstrate forward and backward passes, gradient computation, and Jacobian‑vector products.
In PyTorch, the core of all neural networks is the autograd package, which provides automatic differentiation for operations on tensors. Autograd is a define‑by‑run framework, meaning that the backward pass is dynamically built based on the actual code execution, allowing each iteration to differ.
Tensor: The class torch.Tensor is the fundamental data structure. Setting requires_grad=True makes a tensor track all operations applied to it; calling .backward() then computes gradients automatically, accumulating them in the tensor's .grad attribute.
<code>x = torch.ones(2, 2, requires_grad=True)  # create a tensor and enable gradient tracking
print(x)</code>
To stop a tensor from tracking history, call .detach(). To temporarily disable gradient tracking, which is useful during model evaluation, wrap the code block in with torch.no_grad():.
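A minimal sketch of both mechanisms (the variable names here are illustrative, not from the original article):

```python
import torch

x = torch.ones(2, 2, requires_grad=True)

# .detach() returns a new tensor that shares the same data
# but is cut off from the computational graph
y = x.detach()
print(y.requires_grad)  # False

# torch.no_grad() suppresses tracking for everything inside the block
with torch.no_grad():
    z = x * 2
print(z.requires_grad)  # False

# the original tensor is unaffected
print(x.requires_grad)  # True
```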
The Function class links tensors and operations into an acyclic computational graph. Each tensor has a .grad_fn attribute that references the Function that created it (for tensors created directly by the user, .grad_fn is None).
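This distinction between user-created leaf tensors and operation results can be checked directly (a short sketch; the tensors here are arbitrary):

```python
import torch

x = torch.ones(2, 2, requires_grad=True)  # created by the user: a leaf tensor
y = x + 2                                  # created by an operation

print(x.grad_fn)  # None: x was not produced by any Function
print(y.grad_fn)  # a backward Function object recording the addition
```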
Calling .backward() on a scalar tensor computes its gradient automatically. For non‑scalar tensors, a gradient argument matching the tensor’s shape must be supplied.
<code>y = x + 2       # an operation on a tracked tensor
z = y * y * 3   # more operations
out = z.mean()  # reduce to a scalar
print(z, out)</code>
The in‑place method .requires_grad_(...) changes the requires_grad flag of an existing tensor (the flag defaults to False).
<code>a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)   # False: requires_grad defaults to False
a.requires_grad_(True)   # enable tracking in place
print(a.requires_grad)   # True
b = (a * a).sum()
print(b.grad_fn)         # a backward Function object</code>
Gradient: A gradient points in the direction of steepest ascent of a function, and its magnitude gives the rate of change in that direction. Executing out.backward() (or, equivalently, out.backward(torch.tensor(1.))) computes d(out)/dx, and print(x.grad) shows the resulting gradient matrix.
Mathematically, for a vector‑valued function y = f(x) , the gradient of y with respect to x is the Jacobian matrix. torch.autograd efficiently computes Jacobian‑vector products using the chain rule.
Example of a Jacobian‑vector product:
<code>x = torch.randn(3, requires_grad=True)
y = x * 2
while y.data.norm() < 1000:  # keep doubling y until its norm reaches 1000
    y = y * 2
print(y)
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)                # backpropagate the vector v through y
print(x.grad)</code>
Wrapping code in with torch.no_grad(): prevents autograd from recording operations on tensors that have requires_grad=True, which is useful when you only need forward passes.
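A quick check of that behavior (a sketch; the tensor here is arbitrary):

```python
import torch

x = torch.randn(3, requires_grad=True)
print(x.requires_grad)         # True
print((x ** 2).requires_grad)  # True: results of tracked operations are tracked

with torch.no_grad():
    y = x ** 2
print(y.requires_grad)         # False: nothing inside the block was recorded
```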
- END -
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.