Dynamic Learning Rate Adjustment in PyTorch: Optimizer Basics and Scheduler Usage
This article explains how to configure and use PyTorch optimizers, their attributes and methods, and demonstrates various learning‑rate scheduling techniques—including manual updates and built‑in schedulers such as LambdaLR, StepLR, MultiStepLR, ExponentialLR, CosineAnnealingLR, and ReduceLROnPlateau—through clear code examples.
Learning rate is crucial for training neural networks; starting with a larger rate speeds up learning, then gradually decreasing it helps find the optimum. In PyTorch, dynamic learning‑rate adjustment can be performed using optimizers and schedulers.
Optimizer Basics
Typical training steps are:
<code>loss.backward()<br/>optimizer.step()<br/>optimizer.zero_grad()<br/>...</code>loss.backward() computes gradients, optimizer.step() updates parameters, and optimizer.zero_grad() clears gradients for the next iteration. Common optimizers reside in torch.optim and are imported as:
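Put together, and assuming an illustrative model, loss, and data (none of these names come from the article), one full training loop looks like this:

```python
import torch
import torch.nn as nn

# Illustrative setup: a linear model, random data, and MSE loss
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()
x = torch.randn(32, 10)
y = torch.randn(32, 2)

for epoch in range(5):
    optimizer.zero_grad()          # clear gradients from the previous iteration
    loss = criterion(model(x), y)  # forward pass
    loss.backward()                # compute gradients
    optimizer.step()               # update parameters
```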
<code>from torch.optim import Adam<br/>from torch.optim import SGD</code>A simple network example:
<code>import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.layer = nn.Linear(10, 2)
        self.layer2 = nn.Linear(2, 10)

    def forward(self, input):
        return self.layer2(self.layer(input))  # pass through both layers so each receives gradients</code>Optimizer Core Attributes
lr : learning rate
eps : small constant added to the denominator for numerical stability
weight_decay : L2 regularization coefficient
betas : coefficients used for the running averages of the gradient and its square (Adam)
amsgrad : bool, whether to use the AMSGrad variant of Adam
Each optimizer maintains a param_groups list that stores parameters and their specific settings.
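For example (using a hypothetical one-layer model), each entry of param_groups is a dict holding a 'params' list plus that group's hyperparameters:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # hypothetical model
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

group = optimizer.param_groups[0]   # one dict per parameter group
print(group['lr'])                  # hyperparameters live alongside the 'params' list
print(len(group['params']))         # weight and bias of the linear layer
```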
Optimizer Core Methods
add_param_group(param_group) : add a new parameter group (useful for fine‑tuning)
load_state_dict(state_dict) : load saved optimizer state
state_dict() : return a dict containing state and param_groups
step(closure) : perform a parameter update
zero_grad() : clear gradients of all parameters
Creating an optimizer is straightforward:
<code>model = Net()
optimizer_Adam = torch.optim.Adam(model.parameters(), lr=0.1)</code>model.parameters() returns all model parameters, which are passed to the optimizer with a specified learning rate.
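A brief sketch of two of the methods listed above: add_param_group (handy when unfreezing layers during fine-tuning) and the state_dict/load_state_dict pair. The two modules here are illustrative:

```python
import torch
import torch.nn as nn

backbone = nn.Linear(10, 2)   # illustrative modules
head = nn.Linear(2, 10)

# Start by optimizing only the backbone...
optimizer = torch.optim.Adam(backbone.parameters(), lr=0.1)

# ...then add the head as a second group with its own learning rate
optimizer.add_param_group({'params': head.parameters(), 'lr': 0.01})

# state_dict() captures every group's settings; load_state_dict() restores them
saved = optimizer.state_dict()
optimizer.load_state_dict(saved)
print(len(optimizer.param_groups), optimizer.param_groups[1]['lr'])
```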
Training Only Part of a Model
<code>model = Net()
optimizer_Adam = torch.optim.Adam(model.layer.parameters(), lr=0.1) # only updates layer</code>Setting Different Learning Rates for Different Parts
<code>params_dict = [
{'params': model.layer.parameters(), 'lr': 0.1},
{'params': model.layer2.parameters(), 'lr': 0.2}
]
optimizer_Adam = torch.optim.Adam(params_dict)</code>Manual learning‑rate modification during training can be done by iterating over optimizer.param_groups and adjusting the 'lr' entry:
<code>import matplotlib.pyplot as plt

lr_list = []
for epoch in range(100):
    if epoch % 5 == 0:
        for params in optimizer_Adam.param_groups:
            params['lr'] *= 0.9
    lr_list.append(optimizer_Adam.state_dict()['param_groups'][0]['lr'])
plt.plot(range(100), lr_list, color='r')
plt.show()</code>Learning‑Rate Schedulers (torch.optim.lr_scheduler)
The package provides several scheduler classes:
LambdaLR
StepLR
MultiStepLR
ExponentialLR
CosineAnnealingLR
ReduceLROnPlateau
Note: since PyTorch 1.1.0, scheduler.step() should be called after optimizer.step(); calling it the other way round skips the first value of the schedule. The typical order is loss.backward() → optimizer.step() → scheduler.step().
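As a concrete sketch of that ordering (the model, data, and StepLR settings are illustrative, not from the article):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)
criterion = nn.MSELoss()
x, y = torch.randn(8, 10), torch.randn(8, 2)

for epoch in range(10):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()    # update parameters first...
    scheduler.step()    # ...then advance the schedule

# after 10 epochs with step_size=5 the lr has been halved twice: 0.1 * 0.5**2
print(optimizer.param_groups[0]['lr'])
```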
LambdaLR
<code>torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1)</code>lr_lambda is a function (or list of functions) that receives the epoch index and returns a scaling factor α; the new learning rate is initial_lr * α .
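The article gives no code for LambdaLR, so here is a sketch in the same style as the scheduler examples below, halving the rate every 20 epochs (the lambda and epoch count are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
optimizer_Adam = torch.optim.Adam(model.parameters(), lr=0.1)

# alpha(epoch) = 0.5 ** (epoch // 20), so the effective lr is 0.1 * alpha
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer_Adam, lr_lambda=lambda epoch: 0.5 ** (epoch // 20))

lr_list = []
for epoch in range(100):
    scheduler.step()
    lr_list.append(optimizer_Adam.state_dict()['param_groups'][0]['lr'])

print(lr_list[0], lr_list[-1])  # starts at 0.1, ends at 0.1 * 0.5**5
```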
StepLR
<code>lr_list_1 = []
scheduler = torch.optim.lr_scheduler.StepLR(optimizer_Adam, step_size=5, gamma=0.5, last_epoch=-1)
for epoch in range(100):
    scheduler.step()
    lr_list_1.append(optimizer_Adam.state_dict()['param_groups'][0]['lr'])
plt.plot(range(100), lr_list_1, color='r', label='lr')
plt.legend()
plt.show()</code>MultiStepLR
<code>lr_list_1 = []
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer_Adam,
    milestones=[20, 40, 60, 80],
    gamma=0.5,
    last_epoch=-1)
for epoch in range(100):
    scheduler.step()
    lr_list_1.append(optimizer_Adam.state_dict()['param_groups'][0]['lr'])
plt.plot(range(100), lr_list_1, color='r', label='lr')
plt.legend()
plt.show()</code>ExponentialLR
<code>lr_list_1 = []
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer_Adam, gamma=0.9, last_epoch=-1)
for epoch in range(100):
    scheduler.step()
    lr_list_1.append(optimizer_Adam.state_dict()['param_groups'][0]['lr'])
plt.plot(range(100), lr_list_1, color='r', label='lr')
plt.legend()
plt.show()</code>CosineAnnealingLR
<code>lr_list_1 = []
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer_Adam, T_max=25, eta_min=0, last_epoch=-1)
for epoch in range(100):
    scheduler.step()
    lr_list_1.append(optimizer_Adam.state_dict()['param_groups'][0]['lr'])
plt.plot(range(100), lr_list_1, color='r', label='lr')
plt.legend()
plt.show()</code>ReduceLROnPlateau
<code>scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.1, patience=10, verbose=False,
    threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)
for epoch in range(10):
    train(...)
    val_loss = validate(...)
    scheduler.step(val_loss)  # pass the monitored metric; lr drops after `patience` epochs without improvement</code>These schedulers enable flexible, automated learning‑rate adjustments based on epoch count or validation metrics, facilitating faster convergence and better model performance.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.