# Optimization
Master SGD, Adam, and learning rate scheduling to train neural networks effectively.
## The Goal of Optimization
Find the weights that minimize the loss function. The optimizer decides how to update the weights based on their gradients.
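As a concrete sketch of what that loop looks like in PyTorch (the `model`, `loss_fn`, `inputs`, and `targets` here are placeholders, not part of this guide):

```python
import torch

def train_step(model, loss_fn, optimizer, inputs, targets):
    """One generic training step; any torch.optim optimizer follows this pattern."""
    optimizer.zero_grad()                     # clear gradients from the previous step
    loss = loss_fn(model(inputs), targets)    # forward pass
    loss.backward()                           # populate .grad on every parameter
    optimizer.step()                          # the optimizer turns gradients into weight updates
    return loss.item()
```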
## Gradient Descent Variants
| Optimizer | Update Rule | Pros |
|---|---|---|
| SGD | `w -= lr * grad` | Simple, works well |
| Momentum | Accumulates velocity across steps | Faster convergence |
| Adam | Adaptive per-parameter learning rates | Good default choice |
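To make the table concrete, here is a rough, illustrative sketch of the SGD and momentum update rules on plain tensors (the variable names are made up for illustration; `torch.optim` does this for you):

```python
import torch

lr, beta = 0.01, 0.9
w = torch.randn(10)                # a parameter
grad = torch.randn(10)             # its gradient for the current step
velocity = torch.zeros_like(w)

# Plain SGD: step directly against the gradient
w_sgd = w - lr * grad

# Momentum: accumulate a velocity from past gradients, then step along it
velocity = beta * velocity + grad
w_momentum = w - lr * velocity
```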
## SGD with Momentum
```python
import torch

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.01,
    momentum=0.9
)
```
## Adam Optimizer
The go-to optimizer for most tasks:
```python
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=0.001,
    betas=(0.9, 0.999)
)
```
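The `betas` are the exponential decay rates for Adam's running estimates of the gradient mean and squared gradient. A stripped-down, illustrative sketch of a single Adam update (all names here are placeholders; `torch.optim.Adam` does this per parameter internally):

```python
import torch

lr, beta1, beta2, eps = 1e-3, 0.9, 0.999, 1e-8
w = torch.randn(10)
grad = torch.randn(10)
m = torch.zeros_like(w)   # first moment: running mean of gradients
v = torch.zeros_like(w)   # second moment: running mean of squared gradients
t = 1                     # step counter

m = beta1 * m + (1 - beta1) * grad
v = beta2 * v + (1 - beta2) * grad ** 2
m_hat = m / (1 - beta1 ** t)                 # bias correction for early steps
v_hat = v / (1 - beta2 ** t)
w = w - lr * m_hat / (v_hat.sqrt() + eps)    # per-parameter adaptive step
```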
## Learning Rate Scheduling
```python
from torch.optim.lr_scheduler import StepLR, ReduceLROnPlateau

# Reduce every N epochs
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)

# Reduce when loss plateaus
scheduler = ReduceLROnPlateau(optimizer, patience=5)
```
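Where `scheduler.step()` goes depends on the scheduler: `StepLR` is stepped once per epoch with no arguments, while `ReduceLROnPlateau` must be passed the metric it monitors. A sketch of an epoch loop, assuming placeholder `train_one_epoch` and `evaluate` helpers:

```python
for epoch in range(num_epochs):
    train_one_epoch(model, optimizer)     # placeholder: calls optimizer.step() per batch
    val_loss = evaluate(model)            # placeholder: returns validation loss

    # StepLR: scheduler.step()
    # ReduceLROnPlateau: pass the monitored metric
    scheduler.step(val_loss)

    print(f"epoch {epoch}: lr = {optimizer.param_groups[0]['lr']:.2e}")
```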
## Key Takeaways

- **Start with Adam:** `lr=0.001` works for most cases. Simple and effective.
- **Use scheduling:** Reduce learning rate as training progresses.