# Optimization
Master SGD, Adam, and learning rate scheduling to train neural networks effectively.
## The Goal of Optimization
Find the weights that minimize the loss function. The optimizer decides how to update the weights based on their gradients.
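As a concrete sketch of what that loop looks like in PyTorch (the `model`, `loss_fn`, `inputs`, and `targets` here are placeholders, not part of this guide):

```python
import torch

def train_step(model, loss_fn, optimizer, inputs, targets):
    """One generic training step; any torch.optim optimizer follows this pattern."""
    optimizer.zero_grad()                     # clear gradients from the previous step
    loss = loss_fn(model(inputs), targets)    # forward pass
    loss.backward()                           # populate .grad on every parameter
    optimizer.step()                          # the optimizer turns gradients into weight updates
    return loss.item()
```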
## Gradient Descent Variants
| Optimizer | Update Rule | Pros |
|---|---|---|
| SGD | `w -= lr * grad` | Simple, works well |
| Momentum | Accumulates velocity across steps | Faster convergence |
| Adam | Adaptive per-parameter learning rates | Good default choice |
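To make the table concrete, here is a rough, illustrative sketch of the SGD and momentum update rules on plain tensors (the variable names are made up for illustration; `torch.optim` does this for you):

```python
import torch

lr, beta = 0.01, 0.9
w = torch.randn(10)                # a parameter
grad = torch.randn(10)             # its gradient for the current step
velocity = torch.zeros_like(w)

# Plain SGD: step directly against the gradient
w_sgd = w - lr * grad

# Momentum: accumulate a velocity from past gradients, then step along it
velocity = beta * velocity + grad
w_momentum = w - lr * velocity
```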
## SGD with Momentum
```python
import torch

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.01,
    momentum=0.9
)
```
## Adam Optimizer
The go-to optimizer for most tasks:
```python
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=0.001,
    betas=(0.9, 0.999)
)
```
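The `betas` are the exponential decay rates for Adam's running estimates of the gradient mean and squared gradient. A stripped-down, illustrative sketch of a single Adam update (all names here are placeholders; `torch.optim.Adam` does this per parameter internally):

```python
import torch

lr, beta1, beta2, eps = 1e-3, 0.9, 0.999, 1e-8
w = torch.randn(10)
grad = torch.randn(10)
m = torch.zeros_like(w)   # first moment: running mean of gradients
v = torch.zeros_like(w)   # second moment: running mean of squared gradients
t = 1                     # step counter

m = beta1 * m + (1 - beta1) * grad
v = beta2 * v + (1 - beta2) * grad ** 2
m_hat = m / (1 - beta1 ** t)                 # bias correction for early steps
v_hat = v / (1 - beta2 ** t)
w = w - lr * m_hat / (v_hat.sqrt() + eps)    # per-parameter adaptive step
```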
## Learning Rate Scheduling
```python
from torch.optim.lr_scheduler import StepLR, ReduceLROnPlateau

# Reduce every N epochs
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)

# Reduce when loss plateaus
scheduler = ReduceLROnPlateau(optimizer, patience=5)
```
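Where `scheduler.step()` goes depends on the scheduler: `StepLR` is stepped once per epoch with no arguments, while `ReduceLROnPlateau` must be passed the metric it monitors. A sketch of an epoch loop, assuming placeholder `train_one_epoch` and `evaluate` helpers:

```python
for epoch in range(num_epochs):
    train_one_epoch(model, optimizer)     # placeholder: calls optimizer.step() per batch
    val_loss = evaluate(model)            # placeholder: returns validation loss

    # StepLR: scheduler.step()
    # ReduceLROnPlateau: pass the monitored metric
    scheduler.step(val_loss)

    print(f"epoch {epoch}: lr = {optimizer.param_groups[0]['lr']:.2e}")
```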
## Key Takeaways

- **Start with Adam:** `lr=0.001` works for most cases. Simple and effective.
- **Use scheduling:** Reduce learning rate as training progresses.