# Transfer Learning
Leverage pretrained models to achieve great results with less data and compute.
## What is Transfer Learning?
Instead of training from scratch, start with a model pretrained on a large dataset and adapt it to your task.
**The Power:** A model trained on ImageNet (14M images) can be adapted for your 1000-image dataset.
## Why It Works
Early layers learn generic features (edges, textures), while later layers learn task-specific features. The generic features transfer well to new tasks.
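One way to see this split is to list the model's top-level blocks. A quick inspection sketch, assuming torchvision's ResNet-50 (the same model used in the example below): the early stages hold the generic filters, while the final block and the `fc` head are the parts you typically replace or retrain.

```python
import torchvision.models as models

model = models.resnet50(pretrained=True)

# conv1/layer1 learn generic edges and textures; layer3/layer4 and the
# fc head are increasingly specific to the original ImageNet task.
for name, module in model.named_children():
    n_params = sum(p.numel() for p in module.parameters())
    print(f"{name}: {n_params:,} parameters")
```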
## Transfer Learning Strategies
| Strategy | Approach | When to Use |
|---|---|---|
| Feature extraction | Freeze the pretrained backbone, train only a new head | Little data |
| Fine-tuning | Unfreeze some of the top layers and train them with the head | Moderate data |
| Full fine-tuning | Train all layers | Lots of data |
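In code, the three strategies differ only in which parameters are left trainable. A minimal sketch, assuming a torchvision ResNet-50 and a placeholder `num_classes`; the `build_model` helper is hypothetical, introduced here just to contrast the strategies:

```python
import torchvision.models as models
import torch.nn as nn

num_classes = 10  # placeholder for your task

def build_model(strategy: str) -> nn.Module:
    model = models.resnet50(pretrained=True)

    if strategy == "feature_extraction":
        # Freeze the entire backbone; only the new head will learn.
        for param in model.parameters():
            param.requires_grad = False
    elif strategy == "fine_tuning":
        # Freeze everything, then unfreeze the last residual block.
        for param in model.parameters():
            param.requires_grad = False
        for param in model.layer4.parameters():
            param.requires_grad = True
    # "full_fine_tuning": leave every parameter trainable.

    # A fresh, trainable head is attached in every strategy.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model
```

For example, `build_model("fine_tuning")` returns a network where only `layer4` and the new head receive gradient updates.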
## Image Classification Example
```python
import torch
import torch.nn as nn
import torchvision.models as models

num_classes = 10  # set to the number of classes in your dataset

# Load a ResNet-50 pretrained on ImageNet
model = models.resnet50(pretrained=True)

# Freeze all pretrained layers
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a new, trainable classification head
model.fc = nn.Linear(2048, num_classes)

# Only the new layer's parameters are passed to the optimizer
optimizer = torch.optim.Adam(model.fc.parameters(), lr=0.001)
```
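Continuing the snippet above, here is a hedged sketch of one training epoch; `train_loader` is an assumed `torch.utils.data.DataLoader` of image batches and labels, not something defined in this guide:

```python
import torch.nn.functional as F

# Assumes `model` and `optimizer` from the snippet above and a
# hypothetical DataLoader called `train_loader`.
model.train()
for images, labels in train_loader:
    optimizer.zero_grad()
    logits = model(images)
    loss = F.cross_entropy(logits, labels)
    loss.backward()    # parameter gradients are computed only for model.fc
    optimizer.step()   # the frozen backbone stays untouched
```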
## NLP Transfer Learning
```python
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

# Load pretrained BERT with a fresh 2-class classification head
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2
)

# Fine-tune on your dataset (train_dataset must already be tokenized)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="./results"),
    train_dataset=train_dataset,
)
trainer.train()
```
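The `Trainer` above needs a tokenized `train_dataset`. A minimal sketch of that step, assuming the Hugging Face `datasets` library is available; the `texts` and `labels` lists are placeholders, not part of the original example:

```python
from transformers import AutoTokenizer
from datasets import Dataset

texts = ["great movie", "terrible plot"]   # placeholder examples
labels = [1, 0]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Build a small dataset and tokenize it so the Trainer can consume it.
train_dataset = Dataset.from_dict({"text": texts, "label": labels}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
    batched=True,
)
```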
## Best Practices
- **Start Frozen:** Freeze the pretrained layers first and train only the new head.
- **Lower Learning Rate:** Use a smaller learning rate for the pretrained layers (10x to 100x lower than for the new head).
- **Gradual Unfreezing:** Unfreeze layers gradually, working from the top (task-specific) layers down.
- **Match Preprocessing:** Use the same input normalization as the original pretraining (see the sketch after this list).
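As a rough sketch of the learning-rate and preprocessing advice: the 1e-4/1e-3 values below are illustrative assumptions, while the normalization statistics are the standard ImageNet values used by torchvision models.

```python
import torch
import torchvision.models as models
import torchvision.transforms as transforms

model = models.resnet50(pretrained=True)

# Discriminative learning rates: the pretrained backbone gets a rate
# 10x lower than the freshly initialized head (values are illustrative).
backbone_params = [p for name, p in model.named_parameters() if not name.startswith("fc")]
optimizer = torch.optim.Adam([
    {"params": backbone_params, "lr": 1e-4},
    {"params": model.fc.parameters(), "lr": 1e-3},
])

# Match the preprocessing used for the original ImageNet training.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```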
## Key Takeaways
**Remember:** Start with transfer learning whenever a relevant pretrained model exists. It's faster, needs less data, and usually outperforms training from scratch.