Transfer Learning
What
Take a model pretrained on a large dataset and adapt it to your specific task. The single most important practical technique in deep learning.
Why it works
Early layers learn general features (edges, textures, basic language patterns). Later layers learn task-specific features. You reuse the general features and only retrain the specific ones.
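To see this split concretely, list the top-level children of a pretrained ResNet-50: the early convolutional blocks hold the general features, and the final `fc` layer is the task-specific head you will replace. A quick inspection sketch:

```python
import torchvision.models as models

model = models.resnet50(weights="IMAGENET1K_V2")
# Prints: conv1, bn1, relu, maxpool, layer1 ... layer4, avgpool, fc
for name, _ in model.named_children():
    print(name)
```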
How to do it
1. Feature extraction (freeze pretrained layers)
```python
import torch.nn as nn
import torchvision.models as models

model = models.resnet50(weights="IMAGENET1K_V2")

# Freeze all pretrained layers
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer for your task
# (ResNet-50's fc layer takes 2048 input features)
model.fc = nn.Linear(2048, num_classes)
```
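With the backbone frozen, training touches only the new head. A minimal sketch of that loop, assuming a `train_loader` DataLoader of images and integer class labels:

```python
import torch

# Only model.fc requires gradients, so the optimizer only needs its params
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in train_loader:  # assumed: your DataLoader
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()   # gradients reach only the new head
    optimizer.step()
```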
2. Fine-tuning (unfreeze some/all layers)

```python
# After training the head, optionally unfreeze the last residual block
for param in model.layer4.parameters():
    param.requires_grad = True

# Use a smaller learning rate for the pretrained layers so early updates
# don't wipe out the pretrained features
optimizer = torch.optim.Adam([
    {"params": model.layer4.parameters(), "lr": 1e-5},
    {"params": model.fc.parameters(), "lr": 1e-3},
])
```
When to use what

| Your data size | Similarity to pretrained data | Strategy |
|---|---|---|
| Small | Similar | Feature extraction only |
| Small | Different | Feature extraction, maybe fine-tune last layers |
| Large | Similar | Fine-tune all layers |
| Large | Different | Fine-tune all layers, maybe with lower lr |
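For the "fine-tune all layers" rows, it's the same pattern with everything unfrozen and a conservative learning rate (the value here is an illustrative starting point, not a tuned one):

```python
# Unfreeze the whole network and train end to end
for param in model.parameters():
    param.requires_grad = True

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
```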
For NLP
```python
from transformers import AutoModelForSequenceClassification

# Pretrained BERT encoder plus a freshly initialized classification head
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
```
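The same feature-extraction/fine-tuning choices apply; by default the whole model is trainable. A quick sanity check with the matching tokenizer (the input sentence is just an example):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer("Transfer learning is effective.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # torch.Size([1, 2]): one logit per label
```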