Transfer Learning

What

Take a model pretrained on a large dataset and adapt it to your specific task. Arguably the single most important practical technique in deep learning.

Why it works

Early layers learn general features (edges, textures, basic language patterns). Later layers learn task-specific features. You reuse the general features and only retrain the specific ones.
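You can see this split directly by listing a pretrained model's top-level modules. A minimal sketch with torchvision's ResNet-50 (the model used throughout this section); everything before the final fc head is generic feature extraction:

import torchvision.models as models

model = models.resnet50(weights="IMAGENET1K_V2")

# conv1 through layer4 learn increasingly task-specific features;
# fc is the ImageNet-specific classification head that gets replaced.
for name, module in model.named_children():
    print(name, type(module).__name__)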

How to do it

1. Feature extraction (freeze pretrained layers)

import torch.nn as nn
import torchvision.models as models

model = models.resnet50(weights="IMAGENET1K_V2")

# Freeze all layers
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer for your task; ResNet-50's head takes 2048 features
model.fc = nn.Linear(2048, num_classes)  # num_classes: number of classes in your task
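The new fc layer is created with requires_grad=True, so an optimizer over just its parameters trains only the head. A minimal training-step sketch, assuming a standard (images, labels) DataLoader named dataloader:

import torch

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in dataloader:  # assumed: DataLoader of (image, label) batches
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()   # only model.fc receives gradients
    optimizer.step()

One caveat: model.train() still updates the frozen backbone's BatchNorm running statistics; put the backbone in eval mode if you want those fixed as well.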

2. Fine-tuning (unfreeze some/all layers)

import torch

# After training the head, optionally unfreeze later layers
for param in model.layer4.parameters():
    param.requires_grad = True

# Use a smaller learning rate for the pretrained layers than for the new head
optimizer = torch.optim.Adam([
    {"params": model.layer4.parameters(), "lr": 1e-5},
    {"params": model.fc.parameters(), "lr": 1e-3},
])
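Putting the two steps together, a common schedule is a short head-only phase followed by fine-tuning. A sketch under the same assumptions as above; train_one_epoch is a hypothetical helper and the epoch counts are arbitrary:

# Phase 1: train only the new head (backbone still frozen)
head_opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
for epoch in range(3):
    train_one_epoch(model, dataloader, head_opt)  # hypothetical helper

# Phase 2: unfreeze the last stage, continue with per-group learning rates
for param in model.layer4.parameters():
    param.requires_grad = True
ft_opt = torch.optim.Adam([
    {"params": model.layer4.parameters(), "lr": 1e-5},
    {"params": model.fc.parameters(), "lr": 1e-3},
])
for epoch in range(5):
    train_one_epoch(model, dataloader, ft_opt)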

When to use what

| Your data size | Similarity to pretrained data | Strategy |
| --- | --- | --- |
| Small | Similar | Feature extraction only |
| Small | Different | Feature extraction, maybe fine-tune last layers |
| Large | Similar | Fine-tune all layers |
| Large | Different | Fine-tune all layers, maybe with a lower learning rate |
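For the large-data rows, unfreezing everything is the whole change; the main precaution is a small learning rate so the pretrained weights aren't wrecked early on. A minimal sketch:

# Fine-tune all layers with a single small learning rate
for param in model.parameters():
    param.requires_grad = True

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)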

For NLP

from transformers import AutoModelForSequenceClassification

# Loads the pretrained BERT encoder and attaches a freshly
# initialized classification head sized for num_labels classes
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
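The same freeze/fine-tune choices apply here; by default every parameter, encoder included, is trainable. A minimal inference sketch to confirm the head is wired up, using the standard transformers tokenizer API:

import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(
    ["great movie", "terrible movie"],
    padding=True, return_tensors="pt",
)

with torch.no_grad():
    logits = model(**batch).logits  # shape: (2, num_labels)
print(logits.argmax(dim=-1))        # predicted class per example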