Transfer Learning

What

Take a model pretrained on a large dataset and adapt it to your specific task. Arguably the single most important practical technique in deep learning.

Why it works

Early layers learn general features (edges, textures, basic language patterns). Later layers learn task-specific features. You reuse the general features and only retrain the specific ones.
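You can see this split directly by listing a pretrained model's top-level modules. A minimal sketch with torchvision's ResNet-50 (the model used throughout this section); everything before the final fc head is generic feature extraction:

import torchvision.models as models

model = models.resnet50(weights="IMAGENET1K_V2")

# conv1 through layer4 learn increasingly task-specific features;
# fc is the ImageNet-specific classification head that gets replaced.
for name, module in model.named_children():
    print(name, type(module).__name__)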

How to do it

1. Feature extraction (freeze pretrained layers)

import torch.nn as nn
import torchvision.models as models

model = models.resnet50(weights="IMAGENET1K_V2")

# Freeze all layers
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer for your task; ResNet-50's head takes 2048 features
model.fc = nn.Linear(2048, num_classes)  # num_classes: number of classes in your task
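The new fc layer is created with requires_grad=True, so an optimizer over just its parameters trains only the head. A minimal training-step sketch, assuming a standard (images, labels) DataLoader named dataloader:

import torch

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in dataloader:  # assumed: DataLoader of (image, label) batches
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()   # only model.fc receives gradients
    optimizer.step()

One caveat: model.train() still updates the frozen backbone's BatchNorm running statistics; put the backbone in eval mode if you want those fixed as well.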

2. Fine-tuning (unfreeze some/all layers)

import torch

# After training the head, optionally unfreeze later layers
for param in model.layer4.parameters():
    param.requires_grad = True

# Use a smaller learning rate for the pretrained layers than for the new head
optimizer = torch.optim.Adam([
    {"params": model.layer4.parameters(), "lr": 1e-5},
    {"params": model.fc.parameters(), "lr": 1e-3},
])
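Putting the two steps together, a common schedule is a short head-only phase followed by fine-tuning. A sketch under the same assumptions as above; train_one_epoch is a hypothetical helper and the epoch counts are arbitrary:

# Phase 1: train only the new head (backbone still frozen)
head_opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
for epoch in range(3):
    train_one_epoch(model, dataloader, head_opt)  # hypothetical helper

# Phase 2: unfreeze the last stage, continue with per-group learning rates
for param in model.layer4.parameters():
    param.requires_grad = True
ft_opt = torch.optim.Adam([
    {"params": model.layer4.parameters(), "lr": 1e-5},
    {"params": model.fc.parameters(), "lr": 1e-3},
])
for epoch in range(5):
    train_one_epoch(model, dataloader, ft_opt)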

When to use what

| Your data size | Similarity to pretrained data | Strategy |
| --- | --- | --- |
| Small | Similar | Feature extraction only |
| Small | Different | Feature extraction, maybe fine-tune last layers |
| Large | Similar | Fine-tune all layers |
| Large | Different | Fine-tune all layers, maybe with a lower learning rate |
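For the large-data rows, unfreezing everything is the whole change; the main precaution is a small learning rate so the pretrained weights aren't wrecked early on. A minimal sketch:

# Fine-tune all layers with a single small learning rate
for param in model.parameters():
    param.requires_grad = True

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)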

For NLP

from transformers import AutoModelForSequenceClassification

# Loads the pretrained BERT encoder and attaches a freshly
# initialized classification head sized for num_labels classes
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
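The same freeze/fine-tune choices apply here; by default every parameter, encoder included, is trainable. A minimal inference sketch to confirm the head is wired up, using the standard transformers tokenizer API:

import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(
    ["great movie", "terrible movie"],
    padding=True, return_tensors="pt",
)

with torch.no_grad():
    logits = model(**batch).logits  # shape: (2, num_labels)
print(logits.argmax(dim=-1))        # predicted class per example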