Convolutional Neural Networks
What
Neural nets designed for grid-structured data such as images. They use small learnable filters that slide across the input to detect local patterns.
Key ideas
Convolution layer
A small filter (e.g., 3×3) slides across the image, computing a dot product at each position → produces a feature map.
- Early layers detect edges, textures
- Deeper layers detect shapes, objects
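The sliding dot product above can be sketched in plain Python. A minimal "valid" convolution (technically cross-correlation, as in deep learning libraries) with a hand-made vertical-edge kernel; the image and kernel values here are illustrative:

```python
def conv2d(image, kernel):
    """Valid 2D convolution: slide the kernel, take a dot product at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            out[i][j] = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw)
            )
    return out

# Bright-to-dark vertical edge between columns 2 and 3
image = [[9, 9, 9, 0, 0]] * 4
edge_kernel = [[1, 0, -1]] * 3  # responds to left-bright / right-dark
feature_map = conv2d(image, edge_kernel)
# Flat regions give 0; positions straddling the edge give a strong response (27)
```

The feature map is small (2×3 here) because a "valid" convolution only visits positions where the kernel fully fits; padding the input restores the original size.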
Pooling layer
Downsample feature maps (e.g., 2×2 max pooling → halve spatial dimensions). Reduces computation and adds translation invariance.
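Max pooling keeps only the strongest activation in each window. A minimal sketch with non-overlapping 2×2 windows (example values are illustrative):

```python
def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling: keep the strongest activation per window."""
    return [
        [max(fmap[i + a][j + b] for a in range(size) for b in range(size))
         for j in range(0, len(fmap[0]), size)]
        for i in range(0, len(fmap), size)
    ]

fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool2d(fmap)  # 4x4 -> 2x2: [[4, 2], [2, 8]]
```

Because only the window's max survives, shifting the input by a pixel often leaves the pooled output unchanged, which is where the translation invariance comes from.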
Architecture pattern
[Conv → ReLU → Pool] × N → Flatten → [FC → ReLU] × M → Output
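The FC layers need a fixed input size, so it helps to track the spatial dimensions through the conv/pool stack. A plain-Python sketch, assuming 3×3 convolutions with padding 1 (which preserve size), 2×2 pooling (which halves it), and a 32×32 input such as CIFAR-10:

```python
def conv_out(size, kernel=3, padding=1, stride=1):
    """Conv output size: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32               # assumed 32x32 input (e.g., CIFAR-10)
size = conv_out(size)   # 32: with padding=1, a 3x3 conv preserves size
size //= 2              # 16 after 2x2 max pooling
size = conv_out(size)   # 16
size //= 2              # 8 after the second pool
flat_features = 64 * size * size  # 64 channels * 8 * 8 = 4096
```

This arithmetic determines the `in_features` of the first fully connected layer after flattening.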
In PyTorch
```python
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),  # 3 channels in, 32 out
            nn.ReLU(),
            nn.MaxPool2d(2),                             # halve spatial dims
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 128),  # assumes 32x32 inputs (e.g., CIFAR-10)
            nn.ReLU(),
            nn.Linear(128, 10),
        )

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x)
```
Famous architectures (historical progression)
| Year | Model | Innovation |
|---|---|---|
| 2012 | AlexNet | Deep CNNs on GPU, ReLU, dropout |
| 2014 | VGG | Very deep, small 3×3 filters |
| 2015 | ResNet | Skip connections → 100+ layers |
| 2019 | EfficientNet | Balanced scaling of depth/width/resolution |
In practice: use Transfer Learning
Rarely train a CNN from scratch. Start from a model pretrained on a large dataset such as ImageNet (e.g., ResNet, EfficientNet), replace the final classification layer, and fine-tune on your own data.