Convolutional Neural Networks

What

Neural nets designed for grid-structured data (images). Use small learnable filters that slide across the input to detect patterns.

Key ideas

Convolution layer

A small filter (e.g., 3×3) slides across the image, computing a dot product at each position → produces a feature map.

  • Early layers detect edges, textures
  • Deeper layers detect shapes, objects
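
A minimal pure-Python sketch of the sliding dot product described above, assuming no padding and stride 1. The function name `conv2d_valid`, the toy image, and the hand-picked `vertical_edge` filter are illustrative; in a real CNN the filter weights are learned, not chosen by hand.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the kernel and the image patch under it.
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 4x5 image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Hand-picked vertical-edge filter (an assumption for illustration).
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = conv2d_valid(image, vertical_edge)
print(feature_map)  # [[3, 3, 0], [3, 3, 0]]
```

The response is large where the window straddles the dark-to-bright boundary and zero over the uniform region. Note that deep learning libraries compute this operation without flipping the kernel (strictly, cross-correlation), which is what the sketch does too.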

Pooling layer

Downsamples feature maps (e.g., 2×2 max pooling with stride 2 halves each spatial dimension). Reduces computation and makes the output less sensitive to small shifts in the input (a limited form of translation invariance).
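
A rough sketch of 2×2 max pooling in plain Python, assuming non-overlapping windows (stride equal to the window size) and dimensions divisible by the window size. The name `max_pool2d` and the sample values are made up for illustration.

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [
                feature_map[i * size + di][j * size + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]

pooled = max_pool2d(fmap)
print(pooled)  # [[4, 2], [2, 8]]
```

The 4×4 map becomes 2×2, so later layers process a quarter as many values. Moving the 4 anywhere within its 2×2 window would leave the output unchanged, which is where the shift tolerance comes from.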

Architecture pattern

[Conv → ReLU → Pool] × N → Flatten → [FC → ReLU] × M → Output

In PyTorch

import torch.nn as nn
 
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),   # 3 channels in, 32 out
            nn.ReLU(),
            nn.MaxPool2d(2),                               # halve spatial dims
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 128),                   # assumes 32x32 input: two 2x pools -> 8x8
            nn.ReLU(),
            nn.Linear(128, 10),
        )
 
    def forward(self, x):
        x = self.features(x)
        return self.classifier(x)
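
Where does 64 * 8 * 8 come from? A small sketch of the shape arithmetic, assuming 32×32 inputs (CIFAR-10-sized; the notes do not state the input size). The helper names `conv_out`, `pool_out`, and `flattened_features` are hypothetical, not PyTorch functions.

```python
def conv_out(size, kernel_size, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


def pool_out(size, kernel_size):
    """Spatial output size of non-overlapping pooling (stride = kernel size)."""
    return size // kernel_size


def flattened_features(input_size, out_channels=64, num_blocks=2):
    """Number of inputs the first Linear layer needs for a square input image."""
    size = input_size
    for _ in range(num_blocks):
        size = conv_out(size, kernel_size=3, padding=1)  # 3x3 conv, padding=1 keeps the size
        size = pool_out(size, kernel_size=2)             # 2x2 pooling halves it
    return out_channels * size * size


print(flattened_features(32))   # 4096 = 64 * 8 * 8
print(flattened_features(224))  # 200704 = 64 * 56 * 56
```

With a different input resolution, the first `nn.Linear` needs a different input size, otherwise the forward pass fails with a shape mismatch.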

Famous architectures (historical progression)

Year  Model         Innovation
2012  AlexNet       Deep CNNs on GPU, ReLU, dropout
2014  VGG           Very deep, small 3×3 filters
2015  ResNet        Skip connections → 100+ layers
2019  EfficientNet  Balanced scaling of depth/width/resolution

In practice: use Transfer Learning

Training a CNN from scratch is rarely worth it unless you have a large labeled dataset. Start from a pretrained model (ResNet, EfficientNet) and fine-tune it on your own data.