Backpropagation

What

The algorithm that computes gradients in a neural network. It’s just the Chain Rule applied systematically from output to input.

How it works

Forward pass

Input flows through the network → compute prediction → compute loss.

Backward pass

  1. Compute ∂loss/∂output (how loss changes with output)
  2. Propagate backwards through each layer using the chain rule
  3. At each layer: ∂loss/∂weights = ∂loss/∂output × ∂output/∂weights
  4. Now every weight has a gradient → Gradient Descent updates them
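The steps above can be sketched by hand for a single linear layer with a mean-squared-error loss. This is a hypothetical minimal example (shapes and names are illustrative, not from any library):

```python
import numpy as np

# Minimal setup: one linear layer y_hat = X @ W, mean-squared-error loss
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))    # batch of 8 inputs with 4 features
W = rng.normal(size=(4, 1))    # the layer's weights
y = rng.normal(size=(8, 1))    # targets

# Forward pass
y_hat = X @ W                              # prediction
loss = np.mean((y_hat - y) ** 2)           # scalar loss

# Backward pass
d_loss_d_yhat = 2 * (y_hat - y) / y.size   # step 1: d loss / d output
d_loss_d_W = X.T @ d_loss_d_yhat           # step 3: d loss/d output x d output/d weights

print(d_loss_d_W.shape)                    # one gradient per weight (step 4)
```

Step 4 is then a Gradient Descent update, e.g. `W -= lr * d_loss_d_W`.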

In PyTorch

import torch
import torch.nn as nn

model = nn.Linear(4, 1)   # a minimal example model
X = torch.randn(8, 4)     # dummy batch
y = torch.randn(8, 1)
loss_fn = nn.MSELoss()

# Forward pass
predictions = model(X)
loss = loss_fn(predictions, y)

# Backward pass — computes all gradients automatically
loss.backward()

# Every parameter now has .grad
for name, param in model.named_parameters():
    print(f"{name}: grad shape = {param.grad.shape}")

PyTorch’s autograd builds a computation graph during the forward pass and walks it backwards. You never implement backprop manually.
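The graph-then-walk idea can be shown with a toy scalar autograd. This is a sketch of the concept only, not PyTorch's actual implementation or API:

```python
# Toy reverse-mode autograd: each operation records its parents (the graph)
# and a rule for pushing gradient backwards (the chain rule).
class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents                  # edges of the computation graph
        self._backward_fn = lambda out: None     # how to push gradient to parents

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward_fn(out):
            self.grad += other.data * out.grad   # chain rule: d(x*y)/dx = y
            other.grad += self.data * out.grad
        out._backward_fn = backward_fn
        return out

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward_fn(out):
            self.grad += out.grad                # d(x+y)/dx = 1
            other.grad += out.grad
        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topologically order the graph, then walk it output -> inputs
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0                          # d loss / d loss = 1
        for v in reversed(order):
            v._backward_fn(v)

x, w = Value(3.0), Value(2.0)
loss = x * w + w        # forward pass builds the graph
loss.backward()         # backward pass walks it
print(w.grad)           # d loss / d w = x + 1 = 4.0
```

Gradients accumulate with `+=` because a value can feed into several operations, which is also why PyTorch requires zeroing `.grad` between steps.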

Key insight

Backprop is efficient because it reuses intermediate results: the gradient flowing into each layer is computed once and shared by every layer before it. Computing all gradients costs only about 2–3x a single forward pass, no matter how many parameters the network has.
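To see why this matters, compare against naive numerical differentiation, which needs one extra forward pass per parameter. A hypothetical small example (names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(16, 10))
y = rng.normal(size=(16, 1))
W = rng.normal(size=(10, 1))   # 10 parameters

def forward(W):
    return np.mean((X @ W - y) ** 2)

# Finite differences: 1 + 10 forward passes to get 10 gradients —
# cost grows linearly with parameter count
eps = 1e-6
base = forward(W)
fd_grad = np.zeros_like(W)
for i in range(W.shape[0]):
    Wp = W.copy()
    Wp[i, 0] += eps
    fd_grad[i, 0] = (forward(Wp) - base) / eps

# Backprop: one forward + one backward sweep gets all 10 at once
y_hat = X @ W
bp_grad = X.T @ (2 * (y_hat - y) / y.size)

print(np.allclose(fd_grad, bp_grad, atol=1e-3))   # same gradients, far fewer passes
```

With millions of parameters, the finite-difference approach needs millions of forward passes while backprop still needs just one backward sweep.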