# Loss Functions

## What
A function that measures how wrong the model’s predictions are. Training = minimizing this function.
## Regression losses
| Loss | Formula | Properties |
|---|---|---|
| MSE (Mean Squared Error) | mean((y - ŷ)²) | Penalizes large errors heavily, sensitive to outliers |
| MAE (Mean Absolute Error) | mean(|y - ŷ|) | Robust to outliers, not differentiable at 0 |
| Huber | MSE for small errors, MAE for large | Quadratic near zero, linear in the tails; differentiable everywhere and robust to outliers |
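The three regression losses above can be sketched in a few lines of NumPy. This is a minimal illustration, not a library implementation; the function names, the `delta` default, and the example arrays are my own.

```python
import numpy as np

def mse(y, y_hat):
    # Mean squared error: quadratic penalty, dominated by large residuals
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    # Mean absolute error: linear penalty, robust to outliers
    return np.mean(np.abs(y - y_hat))

def huber(y, y_hat, delta=1.0):
    # Quadratic for |residual| <= delta, linear beyond: the MSE/MAE hybrid
    r = y - y_hat
    small = np.abs(r) <= delta
    return np.mean(np.where(small, 0.5 * r**2, delta * (np.abs(r) - 0.5 * delta)))

y     = np.array([1.0, 2.0, 3.0, 100.0])  # last point is an outlier
y_hat = np.array([1.1, 1.9, 3.2, 3.0])
```

On this example the outlier blows up MSE far more than MAE, while Huber stays close to MAE: exactly the sensitivity the table describes.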
## Classification losses
| Loss | Formula | When |
|---|---|---|
| Binary cross-entropy | -[y·log(ŷ) + (1-y)·log(1-ŷ)] | Binary classification |
| Categorical cross-entropy | -Σ yᵢ·log(ŷᵢ) | Multi-class classification |
| Hinge loss | max(0, 1 - y·ŷ) | SVMs; y ∈ {-1, +1}, ŷ is a raw score, not a probability |
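A sketch of the classification losses in the same style (my own minimal versions; the `eps` clipping is there only to keep `log(0)` from producing `-inf`):

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    # y in {0, 1}; p = predicted probability of class 1
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def categorical_cross_entropy(y, p, eps=1e-12):
    # y: one-hot rows; p: predicted probability rows summing to 1
    p = np.clip(p, eps, 1.0)
    return -np.mean(np.sum(y * np.log(p), axis=1))

def hinge(y, score):
    # y in {-1, +1}; score = raw (unnormalized) model output
    return np.mean(np.maximum(0.0, 1.0 - y * score))
```

Note the behavioral difference: cross-entropy keeps pushing probabilities toward 0/1 forever, while hinge loss is exactly zero once every example is classified with margin ≥ 1.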
## How to choose
- Regression with outliers → Huber or MAE
- Standard regression → MSE
- Classification → cross-entropy (almost always)
- The loss function encodes your definition of “wrong”
## Connection to MLE

Minimizing MSE = MLE under the assumption of Gaussian noise. Minimizing cross-entropy = MLE under a Bernoulli (binary) or categorical (multi-class) likelihood. See Maximum Likelihood Estimation.
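A quick numerical check of the Gaussian case (a sketch with made-up data, not a derivation): the Gaussian negative log-likelihood is a constant plus a scaled sum of squared errors, so sweeping a constant prediction over a grid should find the same minimizer for both.

```python
import numpy as np

def gaussian_nll(y, y_hat, sigma=1.0):
    # Negative log-likelihood of y under N(y_hat, sigma^2):
    # constant term + sum of squared errors / (2 sigma^2)
    n = len(y)
    return (n / 2) * np.log(2 * np.pi * sigma**2) \
        + np.sum((y - y_hat) ** 2) / (2 * sigma**2)

y = np.array([1.0, 2.0, 3.0])
grid = np.linspace(0.0, 4.0, 401)  # candidate constant predictions

nll_best = grid[np.argmin([gaussian_nll(y, np.full(3, c)) for c in grid])]
mse_best = grid[np.argmin([np.mean((y - np.full(3, c)) ** 2) for c in grid])]
```

Both minimizers land on the sample mean (2.0 here), since the NLL differs from MSE only by an additive constant and a positive scale factor.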
## Links
- Gradient Descent — the algorithm that minimizes the loss
- Cross-Entropy and KL Divergence — the math behind classification loss
- Maximum Likelihood Estimation
- Evaluation Metrics — loss ≠ the metric you care about