# Loss Functions

## What
A function that measures how wrong the model’s predictions are. Training = minimizing this function.
## Regression losses
| Loss | Formula | Properties |
|---|---|---|
| MSE (Mean Squared Error) | mean((y - ŷ)²) | Penalizes large errors heavily, sensitive to outliers |
| MAE (Mean Absolute Error) | mean(|y - ŷ|) | Robust to outliers, not differentiable at 0 |
| Huber | MSE for small errors, MAE for large | Quadratic near zero, linear in the tails; differentiable everywhere and robust to outliers |
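The three regression losses above can be sketched in a few lines of NumPy. This is a minimal illustration, not a library implementation; the function names, the `delta` default, and the example arrays are my own.

```python
import numpy as np

def mse(y, y_hat):
    # Mean squared error: quadratic penalty, dominated by large residuals
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    # Mean absolute error: linear penalty, robust to outliers
    return np.mean(np.abs(y - y_hat))

def huber(y, y_hat, delta=1.0):
    # Quadratic for |residual| <= delta, linear beyond: the MSE/MAE hybrid
    r = y - y_hat
    small = np.abs(r) <= delta
    return np.mean(np.where(small, 0.5 * r**2, delta * (np.abs(r) - 0.5 * delta)))

y     = np.array([1.0, 2.0, 3.0, 100.0])  # last point is an outlier
y_hat = np.array([1.1, 1.9, 3.2, 3.0])
```

On this example the outlier blows up MSE far more than MAE, while Huber stays close to MAE: exactly the sensitivity the table describes.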
## Classification losses
| Loss | Formula | When |
|---|---|---|
| Binary cross-entropy | -[y·log(ŷ) + (1-y)·log(1-ŷ)] | Binary classification |
| Categorical cross-entropy | -Σ yᵢ·log(ŷᵢ) | Multi-class classification |
| Hinge loss | max(0, 1 - y·ŷ) | SVMs; y ∈ {-1, +1}, ŷ is a raw score, not a probability |
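A sketch of the classification losses in the same style (my own minimal versions; the `eps` clipping is there only to keep `log(0)` from producing `-inf`):

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    # y in {0, 1}; p = predicted probability of class 1
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def categorical_cross_entropy(y, p, eps=1e-12):
    # y: one-hot rows; p: predicted probability rows summing to 1
    p = np.clip(p, eps, 1.0)
    return -np.mean(np.sum(y * np.log(p), axis=1))

def hinge(y, score):
    # y in {-1, +1}; score = raw (unnormalized) model output
    return np.mean(np.maximum(0.0, 1.0 - y * score))
```

Note the behavioral difference: cross-entropy keeps pushing probabilities toward 0/1 forever, while hinge loss is exactly zero once every example is classified with margin ≥ 1.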
## How to choose
- Regression with outliers → Huber or MAE
- Standard regression → MSE
- Classification → cross-entropy (almost always)
- The loss function encodes your definition of “wrong”
## Connection to MLE

Minimizing MSE = MLE under the assumption of Gaussian noise. Minimizing cross-entropy = MLE under a Bernoulli (binary) or categorical (multi-class) likelihood. See Maximum Likelihood Estimation.
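A quick numerical check of the Gaussian case (a sketch with made-up data, not a derivation): the Gaussian negative log-likelihood is a constant plus a scaled sum of squared errors, so sweeping a constant prediction over a grid should find the same minimizer for both.

```python
import numpy as np

def gaussian_nll(y, y_hat, sigma=1.0):
    # Negative log-likelihood of y under N(y_hat, sigma^2):
    # constant term + sum of squared errors / (2 sigma^2)
    n = len(y)
    return (n / 2) * np.log(2 * np.pi * sigma**2) \
        + np.sum((y - y_hat) ** 2) / (2 * sigma**2)

y = np.array([1.0, 2.0, 3.0])
grid = np.linspace(0.0, 4.0, 401)  # candidate constant predictions

nll_best = grid[np.argmin([gaussian_nll(y, np.full(3, c)) for c in grid])]
mse_best = grid[np.argmin([np.mean((y - np.full(3, c)) ** 2) for c in grid])]
```

Both minimizers land on the sample mean (2.0 here), since the NLL differs from MSE only by an additive constant and a positive scale factor.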
## Links
- Gradient Descent — the algorithm that minimizes the loss
- Cross-Entropy and KL Divergence — the math behind classification loss
- Maximum Likelihood Estimation
- Evaluation Metrics — loss ≠ the metric you care about