Regularization
What
Adding a penalty for model complexity to prevent overfitting. Forces the model to find simpler patterns.
Methods
L1 (Lasso) — sparsity
Adds |weights| to the loss. Drives some weights to exactly zero → automatic feature selection.
L2 (Ridge) — small weights
Adds weights² to the loss. Shrinks all weights toward zero but doesn’t eliminate any.
Elastic Net — both
Combines L1 and L2 penalties: keeps some of L1's sparsity while L2 stabilizes the solution when features are correlated.
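The three penalties above, written out as loss terms. A minimal NumPy sketch; the `alpha` and `l1_ratio` names mirror the sklearn parameters, and the elastic-net mix follows sklearn's parameterization:

```python
import numpy as np

def l1_penalty(w, alpha=0.1):
    # Lasso: alpha * sum of absolute weights
    return alpha * np.sum(np.abs(w))

def l2_penalty(w, alpha=1.0):
    # Ridge: alpha * sum of squared weights
    return alpha * np.sum(w ** 2)

def elastic_net_penalty(w, alpha=0.1, l1_ratio=0.5):
    # Weighted mix of the two penalties above
    return alpha * (l1_ratio * np.sum(np.abs(w))
                    + 0.5 * (1 - l1_ratio) * np.sum(w ** 2))

w = np.array([0.5, -1.0, 2.0])
print(l1_penalty(w))  # 0.1 * (0.5 + 1.0 + 2.0) = 0.35
print(l2_penalty(w))  # 1.0 * (0.25 + 1.0 + 4.0) = 5.25
```

Either penalty is simply added to the data loss before taking gradients, which is why larger alpha pushes weights harder toward zero.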
from sklearn.linear_model import Ridge, Lasso, ElasticNet
# alpha controls regularization strength
ridge = Ridge(alpha=1.0)
lasso = Lasso(alpha=0.1) # some coefficients will be exactly 0
elastic = ElasticNet(alpha=0.1, l1_ratio=0.5)
Deep learning regularization
- Dropout: randomly zero out neurons during training → forces redundancy
- Weight decay: L2 regularization in the optimizer
- Early stopping: stop training when validation loss stops improving
- Data augmentation: artificially increase training data variety
- Batch normalization: stabilizes training, mild regularization effect
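The dropout bullet can be made concrete with a minimal NumPy sketch of inverted dropout (the function name and `p` for drop probability are assumed here, not from a specific library):

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    # Inverted dropout: during training, zero each unit with probability p
    # and scale survivors by 1/(1-p) so the expected activation is unchanged.
    # At inference time the input passes through untouched.
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p   # keep each unit with probability 1-p
    return x * mask / (1.0 - p)

x = np.ones((4, 8))
print(dropout(x, p=0.5, training=True))   # entries are 0.0 or 2.0
print(dropout(x, training=False))          # unchanged at inference
```

Because surviving activations are rescaled during training, no correction is needed at inference, which is why the eval path is a plain pass-through.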
Intuition
A model that fits training data perfectly probably memorized noise. Regularization says: “find patterns, but keep it simple.” The penalty trades a little training accuracy for much better generalization.
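The sparsity claim ("some coefficients will be exactly 0") is easy to check directly. A quick sketch on synthetic data; the `make_regression` setup with mostly uninformative features is an assumed illustration, not a benchmark:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# 100 features, only 5 actually carry signal
X, y = make_regression(n_samples=200, n_features=100,
                       n_informative=5, noise=1.0, random_state=0)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Lasso prunes uninformative features to exactly zero;
# Ridge only shrinks them, so it typically has no exact zeros.
print("lasso zero coefficients:", int(np.sum(lasso.coef_ == 0)))
print("ridge zero coefficients:", int(np.sum(ridge.coef_ == 0)))
```

Counting exact zeros in `coef_` is the standard way to see L1's feature-selection effect side by side with L2's shrinkage-only behavior.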