Ensemble Methods
What
Combine multiple models to get better predictions than any single model. The core idea: individual models make different errors, and as long as those errors are not perfectly correlated, combining the models cancels much of them out.
Bagging (Bootstrap Aggregating)
Train multiple models on random subsets of the data (with replacement), then average their predictions (regression) or vote (classification). Each model sees a different slice, so they make different mistakes.
Random Forests are the classic example: bag of decision trees, each also using random feature subsets.
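A minimal sketch of both ideas with scikit-learn: `BaggingClassifier` does plain bootstrap aggregating over decision trees, while `RandomForestClassifier` adds the random feature subsets. The dataset and hyperparameters here are illustrative assumptions, not tuned values.

```python
# Bagging vs. Random Forest on a synthetic dataset (illustrative only)
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 trees, each trained on a bootstrap sample (drawn with replacement)
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
bag.fit(X_train, y_train)

# Random Forest = bagging + a random feature subset considered at each split
rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf.fit(X_train, y_train)

print(bag.score(X_test, y_test), rf.score(X_test, y_test))
```

Predictions are combined by majority vote here; for regression, the analogous `BaggingRegressor` averages instead.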
Boosting
Train models sequentially. Each new model focuses on the mistakes of the previous ones. The final prediction is a weighted sum of all models.
- AdaBoost: increases the weight of misclassified samples, so the next model focuses on the hard cases
- Gradient Boosting: each new model fits the negative gradient of the loss; for squared loss this is exactly the residual error
- XGBoost / LightGBM: optimized gradient boosting with regularization and speed tricks
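The residual-fitting loop can be sketched by hand for squared loss: start from a zero model, and at each stage fit a small tree to the current residuals. The learning rate, tree depth, and stage count below are assumptions for illustration.

```python
# Hand-rolled gradient boosting for squared loss: each stage fits the
# residuals (the negative gradient) of the ensemble so far.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

pred = np.zeros_like(y)  # stage 0: predict zero everywhere
lr = 0.1                 # learning rate shrinks each stage's contribution
for _ in range(100):
    residual = y - pred  # negative gradient of squared loss
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += lr * tree.predict(X)

print(np.mean((y - pred) ** 2))  # training MSE shrinks as stages are added
```

Libraries like XGBoost and LightGBM follow this same scheme but add regularization terms, smarter split finding, and histogram-based speedups.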
Stacking
Train several different models (e.g., SVM, Random Forest, KNN), then use their predictions as features for a “meta-model” (often logistic regression). The meta-model learns which base models to trust for which inputs.
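scikit-learn packages this pattern as `StackingClassifier`. The base models and logistic-regression meta-model below mirror the text; the dataset and hyperparameters are illustrative assumptions.

```python
# Stacking: base-model predictions become features for a meta-model.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("svm", SVC()),
        ("rf", RandomForestClassifier(random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    final_estimator=LogisticRegression(),  # the meta-model
    cv=5,  # base predictions come from cross-validation to avoid leakage
)
stack.fit(X, y)
print(stack.score(X, y))
```

The `cv` argument matters: the meta-model must be trained on out-of-fold base predictions, otherwise it just learns to trust whichever base model overfits the training set hardest.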
Comparison
| Aspect | Bagging | Boosting | Stacking |
|---|---|---|---|
| Training | Parallel | Sequential | Two stages |
| Reduces | Variance | Bias | Both |
| Overfitting risk | Low | Higher (can overfit noise) | Medium |
| Speed | Fast (parallelizable) | Slower (sequential) | Depends on base models |
| Example | Random Forest | XGBoost | Blending diverse models |
Why ensembles work
- Variance reduction (bagging): averaging noisy models smooths out random errors
- Bias reduction (boosting): iteratively correcting errors lets simple models capture complex patterns
- Diversity matters: ensembles of identical models don’t help. You need models that disagree on different inputs
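The variance-reduction claim is easy to check numerically: the average of M independent predictors with equal noise variance has roughly 1/M of the variance of any single one. The toy setup below (a constant truth plus Gaussian noise) is an illustrative assumption.

```python
# Numerical check: averaging M independent noisy predictors cuts
# error variance by roughly a factor of M.
import numpy as np

rng = np.random.default_rng(0)
truth, M = 1.0, 25
# each "model" predicts the truth plus independent unit-variance noise
preds = truth + rng.normal(scale=1.0, size=(10_000, M))

single_var = preds[:, 0].var()
ensemble_var = preds.mean(axis=1).var()
print(single_var, ensemble_var)  # ensemble variance ≈ single_var / 25
```

This is also why diversity matters: if the models' errors were perfectly correlated, averaging would leave the variance unchanged.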
Code example

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X_train, X_test, y_train, y_test = train_test_split(
    *load_iris(return_X_y=True), random_state=0
)

# soft voting = average predicted probabilities
# (SVC needs probability=True to expose predict_proba)
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svc", SVC(probability=True)),
        ("dt", DecisionTreeClassifier()),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
print(ensemble.score(X_test, y_test))
```

Links
- Random Forests — bagging with decision trees
- Gradient Boosting — the most popular boosting method
- Bias-Variance Tradeoff — why ensembles improve generalization
- Machine Learning Roadmap