Gradient Boosting
What
Build trees sequentially: each new tree corrects the errors of the previous ensemble. Powerful on tabular data; a frequent winner of ML competitions.
How it works
- Start with a simple prediction (e.g., mean)
- Compute residuals (errors)
- Train a small tree to predict the residuals
- Add that tree’s predictions (scaled by learning rate) to the ensemble
- Repeat
Each tree fixes what the previous ones got wrong → powerful additive model.
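The loop above can be sketched in a few lines. This is a toy from-scratch version for squared-error regression, using sklearn decision stumps as the weak learners (illustrative only, not how XGBoost is implemented):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

learning_rate = 0.1
trees = []
pred = np.full_like(y, y.mean())              # start with a simple prediction (the mean)
for _ in range(100):
    residuals = y - pred                      # compute errors of the current ensemble
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residuals)                    # train a small tree on the residuals
    pred += learning_rate * tree.predict(X)   # add its scaled predictions
    trees.append(tree)                        # repeat

mse = np.mean((y - pred) ** 2)
```

For squared error, the residual is exactly the negative gradient of the loss, which is where the "gradient" in the name comes from.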
Libraries
| Library | Notes |
|---|---|
| XGBoost | The classic, fast, handles missing values |
| LightGBM | Faster than XGBoost, good for large data |
| CatBoost | Best for categorical features, no encoding needed |
| sklearn GradientBoosting | Slower, but consistent API |
```python
from xgboost import XGBClassifier

model = XGBClassifier(
    n_estimators=200,
    max_depth=6,
    learning_rate=0.1,
    subsample=0.8,
)
model.fit(X_train, y_train)
```
Key hyperparameters
- n_estimators: number of trees (more = better, up to a point)
- learning_rate: how much each tree contributes (lower = needs more trees)
- max_depth: depth of each tree (3-8 usually)
- subsample: fraction of data per tree (like bagging, reduces overfitting)
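The learning_rate / n_estimators trade-off can be seen directly on toy data. A sketch using sklearn's GradientBoostingClassifier (dataset and settings are arbitrary; exact scores will vary):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Few trees at a high rate vs. many trees at a low rate:
fast = GradientBoostingClassifier(n_estimators=50, learning_rate=0.3, random_state=0)
slow = GradientBoostingClassifier(n_estimators=300, learning_rate=0.05, random_state=0)
fast_acc = fast.fit(X_tr, y_tr).score(X_te, y_te)
slow_acc = slow.fit(X_tr, y_tr).score(X_te, y_te)
```

Both configurations can reach similar accuracy; the low-rate/many-trees setting just takes more iterations to get there and tends to generalize a bit more smoothly.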
Gradient Boosting vs Random Forest
| | Random Forest | Gradient Boosting |
|---|---|---|
| Trees | Independent, parallel | Sequential, corrective |
| Overfitting | Hard to overfit | Can overfit if not tuned |
| Tuning | Works well with defaults | Needs careful tuning |
| Speed | Fast to train | Slower (sequential) |
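A quick side-by-side of the two, using sklearn's implementations with near-default settings (toy data; relative speed and accuracy will vary by dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

rf = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)      # independent trees, works with defaults
gb = GradientBoostingClassifier(random_state=42).fit(X_tr, y_tr)  # sequential trees, benefits from tuning
rf_acc = rf.score(X_te, y_te)
gb_acc = gb.score(X_te, y_te)
```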
Links
- Decision Trees
- Random Forests
- Hyperparameter Tuning
- Gradient Descent — same “gradient” concept applied to tree ensembles