Gradient Boosting

What

Build trees sequentially: each new tree corrects the errors of the previous ensemble. Powerful, often wins competitions.

How it works

  1. Start with a simple prediction (e.g., mean)
  2. Compute residuals (errors; for squared loss, these are the negative gradients)
  3. Train a small tree to predict the residuals
  4. Add that tree’s predictions (scaled by learning rate) to the ensemble
  5. Repeat

Each tree fixes what the previous ones got wrong → powerful additive model.
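The loop above can be sketched from scratch for regression with squared loss (where residuals are the negative gradients). This is a minimal illustration, not a production implementation: the weak learner is sklearn's DecisionTreeRegressor, and the function names and toy sine data are made up for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbm(X, y, n_trees=50, learning_rate=0.1, max_depth=2):
    """Gradient boosting for regression with squared loss (sketch)."""
    base = y.mean()                      # step 1: start from the mean
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        residuals = y - pred             # step 2: what the ensemble still gets wrong
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)           # step 3: small tree trained on residuals
        pred += learning_rate * tree.predict(X)  # step 4: shrunken update
        trees.append(tree)               # step 5: repeat
    return base, trees

def predict_gbm(base, trees, X, learning_rate=0.1):
    pred = np.full(len(X), base)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred

# Toy data: the ensemble should fit a sine curve far better than the mean alone.
X = np.linspace(0, 1, 200).reshape(-1, 1)
y = np.sin(2 * np.pi * X.ravel())
base, trees = fit_gbm(X, y)
```

Each iteration shrinks the remaining error, which is why lowering the learning rate generally requires more trees.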

Libraries

Library                   Notes
XGBoost                   The classic; fast, handles missing values
LightGBM                  Faster than XGBoost, good for large data
CatBoost                  Best for categorical features, no encoding needed
sklearn GradientBoosting  Slower, but consistent API
Example with XGBoost (assumes X_train and y_train are already defined):

from xgboost import XGBClassifier

model = XGBClassifier(
    n_estimators=200,    # number of boosting rounds (trees)
    max_depth=6,         # depth of each tree
    learning_rate=0.1,   # shrinkage applied to each tree's contribution
    subsample=0.8,       # fraction of rows sampled per tree
)
model.fit(X_train, y_train)

Key hyperparameters

  • n_estimators: number of trees (more = better, up to a point)
  • learning_rate: how much each tree contributes (lower = needs more trees)
  • max_depth: depth of each tree (3-8 usually)
  • subsample: fraction of data per tree (like bagging, reduces overfitting)
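The learning_rate / n_estimators trade-off can be seen directly: a small learning rate with many trees typically matches or beats a large learning rate with few trees. A hedged sketch using sklearn's GradientBoostingClassifier; the synthetic dataset and the two settings compared are illustrative choices, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Illustrative synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = {}
# High learning rate / few trees vs low learning rate / many trees
for lr, n in [(0.5, 20), (0.05, 200)]:
    model = GradientBoostingClassifier(
        n_estimators=n, learning_rate=lr, max_depth=3,
        subsample=0.8, random_state=0,
    )
    model.fit(X_tr, y_tr)
    scores[(lr, n)] = model.score(X_te, y_te)
print(scores)
```

In practice the number of trees is usually chosen with early stopping on a validation set rather than fixed up front.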

Gradient Boosting vs Random Forest

             Random Forest             Gradient Boosting
Trees        Independent, parallel     Sequential, corrective
Overfitting  Hard to overfit           Can overfit if not tuned
Tuning       Works well with defaults  Needs careful tuning
Speed        Fast to train             Slower (sequential)
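The "works well with defaults" row is easy to check empirically: fit both sklearn models with default settings on the same split. The synthetic dataset here is illustrative; exact scores will vary by data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Both models with default hyperparameters
rf_acc = RandomForestClassifier(random_state=1).fit(X_tr, y_tr).score(X_te, y_te)
gb_acc = GradientBoostingClassifier(random_state=1).fit(X_tr, y_tr).score(X_te, y_te)
print(f"Random Forest: {rf_acc:.3f}  Gradient Boosting: {gb_acc:.3f}")
```

Random forest's defaults are usually competitive out of the box, while gradient boosting tends to pull ahead only once learning_rate, depth, and tree count are tuned.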