Random Forests
What
An ensemble of decision trees whose predictions are combined (majority vote for classification, averaging for regression). Each tree sees a random subset of data and features → diverse trees → robust predictions.
Why they work
- Individual trees overfit, but averaging many decorrelated trees reduces the ensemble's variance without adding much bias
- Bagging (Bootstrap Aggregating): each tree trains on a random sample with replacement
- Feature randomness: each split considers a random subset of features
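The two sources of randomness above can be sketched in a few lines of NumPy. This is an illustration, not how any particular library implements it; the array shapes and the square-root feature count (a common default for classification) are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 100, 16
X = rng.normal(size=(n_samples, n_features))

# Bagging: each tree trains on a bootstrap sample drawn with replacement,
# so on average only ~63% of the rows are unique per tree.
boot_idx = rng.integers(0, n_samples, size=n_samples)
X_boot = X[boot_idx]

# Feature randomness: each split considers only a random subset of the
# features (sqrt(n_features) is a common default for classification).
split_features = rng.choice(n_features, size=int(np.sqrt(n_features)), replace=False)
```

Because different trees see different rows and consider different features at each split, their individual errors are less correlated, which is what makes the averaging effective.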
In practice
```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=100,    # number of trees
    max_depth=10,        # limit tree depth
    min_samples_leaf=5,  # prevent tiny leaves
    n_jobs=-1,           # use all CPU cores
)
model.fit(X_train, y_train)

# Feature importance — which features matter most
importances = model.feature_importances_
```

Strengths and weaknesses
Strengths:
- Works well out of the box with minimal tuning
- Handles mixed feature types; many implementations (including recent scikit-learn versions) also accept missing values natively
- Gives feature importance for free
- Hard to overfit with enough trees
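The "works out of the box" claim is easy to check on a toy dataset. A quick sketch using sklearn's built-in iris data with all-default hyperparameters (the dataset choice and random seeds are just for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# No tuning at all: every hyperparameter left at its default.
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)

# Feature importances come for free after fitting.
importances = clf.feature_importances_
```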
Weaknesses:
- Slow to train with many trees
- Not great for very high-dimensional sparse data (text)
- Can’t extrapolate beyond training data range
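The extrapolation weakness is worth seeing once. A tree's prediction is an average of training targets in a leaf, so outside the training range the forest plateaus instead of following a trend. A minimal sketch with a linear target (synthetic data, for illustration only):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Train on a simple linear trend y = 3x over x in [0, 10].
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X.ravel()

reg = RandomForestRegressor(random_state=0).fit(X, y)

# Inside the training range, predictions track the trend (close to 15)...
inside = reg.predict([[5.0]])[0]
# ...but outside it the forest can only return leaf averages it saw during
# training, so it plateaus near max(y) ≈ 30 instead of the true value 60.
outside = reg.predict([[20.0]])[0]
```

Linear models or gradient boosting with a linear base learner are better choices when the target keeps trending outside the observed range.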
Links
- Decision Trees — the building block
- Gradient Boosting — the other major tree ensemble
- Hyperparameter Tuning