Decision Trees
What
A flowchart-like model that splits data on feature thresholds to make predictions.
Is age > 30?
├── Yes: Is income > 50k?
│   ├── Yes → approve loan
│   └── No → deny loan
└── No: Is employed?
    ├── Yes → approve loan
    └── No → deny loan
Why they matter
- Interpretable: you can read and explain the decision logic
- No scaling needed: splits are threshold comparisons, so feature ranges don’t matter
- Handle mixed types: numeric and categorical features
- Foundation for Random Forests and Gradient Boosting
How splits work
At each node, find the feature + threshold that best separates the data:
- Classification: maximize information gain (reduce Entropy) or minimize Gini impurity
- Regression: minimize MSE of the resulting groups
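The split criterion can be computed in a few lines. A sketch for the classification case, using Gini impurity and a toy age/approval dataset (the data and function names are made up for illustration):

```python
import numpy as np

def gini(labels):
    # Gini impurity: 1 - sum(p_k^2) over class proportions p_k
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_score(feature, labels, threshold):
    # Weighted Gini of the two groups produced by the split;
    # lower is better, and the tree greedily picks the best threshold
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    n = len(labels)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

age = np.array([22, 25, 35, 40, 48, 52])
approved = np.array([0, 0, 1, 1, 1, 1])
print(split_score(age, approved, 30))  # 0.0 — both groups are pure
print(split_score(age, approved, 45))  # ~0.33 — left group is mixed
```

At each node the learner evaluates candidate thresholds for every feature and keeps the one with the lowest weighted impurity.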
The overfitting problem
An unrestricted tree will keep splitting until every leaf has one sample → perfect training accuracy, terrible generalization. Solutions:
- max_depth: limit tree depth
- min_samples_split: require minimum samples to split
- min_samples_leaf: require minimum samples in each leaf
- Pruning: grow full tree, then cut branches that don’t help
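You can see the overfitting gap directly by comparing an unrestricted tree against a regularized one. A sketch on synthetic data (dataset parameters are arbitrary, chosen just to make the effect visible):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic problem: 20 features, only 5 actually informative
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unrestricted tree: grows until every leaf is pure
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Regularized tree: depth and leaf-size limits from the list above
small = DecisionTreeClassifier(max_depth=5, min_samples_leaf=10,
                               random_state=0).fit(X_train, y_train)

for name, m in [("full", full), ("regularized", small)]:
    print(name, "train:", m.score(X_train, y_train),
          "test:", m.score(X_test, y_test))
```

The unrestricted tree scores 1.0 on training data but worse on test data; the regularized tree gives up some training accuracy to generalize better.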
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier(max_depth=5, min_samples_leaf=10)
model.fit(X_train, y_train)
Links
- Random Forests — ensemble of trees
- Gradient Boosting — sequential tree building
- Entropy — the math behind splits