Cross-Validation

What

Instead of relying on a single train/test split, split the data k ways and average the results. This gives a more reliable estimate of model performance than any one split.

K-Fold Cross-Validation

Split data into k folds. Train on k-1 folds, evaluate on the remaining one. Repeat k times.

from sklearn.model_selection import cross_val_score

# model: any scikit-learn estimator; X, y: feature matrix and labels
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"Mean: {scores.mean():.3f} ± {scores.std():.3f}")
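To make the mechanics explicit, the same loop can be written by hand with KFold. The estimator and the synthetic dataset below are assumptions purely for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Synthetic data, assumed here only so the sketch runs end to end
X, y = make_classification(n_samples=100, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kf.split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])                  # train on k-1 folds
    scores.append(model.score(X[test_idx], y[test_idx]))   # evaluate on held-out fold

print(f"Mean: {np.mean(scores):.3f} ± {np.std(scores):.3f}")
```

cross_val_score does exactly this loop internally; writing it out is only useful when you need per-fold control (custom metrics, saving each fold's model, etc.).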

Variants

Method               When to use
K-Fold (k=5 or 10)   Default, general purpose
Stratified K-Fold    Classification with imbalanced classes
Leave-One-Out        Very small datasets
Time Series Split    Temporal data (train on past, test on future)
Group K-Fold         When samples are grouped (e.g., same patient)
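Each variant is a splitter object you pass as cv. A sketch for two of them, on a synthetic imbalanced dataset (the data and estimator are assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, TimeSeriesSplit, cross_val_score

# Synthetic imbalanced dataset, assumed for illustration (90% / 10% classes)
X, y = make_classification(n_samples=200, weights=[0.9, 0.1], random_state=0)
model = LogisticRegression(max_iter=1000)

# Stratified K-Fold preserves the class ratio in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

# Time Series Split: every training set strictly precedes its test set
for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X):
    assert train_idx.max() < test_idx.min()  # past → future only
```

The same pattern works for GroupKFold, except its split() also takes a groups array so samples from one group never land in both train and test.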

What it gives you

  • Mean score: expected performance on unseen data
  • Std deviation: how stable the model is across different data splits
  • High std = model performance depends heavily on which data it sees
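Two hypothetical sets of fold scores (the numbers are made up for illustration) show why the std matters even when the means are close:

```python
import numpy as np

# Hypothetical per-fold accuracies, invented to contrast stability
stable   = np.array([0.82, 0.81, 0.83, 0.82, 0.82])
unstable = np.array([0.95, 0.60, 0.88, 0.55, 0.92])

for name, s in [("stable", stable), ("unstable", unstable)]:
    print(f"{name}: mean={s.mean():.3f}, std={s.std():.3f}")
```

Both models have a similar mean, but the unstable one's performance swings wildly with the split, so its mean is a much less trustworthy estimate.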