Hyperparameter Tuning
What
Hyperparameters are settings you choose before training (learning rate, tree depth, number of estimators). Tuning = finding the best combination.
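To make the distinction concrete, a minimal sketch using scikit-learn's RandomForestClassifier (the dataset here is a synthetic stand-in): hyperparameters are fixed when you construct the model, while parameters (the trees themselves) are learned during fit().

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Hyperparameters: chosen by you, fixed before training
model = RandomForestClassifier(n_estimators=100, max_depth=5)

# Parameters: learned from the data during fit()
model.fit(X, y)

print(model.get_params()["max_depth"])  # still 5 — fit() never changes it
```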
Methods
Grid Search — try all combinations
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

model = RandomForestClassifier(random_state=42)
param_grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, 10]}
grid = GridSearchCV(model, param_grid, cv=5, scoring="f1")
grid.fit(X_train, y_train)
print(grid.best_params_)
Random Search — sample random combinations (often more efficient than grid search for the same budget)
from sklearn.model_selection import RandomizedSearchCV

param_dist = {"n_estimators": range(50, 500), "max_depth": range(2, 20)}
search = RandomizedSearchCV(model, param_dist, n_iter=50, cv=5, random_state=42)
search.fit(X_train, y_train)
print(search.best_params_)
Bayesian Optimization — smart search that uses past trial results to pick the next candidates (e.g. Optuna)
import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def objective(trial):
    n = trial.suggest_int("n_estimators", 50, 500)
    d = trial.suggest_int("max_depth", 2, 20)
    model = RandomForestClassifier(n_estimators=n, max_depth=d)
    return cross_val_score(model, X, y, cv=5).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
print(study.best_params)
Rules of thumb
- Start with random search to find the right neighborhood
- Narrow down with grid search or Bayesian optimization
- Always use cross-validation inside the search (not test set!)
- More data > more tuning
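The cross-validation rule above can be sketched end to end. A hedged example assuming a RandomForestClassifier and a synthetic dataset: all cross-validation happens inside the search on the training split, and the test split is touched exactly once, after the search is done.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# CV happens only inside the search, on the training split
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [50, 100], "max_depth": [3, 5]},
    cv=5,
)
grid.fit(X_train, y_train)

# The test set is used exactly once, for the final unbiased estimate
print(grid.best_estimator_.score(X_test, y_test))
```

If you score candidate configurations on the test set instead, you are effectively tuning on it, and the final number is no longer an honest estimate of generalization.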