Scikit-Learn Overview

What

The de facto standard Python library for classical ML (not part of Python's standard library). Consistent API for preprocessing, models, and evaluation.

The API pattern

Every supervised model (estimator) follows the same interface; transformers such as scalers use `fit` / `transform` instead of `predict`:

from sklearn.some_module import SomeModel
 
model = SomeModel(hyperparameters)
model.fit(X_train, y_train)         # train
predictions = model.predict(X_test)  # predict
score = model.score(X_test, y_test)  # evaluate: mean accuracy for classifiers, R^2 for regressors
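
A concrete instance of the pattern above, as a minimal sketch. The choice of `LogisticRegression`, the built-in iris dataset, and the `max_iter` / `random_state` values are assumptions made for illustration only; any other estimator slots into the same four calls.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: iris is used only because it ships with scikit-learn.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)  # hyperparameters go in the constructor
model.fit(X_train, y_train)                # learn parameters from the training data
predictions = model.predict(X_test)        # one predicted label per test row
score = model.score(X_test, y_test)        # mean accuracy, since this is a classifier

print(predictions.shape, round(score, 3))
```

Swapping in a different estimator (for example `SVC` or `RandomForestClassifier`) should only require changing the import and the constructor line.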

Key modules

Module	What	Examples
`preprocessing`	Scale, encode, transform features (and encode target labels)	StandardScaler, OneHotEncoder, LabelEncoder (for targets `y`, not features)
`model_selection`	Split data, tune hyperparameters	train_test_split, cross_val_score, GridSearchCV
`linear_model`	Linear/logistic regression	LinearRegression, LogisticRegression, Ridge, Lasso
`tree`	Decision trees	DecisionTreeClassifier, DecisionTreeRegressor
`ensemble`	Combine models	RandomForestClassifier, GradientBoostingClassifier
`svm`	Support vector machines	SVC, SVR
`cluster`	Unsupervised clustering	KMeans, DBSCAN
`decomposition`	Dimensionality reduction	PCA, NMF
`metrics`	Evaluate performance	accuracy_score, f1_score, confusion_matrix
`pipeline`	Chain preprocessing + model	Pipeline, make_pipeline
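
The modules in the table are designed to compose. A small sketch, assuming the built-in wine dataset and an RBF `SVC` purely as placeholders: `pipeline` chains a `preprocessing` step with an `svm` model, and `model_selection` scores the whole chain with 5-fold cross-validation.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: wine is just a convenient built-in dataset for the demo.
X, y = load_wine(return_X_y=True)

# The pipeline behaves like a single estimator: the scaler is re-fit on the
# training part of every fold, so no validation data leaks into the scaling.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))

scores = cross_val_score(clf, X, y, cv=5)  # one accuracy value per fold
print(scores.mean(), scores.std())
```

Because the pipeline exposes `fit`, `predict`, and `score`, it can be passed anywhere a plain model is accepted.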

Typical workflow

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
 
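# split (X: feature matrix, y: labels, assumed to be loaded already)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # fixed seed for a reproducible split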
 
# preprocess
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)   # learn mean/std from train, then scale it
X_test = scaler.transform(X_test)          # reuse train statistics; never fit on test
 
# train
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)
 
# evaluate
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
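
The same workflow can also be written as a single `Pipeline` and tuned with `GridSearchCV`. This is a sketch under stated assumptions, not the only way to do it: the breast cancer dataset, the step names `"scale"` and `"clf"`, and the small parameter grid are all illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: a built-in dataset stands in for your own X and y.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Step names ("scale", "clf") are arbitrary labels chosen for this example.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", RandomForestClassifier(random_state=0)),
])

# Parameters of a step are addressed as <step name>__<parameter name>.
param_grid = {
    "clf__n_estimators": [50, 100],
    "clf__max_depth": [None, 5],
}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X_train, y_train)    # cross-validates every combination on the train set

y_pred = search.predict(X_test)  # uses the best pipeline, refit on all of train
print(search.best_params_)
print(classification_report(y_test, y_pred))
```

Keeping the scaler inside the pipeline means the test set is only ever passed through `transform`, which is the same rule the manual version enforces by hand.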
