Scikit-Learn Overview
What
The de facto standard library for classical ML in Python (a third-party package, not part of Python's standard library). Provides a consistent API for preprocessing, models, and evaluation.
The API pattern
Every supervised model follows the same interface (transformers such as scalers use transform in place of predict):
from sklearn.some_module import SomeModel
model = SomeModel(hyperparameters)
model.fit(X_train, y_train) # train
predictions = model.predict(X_test) # predict
score = model.score(X_test, y_test) # evaluate
Key modules
| Module | What | Examples |
|---|---|---|
| preprocessing | Scale, encode, transform features | StandardScaler, LabelEncoder, OneHotEncoder |
| model_selection | Split data, tune hyperparameters | train_test_split, cross_val_score, GridSearchCV |
| linear_model | Linear/logistic regression | LinearRegression, LogisticRegression, Ridge, Lasso |
| tree | Decision trees | DecisionTreeClassifier, DecisionTreeRegressor |
| ensemble | Combine models | RandomForestClassifier, GradientBoostingClassifier |
| svm | Support vector machines | SVC, SVR |
| cluster | Unsupervised clustering | KMeans, DBSCAN |
| decomposition | Dimensionality reduction | PCA, NMF |
| metrics | Evaluate performance | accuracy_score, f1_score, confusion_matrix |
| pipeline | Chain preprocessing + model | Pipeline, make_pipeline |
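To see how several of the modules in the table combine, here is a minimal sketch. The iris dataset, the choice of two PCA components, and `max_iter=1000` are illustrative assumptions, not anything prescribed by these notes.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative data: 150 samples, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Chain scaling, dimensionality reduction, and a classifier into one estimator.
clf = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    LogisticRegression(max_iter=1000),
)

# 5-fold cross-validation; the whole pipeline is refit on each training fold,
# so the scaler and PCA never see the held-out fold.
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```

Because the pipeline is itself an estimator, it can be passed anywhere a single model is accepted, including `cross_val_score` and `GridSearchCV`.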
Typical workflow
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
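# split (X, y: feature matrix and labels, assumed already loaded)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # fixed seed for a reproducible split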
# preprocess
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train) # fit + transform on train
X_test = scaler.transform(X_test) # only transform on test!
# train
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)
# evaluate
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
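The workflow above leaves `X` and `y` undefined, so here is a self-contained variant as a sketch. It assumes the built-in breast cancer dataset and fixed `random_state` values purely for illustration, and it wraps the scaler and model in a `Pipeline` so the "fit on train, only transform on test" rule is handled automatically. The step names `"scale"` and `"forest"` are arbitrary labels.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative data; substitute your own feature matrix and labels.
X, y = load_breast_cancer(return_X_y=True)

# stratify=y keeps the class proportions the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
])

pipe.fit(X_train, y_train)      # scaler statistics come from the training data only
y_pred = pipe.predict(X_test)   # test data is transformed, never fit

acc = accuracy_score(y_test, y_pred)
print(f"accuracy: {acc:.3f}")
```

Tree-based models such as random forests do not generally need scaled inputs; the scaler is kept here only to mirror the steps in the workflow above.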