Linear Regression
What
Predict a number by fitting a line (or hyperplane) through the data.
ŷ = w₁x₁ + w₂x₂ + ... + wₙxₙ + b
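The prediction is just a dot product plus a bias. A minimal sketch with made-up weights (the numbers here are purely illustrative):

```python
import numpy as np

# Hypothetical weights and bias for a 3-feature model
w = np.array([2.0, -1.0, 0.5])
b = 4.0

x = np.array([1.0, 3.0, 2.0])  # one sample with features x1..x3
y_hat = np.dot(w, x) + b       # w1*x1 + w2*x2 + w3*x3 + b
print(y_hat)                   # 2*1 + (-1)*3 + 0.5*2 + 4 = 4.0
```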
Why start here
It’s the simplest model and introduces every core ML concept: loss functions, gradient descent, overfitting, regularization. Understand linear regression deeply and everything else follows.
Training
Minimize mean squared error (MSE): find the weights w and bias b that minimize mean((y - ŷ)²).
Two approaches:
- Normal equation: w = (XᵀX)⁻¹Xᵀy — closed-form, exact, but expensive for large data
- Gradient descent: iteratively update weights — scales to any size
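Both approaches can be sketched in a few lines of NumPy. This is a toy example with made-up true weights [3, -2] and bias 1, comparing gradient descent against the closed-form least-squares solution:

```python
import numpy as np

# Toy data with known weights — purely illustrative
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 1.0 + rng.normal(scale=0.1, size=200)

# Gradient descent on MSE
w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):
    err = X @ w + b - y               # ŷ - y
    w -= lr * 2 * X.T @ err / len(y)  # ∂MSE/∂w
    b -= lr * 2 * err.mean()          # ∂MSE/∂b

# Normal equation on the same data (bias folded in as a column of ones)
Xb = np.column_stack([X, np.ones(len(X))])
w_exact = np.linalg.lstsq(Xb, y, rcond=None)[0]
print(w, b)        # both methods recover ≈ [3, -2] and b ≈ 1
print(w_exact)
```

Note `lstsq` is preferred over literally inverting XᵀX, which is numerically fragile.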
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
# Coefficients tell you feature importance (if features are scaled)
print(model.coef_)
print(model.intercept_)
Assumptions
- Linear relationship between features and target
- Features are not highly correlated with each other (no multicollinearity)
- Errors are normally distributed with constant variance
When assumptions are violated: try Polynomial Regression, tree-based models, or neural nets.
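For the linearity violation specifically, polynomial features are the smallest possible fix. A sketch on toy quadratic data (data and degree are assumptions for illustration):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Toy quadratic data — a plain line underfits this badly
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(100, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=100)

# Expand features to [x, x²], then fit an ordinary linear model on them
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.score(X, y))  # R² close to 1 on this data
```

It is still linear regression — linear in the expanded features, not in x.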
Regularized variants
- Ridge (L2): shrinks coefficients, handles multicollinearity
- Lasso (L1): shrinks + eliminates features (automatic feature selection)
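The contrast shows up clearly on collinear data. A sketch with two nearly duplicate features plus one irrelevant one (the data and alpha values are assumptions, not tuned):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Feature 1 ≈ feature 0 (collinear); feature 2 is pure noise
rng = np.random.default_rng(2)
x = rng.normal(size=200)
X = np.column_stack([x, x + rng.normal(scale=0.01, size=200),
                     rng.normal(size=200)])
y = 2 * x + rng.normal(scale=0.1, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print(ridge.coef_)  # weight spread across the correlated pair
print(lasso.coef_)  # some coefficients driven exactly to zero
```

Ridge splits the weight between the duplicates; Lasso tends to keep one and zero out the rest, which is the "automatic feature selection" behavior.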
Links
- Loss Functions — MSE
- Gradient Descent — how it’s trained
- Regularization — Ridge, Lasso
- Logistic Regression — classification version
- Polynomial Regression