Linear Regression

What

Predict a number by fitting a line (or hyperplane) through the data.

ŷ = w₁x₁ + w₂x₂ + ... + wₙxₙ + b
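Concretely, the prediction is just a dot product plus an offset. A tiny worked example (the weights, inputs, and intercept below are made up for illustration):

```python
import numpy as np

w = np.array([2.0, -1.0, 0.5])   # hypothetical weights w1, w2, w3
x = np.array([3.0, 4.0, 2.0])    # one example's features x1, x2, x3
b = 1.5                          # intercept

y_hat = np.dot(w, x) + b         # 2*3 + (-1)*4 + 0.5*2 + 1.5
print(y_hat)                     # 4.5
```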

Why start here

It’s the simplest model, yet it introduces every core ML concept: loss functions, gradient descent, overfitting, and regularization. Understand linear regression deeply, and everything else follows.

Training

Minimize the mean squared error (MSE): find the weights w and intercept b that minimize mean((y - ŷ)²).
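The loss itself is one line of NumPy. A minimal sketch with toy targets and predictions:

```python
import numpy as np

y     = np.array([3.0, 5.0, 7.0])   # true targets (toy values)
y_hat = np.array([2.5, 5.0, 8.0])   # model predictions

mse = np.mean((y - y_hat) ** 2)     # mean((y - ŷ)²)
print(mse)                          # (0.25 + 0 + 1) / 3 ≈ 0.4167
```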

Two approaches:

  • Normal equation: w = (XᵀX)⁻¹Xᵀy — closed-form and exact, but solving it costs O(d³) in the feature count d and breaks down when XᵀX is singular
  • Gradient descent: iteratively step the weights down the MSE gradient — scales to data of any size
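Both approaches fit in a few lines of NumPy on synthetic data (the true weights [2, -1, 0.5] and intercept 3 here are invented for the example; both should recover them):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=100)

# Fold the intercept b into the weights by appending a column of ones
Xb = np.hstack([X, np.ones((100, 1))])

# 1) Normal equation: lstsq solves the same least-squares system
#    without forming an explicit inverse (numerically safer)
w_closed, *_ = np.linalg.lstsq(Xb, y, rcond=None)

# 2) Gradient descent on the MSE loss
w_gd = np.zeros(4)
lr = 0.1
for _ in range(2000):
    grad = (2 / len(y)) * Xb.T @ (Xb @ w_gd - y)   # ∂MSE/∂w
    w_gd -= lr * grad

print(w_closed)   # ≈ [2, -1, 0.5, 3]
print(w_gd)       # converges to the same solution
```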

In scikit-learn:

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Coefficients show each feature's effect on the prediction
# (comparable across features only when the features share a scale)
print(model.coef_)
print(model.intercept_)

Assumptions

  • Linear relationship between features and target
  • Features are not highly correlated (multicollinearity)
  • Errors are normally distributed with constant variance

When assumptions are violated: try Polynomial Regression, tree-based models, or neural nets.
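One lightweight way to probe these assumptions is to inspect the residuals after fitting. A rough diagnostic sketch on synthetic data (not a formal statistical test):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=200)

# Least-squares fit with an explicit intercept column
Xb = np.hstack([X, np.ones((200, 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
residuals = y - Xb @ w

# Healthy residuals center on zero with roughly constant spread;
# curvature or a funnel shape when plotted against ŷ signals trouble
print(residuals.mean(), residuals.std())

# Pairwise feature correlations near ±1 warn of multicollinearity
print(np.corrcoef(X, rowvar=False))
```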

Regularized variants

  • Ridge (L2): shrinks coefficients, handles multicollinearity
  • Lasso (L1): shrinks + eliminates features (automatic feature selection)
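The contrast is easy to see on synthetic data where only some features matter (the alpha values below are arbitrary, chosen just to make the effect visible):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
# Only the first two features actually influence the target
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print(ridge.coef_)   # all coefficients shrunk, none exactly zero
print(lasso.coef_)   # irrelevant coefficients driven to exactly 0.0
```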