Polynomial Regression

What

Linear regression with polynomial features. Fit curves instead of lines.

ŷ = w₁x + w₂x² + w₃x³ + b

Still “linear” in the weights — just nonlinear in the features.

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
 
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(X_train, y_train)
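A minimal end-to-end sketch with synthetic data (the cubic ground truth and variable names are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Synthetic data from a known cubic: y = 0.5x³ − 2x + noise
rng = np.random.default_rng(0)
X_train = rng.uniform(-3, 3, size=(200, 1))
y_train = 0.5 * X_train[:, 0] ** 3 - 2 * X_train[:, 0] + rng.normal(0, 0.5, 200)

model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(X_train, y_train)

# Coefficients roughly recover the generating polynomial
# (column order: bias, x, x², x³)
print(model.named_steps["linearregression"].coef_)
```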

When to use

  • Residual plots from linear regression show a clear curve — the relationship isn’t linear
  • Low-dimensional data (1-3 features). With many features, polynomial expansion creates too many terms
  • You want an interpretable model — coefficients still have meaning (unlike tree-based models)
  • Quick baseline before trying more complex nonlinear methods
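The "quick baseline" use is easy to operationalize: cross-validate a few degrees and see where the score plateaus. A sketch with synthetic quadratic data (names and the ground-truth function are assumptions):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(150, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.3, 150)  # quadratic ground truth

# Degree 1 misses the curve entirely; degree 2 matches; degree 5 adds nothing
for degree in (1, 2, 5):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"degree {degree}: R² = {score:.3f}")
```

Pick the lowest degree where the cross-validated score stops improving.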

Interaction terms

PolynomialFeatures also creates interaction terms like x₁ * x₂. This captures how features combine, not just individual nonlinearity.

# interaction_only=True skips pure powers (x², x³), keeps only cross terms
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2, interaction_only=True)
# [x1, x2] → [1, x1, x2, x1*x2]

Watch out

  • High degree → overfitting (fits training noise)
  • Features explode: degree 3 with 10 features → 286 features
  • Use Regularization (Ridge/Lasso) with polynomial features
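The 286 figure is just the binomial count C(10+3, 3), and the usual mitigation is a pipeline that expands, scales, then regularizes. A sketch (alpha value is an arbitrary placeholder):

```python
from math import comb

from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Monomials of degree ≤ 3 in 10 variables (incl. bias column): C(13, 3)
print(comb(10 + 3, 3))  # → 286

# Scale *after* expansion so x, x², x³ are comparable, then shrink with Ridge
model = make_pipeline(
    PolynomialFeatures(degree=3),
    StandardScaler(),
    Ridge(alpha=1.0),
)
```

Scaling matters here because Ridge penalizes all coefficients equally, and raw x³ columns dwarf raw x columns.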

Polynomial regression vs decision trees

| Aspect | Polynomial regression | Decision trees |
|---|---|---|
| Nonlinearity | Smooth curves | Step functions |
| Extrapolation | Wild swings outside training range | Flat (predicts last seen value) |
| Interpretability | Coefficients have meaning | Visual tree splits |
| Feature scaling | Needed (features have different magnitudes) | Not needed |
| Best for | Smooth, continuous relationships | Complex interactions, categorical data |

Alternatives

  • Splines: piecewise polynomials joined at knots. Smoother than high-degree polynomials, less prone to oscillation at boundaries
  • Kernel methods: project data into high-dimensional space implicitly (see Support Vector Machines). No explicit feature expansion
  • GAMs (Generalized Additive Models): fit a smooth function per feature, then add them up. Interpretable and flexible