K-Nearest Neighbors

What

Classify a new point by looking at its k closest training examples and taking a majority vote. No training phase — just memorize the data.

Key ideas

  • k: number of neighbors to consider. Small k fits noise (overfits); large k over-smooths the boundary (underfits)
  • Distance metric: usually Euclidean. Features MUST be scaled so no single feature dominates the distance
  • Lazy learning: no training step; all computation happens at prediction time

from sklearn.neighbors import KNeighborsClassifier

# k = 5 neighbors; fit() just stores the training data (lazy learner)
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
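Since unscaled features can dominate the distance computation, a common pattern is to bundle a scaler with the classifier. A minimal sketch, using a synthetic dataset from make_classification (the data and split here are illustrative, not from the notes above):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Toy data: 200 samples, 5 features
    X, y = make_classification(n_samples=200, n_features=5, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # StandardScaler runs before KNN, so distances are computed on
    # zero-mean, unit-variance features
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
    model.fit(X_train, y_train)
    acc = model.score(X_test, y_test)

The pipeline also prevents a subtle leak: the scaler is fit on training data only, then applied to the test set inside score().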

When to use

  • Small datasets, simple baselines
  • When decision boundaries are irregular
  • Recommendation systems (find similar items)

Limitations

  • Slow prediction on large datasets (must compute distance to all training points)
  • Suffers from curse of dimensionality — distances become meaningless in high dimensions
  • Sensitive to irrelevant features and scale
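The curse of dimensionality can be seen directly: as dimension grows, the ratio between the farthest and nearest training point from a query collapses toward 1, so "nearest" stops being informative. A quick sketch with random uniform data (sample sizes and dimensions here are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)

    def distance_contrast(dim, n=1000):
        # Farthest-to-nearest distance ratio from a random query point
        points = rng.random((n, dim))
        query = rng.random(dim)
        d = np.linalg.norm(points - query, axis=1)
        return d.max() / d.min()

    low = distance_contrast(2)      # large ratio: neighbors are meaningful
    high = distance_contrast(1000)  # ratio near 1: all points roughly equidistant

With the same number of points, the 2-D contrast is orders of magnitude larger than the 1000-D one, which is why KNN degrades badly without dimensionality reduction or feature selection.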