K-Nearest Neighbors
What
Classify a new point by looking at its k closest training examples and taking a majority vote. No training phase — just memorize the data.
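The majority-vote idea can be sketched from scratch in a few lines of NumPy; `knn_predict` is a hypothetical helper, not a library function, and the two-cluster data is made up for illustration:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points (Euclidean)."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every training point
    nearest = np.argsort(dists)[:k]               # indices of the k closest
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Two tiny clusters: class 0 near the origin, class 1 near (5, 5)
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X, y, np.array([0.3, 0.0]), k=3))   # lands in the first cluster
print(knn_predict(X, y, np.array([4.8, 5.1]), k=3))   # lands in the second cluster
```

Note there is no "fit" step beyond storing the arrays; all the work happens inside the prediction call.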
Key ideas
- k: number of neighbors to consider. Small k fits noise (high variance); large k smooths the decision boundary (high bias)
- Distance metric: usually Euclidean. Features MUST be scaled, or the feature with the largest range dominates every distance
- Lazy learning: no training, all computation at prediction time
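The scaling requirement is easy to satisfy by putting a scaler in front of the classifier. A minimal sketch with scikit-learn's `StandardScaler` in a pipeline; the data (a metre-scale feature next to a dollar-scale one) is invented for illustration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# One feature in metres, one in dollars: without scaling,
# the dollar feature dominates every Euclidean distance.
X_train = np.array([[1.5, 30000], [1.6, 32000], [1.8, 90000], [1.9, 95000]])
y_train = np.array([0, 0, 1, 1])

model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X_train, y_train)          # scaler is fit on training data only
print(model.predict([[1.55, 31000]]))
```

The pipeline refits the scaler's statistics on training data and reapplies them at prediction time, which avoids leaking test-set statistics into the distance computation.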
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
model.predict(X_test)
When to use
- Small datasets, simple baselines
- When decision boundaries are irregular
- Recommendation systems (find similar items)
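For the recommendation use case, scikit-learn's `NearestNeighbors` returns the closest items directly instead of a class label. A sketch with made-up 2-D item vectors standing in for real item embeddings:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical item feature vectors (in practice: embeddings or ratings rows)
items = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])

nn = NearestNeighbors(n_neighbors=2).fit(items)
dist, idx = nn.kneighbors([[1.0, 0.05]])   # find the 2 items most similar to the query
print(idx[0])                              # indices of the closest items
```

`kneighbors` returns both distances and indices, so the result can be ranked or thresholded before showing it to a user.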
Limitations
- Slow prediction on large datasets (must compute distance to all training points)
- Suffers from curse of dimensionality — distances become meaningless in high dimensions
- Sensitive to irrelevant features and scale
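The curse of dimensionality can be checked empirically: as the dimension grows, the gap between the nearest and farthest point shrinks relative to the nearest distance, so "nearest" stops carrying information. A quick sketch on uniform random data (the `contrast` helper and the dimensions chosen are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def contrast(dim, n=1000):
    """Relative gap between the farthest and nearest neighbor of a random query."""
    X = rng.random((n, dim))                        # n uniform points in [0, 1]^dim
    d = np.linalg.norm(X - rng.random(dim), axis=1) # distances from a random query
    return (d.max() - d.min()) / d.min()

print(contrast(2))      # large relative gap: "near" is meaningful
print(contrast(1000))   # small gap: all points look roughly equidistant
```

This is why dimensionality reduction or feature selection is usually applied before k-NN on wide datasets.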