K-Means Clustering
What
Partition data into k groups by iteratively assigning points to nearest centroid and updating centroids.
Algorithm
1. Pick k random centroids
2. Assign each point to its nearest centroid
3. Recompute centroids as the mean of assigned points
4. Repeat steps 2-3 until convergence (assignments stop changing)
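The steps above can be sketched in plain NumPy (a teaching sketch, not a replacement for sklearn's KMeans):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # step 1: pick k distinct data points as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # step 2: assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # step 3: recompute each centroid as the mean of its points
        # (keep the old centroid if a cluster goes empty -- a simple guard)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # step 4: stop once centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```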
```python
from sklearn.cluster import KMeans

model = KMeans(n_clusters=3, n_init=10)
model.fit(X)
labels = model.labels_            # cluster assignment for each point
centers = model.cluster_centers_  # centroid coordinates
```
Choosing k
- Elbow method: plot inertia (within-cluster sum of squares) vs k, pick the “elbow”
- Silhouette score: measures how well points fit their cluster vs other clusters
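A minimal elbow-method sketch, collecting inertia_ for a range of k (the make_blobs data is just a stand-in for your own X):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# toy data for illustration; substitute your own X
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

ks = range(1, 11)
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)  # within-cluster sum of squares

# plot ks vs inertias and look for the bend where the curve flattens:
# import matplotlib.pyplot as plt; plt.plot(list(ks), inertias, "o-"); plt.show()
```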
```python
from sklearn.metrics import silhouette_score

for k in range(2, 11):
    model = KMeans(n_clusters=k, n_init=10).fit(X)
    print(f"k={k}, silhouette={silhouette_score(X, model.labels_):.3f}")
```
Limitations
- Must specify k upfront
- Assumes spherical, equally-sized clusters
- Sensitive to initialization (use n_init=10)
- Sensitive to scale → use Feature Scaling
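To address the scale sensitivity, standardize features before clustering. A sketch with StandardScaler on made-up data where one feature's raw scale would dominate the distances:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# toy data: the second feature's scale (std=1000) would swamp the first
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 1, 200), rng.normal(0, 1000, 200)])

X_scaled = StandardScaler().fit_transform(X)  # zero mean, unit variance per feature
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)
```

Wrapping the scaler and KMeans in a sklearn Pipeline achieves the same thing and avoids leaking the scaling step.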
Alternatives
- DBSCAN: finds arbitrary-shaped clusters, doesn’t need k
- Hierarchical clustering: produces a dendrogram, no k needed
- Gaussian Mixture Models: soft assignments (probabilities)
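As an example of soft assignments, a GaussianMixture sketch on toy blob data (predict_proba gives per-point membership probabilities; predict collapses them to a hard label):

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
probs = gmm.predict_proba(X)  # soft assignments: shape (300, 3), rows sum to 1
hard = gmm.predict(X)         # hard assignment = most probable component
```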
Links
- PCA — often used before clustering for visualization
- Supervised vs Unsupervised Learning
- Feature Scaling