Cosine Similarity

What

Cosine similarity measures the cosine of the angle between two vectors, ignoring their magnitudes. Two vectors pointing in the same direction have similarity 1, no matter how long they are.

cos_sim(a, b) = (a · b) / (||a|| × ||b||)

where a · b is the Dot Product and ||a|| is the L2 norm (length) of a.

Why magnitude doesn’t matter

A document with the word “python” 100 times and one with “python” 10 times point in roughly the same direction in word-count space — they’re about the same topic, just different lengths. Cosine similarity captures this by normalizing out the magnitude.

Mathematically: if you normalize both vectors to unit length first, cosine similarity is just the dot product. That’s why many embedding systems store pre-normalized vectors — similarity search becomes a simple dot product.
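A quick check of that equivalence (a sketch using NumPy; the two vectors are arbitrary examples):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([3.0, 0.0, -1.0])

# Full formula: dot product over the product of L2 norms
full = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Pre-normalize to unit length; similarity is then a plain dot product
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
pre_normalized = np.dot(a_unit, b_unit)

print(np.isclose(full, pre_normalized))  # True
```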

Range and interpretation

Value   Meaning
 1      Identical direction
 0      Orthogonal (unrelated)
-1      Opposite direction

For non-negative vectors (e.g., TF-IDF, bag-of-words), the range is [0, 1].
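A small sketch of the non-negative case, with made-up bag-of-words counts:

```python
import numpy as np

def cos_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical word counts for two documents (all entries >= 0)
doc1 = np.array([3, 0, 1, 2])
doc2 = np.array([0, 2, 5, 1])

s = cos_sim(doc1, doc2)
print(0.0 <= s <= 1.0)  # True: non-negative vectors can't produce a negative cosine
```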

Where it’s used

  • Embeddings: finding similar sentences/images in vector databases
  • Recommendation systems: user-item similarity in latent space
  • Information retrieval: query-document matching
  • Clustering: as a distance measure (cosine distance = 1 - cosine similarity; note it violates the triangle inequality, so it isn’t a true metric)
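On the clustering point, SciPy ships cosine distance directly as scipy.spatial.distance.cosine, which returns 1 - cosine similarity (the sketch below assumes SciPy is installed):

```python
import numpy as np
from scipy.spatial.distance import cosine  # cosine *distance*, i.e. 1 - similarity

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # same direction as a

d = cosine(a, b)
print(d)  # close to 0: parallel vectors have (near-)zero cosine distance
```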

Python example

import numpy as np
 
def cosine_similarity(a, b):
    # Undefined for zero vectors: the denominator is 0
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
 
# Similar directions, different magnitudes
a = np.array([1, 2, 3])
b = np.array([2, 4, 6])       # same direction, 2x magnitude
print(cosine_similarity(a, b))  # ~1.0 (up to floating-point rounding)
 
# Orthogonal
c = np.array([1, 0])
d = np.array([0, 1])
print(cosine_similarity(c, d))  # 0.0
 
# Using sklearn for batches
from sklearn.metrics.pairwise import cosine_similarity as cs
vecs = np.array([[1, 2, 3], [2, 4, 6], [3, 0, -1]])
print(cs(vecs))  # 3x3 similarity matrix