Cosine Similarity

What

Cosine similarity measures the cosine of the angle between two vectors, ignoring their magnitudes. Two vectors pointing in the same direction have similarity 1, no matter how long they are.

cos_sim(a, b) = (a · b) / (||a|| × ||b||)

where a · b is the Dot Product and ||a|| is the L2 norm (length) of a.

Why magnitude doesn’t matter

A document with the word “python” 100 times and one with “python” 10 times point in roughly the same direction in word-count space — they’re about the same topic, just different lengths. Cosine similarity captures this by normalizing out the magnitude.

Mathematically: if you normalize both vectors to unit length first, cosine similarity is just the dot product. That’s why many embedding systems store pre-normalized vectors — similarity search becomes a simple dot product.
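A quick check of that equivalence (a sketch using NumPy; the two vectors are arbitrary examples):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([3.0, 0.0, -1.0])

# Full formula: dot product over the product of L2 norms
full = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Pre-normalize to unit length; similarity is then a plain dot product
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
pre_normalized = np.dot(a_unit, b_unit)

print(np.isclose(full, pre_normalized))  # True
```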

Range and interpretation

Value   Meaning
 1      Identical direction
 0      Orthogonal (unrelated)
-1      Opposite direction

For non-negative vectors (e.g., TF-IDF, bag-of-words), the range is [0, 1].
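A small sketch of the non-negative case, with made-up bag-of-words counts:

```python
import numpy as np

def cos_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical word counts for two documents (all entries >= 0)
doc1 = np.array([3, 0, 1, 2])
doc2 = np.array([0, 2, 5, 1])

s = cos_sim(doc1, doc2)
print(0.0 <= s <= 1.0)  # True: non-negative vectors can't produce a negative cosine
```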

Where it’s used

  • Embeddings: finding similar sentences/images in vector databases
  • Recommendation systems: user-item similarity in latent space
  • Information retrieval: query-document matching
  • Clustering: as a distance measure (cosine distance = 1 - cosine similarity; note it violates the triangle inequality, so it isn’t a true metric)
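On the clustering point, SciPy ships cosine distance directly as scipy.spatial.distance.cosine, which returns 1 - cosine similarity (the sketch below assumes SciPy is installed):

```python
import numpy as np
from scipy.spatial.distance import cosine  # cosine *distance*, i.e. 1 - similarity

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # same direction as a

d = cosine(a, b)
print(d)  # close to 0: parallel vectors have (near-)zero cosine distance
```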

Python example

import numpy as np
 
def cosine_similarity(a, b):
    # Undefined for zero vectors: the denominator is 0
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
 
# Similar directions, different magnitudes
a = np.array([1, 2, 3])
b = np.array([2, 4, 6])       # same direction, 2x magnitude
print(cosine_similarity(a, b))  # ~1.0 (up to floating-point rounding)
 
# Orthogonal
c = np.array([1, 0])
d = np.array([0, 1])
print(cosine_similarity(c, d))  # 0.0
 
# Using sklearn for batches
from sklearn.metrics.pairwise import cosine_similarity as cs
vecs = np.array([[1, 2, 3], [2, 4, 6], [3, 0, -1]])
print(cs(vecs))  # 3x3 similarity matrix