Word Embeddings
What
Dense vector representations of words where similar words have similar vectors. Unlike one-hot encoding (sparse, and every pair of words equally dissimilar), each word becomes a vector of 50-300 floats that captures semantic relationships.
Word2Vec
Two architectures, both trained on a word-prediction task over a sliding context window:
- CBOW (Continuous Bag of Words): predict center word from surrounding words. Faster, better for frequent words
- Skip-gram: predict surrounding words from center word. Better for rare words, works well with small datasets
Both learn vectors where relationships are encoded as directions in vector space.
Key property: vector arithmetic
king - man + woman ≈ queen
paris - france + japan ≈ tokyo
The model learns these relationships purely from co-occurrence patterns in text — nobody labeled them.
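The arithmetic itself is just vector addition plus a nearest-neighbor search by cosine similarity. A toy sketch with hand-made 3-d vectors (purely illustrative; real embeddings are learned, not designed):

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hand-crafted toy vectors: dim 0 ~ "royalty", dim 1 ~ "male", dim 2 ~ "female"
vecs = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "apple": np.array([0.0, 0.1, 0.1]),
}

# king - man + woman, then find the nearest remaining word
target = vecs["king"] - vecs["man"] + vecs["woman"]
best = max((w for w in vecs if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(target, vecs[w]))
print(best)  # queen
```

Libraries like gensim wrap exactly this logic behind `most_similar(positive=..., negative=...)`.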
GloVe (Global Vectors)
Instead of sliding a window through text, GloVe builds a global word co-occurrence matrix and factorizes it. Combines the advantages of count-based methods (use global statistics) with prediction-based methods (learn dense vectors).
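The co-occurrence matrix GloVe starts from is simple to build. A minimal counting sketch (window size and corpus are illustrative; GloVe additionally weights pairs by distance and then factorizes the matrix):

```python
from collections import Counter

def cooccurrence(tokens, window=2):
    """Count (word, context_word) pairs within a symmetric window."""
    counts = Counter()
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                counts[(w, tokens[j])] += 1
    return counts

tokens = "the cat sat on the mat".split()
counts = cooccurrence(tokens, window=1)
print(counts[("cat", "sat")])  # 1
```

In practice this is accumulated over the whole corpus once, which is what lets GloVe use global statistics instead of individual window samples.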
FastText
Extends Word2Vec by representing words as bags of character n-grams. “playing” = {pla, lay, ayi, yin, ing, play, layi, …}. This means it can generate vectors for words it has never seen before by combining subword pieces.
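The subword decomposition can be sketched in a few lines. A simplified version (real FastText wraps the word in `<` `>` boundary markers and defaults to n = 3..6):

```python
def char_ngrams(word, n_min=3, n_max=4):
    """Simplified FastText-style character n-grams, without boundary markers."""
    grams = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(word) - n + 1):
            grams.add(word[i:i + n])
    return grams

print(sorted(char_ngrams("playing")))
# ['ayi', 'ayin', 'ing', 'lay', 'layi', 'pla', 'play', 'yin', 'ying']
```

A word's vector is the sum of its n-gram vectors, which is why an unseen word still gets a sensible vector as long as its pieces were seen.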
Pretrained vs training your own
| Approach | When | Source |
|---|---|---|
| Pretrained (GloVe, FastText) | General text, quick start | 6B-840B token corpora |
| Train your own | Domain-specific vocabulary (medical, legal) | Your corpus |
| Fine-tune pretrained | Best of both | Pretrained + your corpus |
Python example

```python
import gensim.downloader as api

# load pretrained GloVe vectors (downloads on first use)
model = api.load("glove-wiki-gigaword-100")  # 100-dim vectors

# find similar words
model.most_similar("python")  # [('java', 0.75), ('perl', 0.69), ...]

# vector arithmetic
model.most_similar(positive=["king", "woman"], negative=["man"])
# [('queen', 0.73), ...]
```

Limitations
- Static: one vector per word regardless of context. “bank” (river) and “bank” (financial) share the same vector
- Out-of-vocabulary: Word2Vec and GloVe can’t handle words not in training data (FastText partially solves this)
- Superseded by contextual embeddings: models like BERT generate different vectors for the same word depending on context
Still worth understanding — contextual embeddings build on the same intuitions.
Links
- Embeddings — the general concept beyond words
- Text Preprocessing — what happens before embedding
- Bag of Words and TF-IDF — the sparse representations embeddings replace
- BERT and Masked Language Models — contextual embeddings that supersede static ones