Matrix Multiplication
What
Multiply two matrices A (m×n) and B (n×p) to get C (m×p). The inner dimensions must match.
Each element C[i,j] = dot product of row i of A with column j of B.
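The definition above can be sketched directly as three nested loops (a hypothetical naive implementation for illustration, not how NumPy computes it):

```python
def matmul(A, B):
    # A is m×n, B is n×p; the result C is m×p
    m, n = len(A), len(A[0])
    n2, p = len(B), len(B[0])
    assert n == n2, "inner dimensions must match"
    C = [[0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            # C[i][j] = dot product of row i of A with column j of B
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

This runs in O(m·n·p) time; optimized libraries use the same definition but exploit blocking and vectorized hardware.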
Why it matters
- Linear layers in neural nets: output = input @ weights + bias
- Transformers: attention is matrix multiplication of queries, keys, and values
- Batch processing: multiplying a batch of inputs by weights in one operation
Matrix multiplication is the single most common operation in ML.
Key ideas
- Shape rule: (m, n) @ (n, p) → (m, p) — inner dims must match
- Not commutative: A @ B ≠ B @ A in general
- Associative: (A @ B) @ C = A @ (B @ C)
- Identity matrix: I @ A = A @ I = A (for square A; otherwise the identity on each side has a different size)
In NumPy
import numpy as np
A = np.array([[1, 2], [3, 4]]) # (2, 2)
B = np.array([[5, 6], [7, 8]]) # (2, 2)
C = A @ B # matrix multiply (preferred syntax)
C = np.dot(A, B) # equivalent for 2-D arrays
C = A.dot(B) # equivalent
# For a neural net layer:
X = np.random.randn(32, 784) # batch of 32 images, 784 features
W = np.random.randn(784, 128) # weights: 784 inputs → 128 outputs
out = X @ W # (32, 128) — 32 samples, 128 outputs
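The full layer formula from "Why it matters" adds a bias term, which broadcasts across the batch dimension. A sketch, reusing the shapes above (the zero bias is just a placeholder):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.standard_normal((32, 784))   # batch of 32 inputs, 784 features each
W = rng.standard_normal((784, 128))  # weights: 784 inputs → 128 outputs
b = np.zeros(128)                    # bias: one value per output unit

out = X @ W + b   # (32, 128): b broadcasts over all 32 samples
print(out.shape)  # (32, 128)
```

One matrix multiply handles the whole batch at once, which is exactly why frameworks express layers this way.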