Matrix Multiplication

What

Multiply two matrices A (m×n) and B (n×p) to get C (m×p). The inner dimensions must match: A must have as many columns as B has rows.

Each element C[i,j] is the dot product of row i of A with column j of B: C[i,j] = sum over k of A[i,k] * B[k,j].
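
The row-by-column rule can be written out with explicit loops. This is a minimal sketch for illustration only; the helper name matmul_naive is made up here, and in practice A @ B is far faster.

```python
import numpy as np

def matmul_naive(A, B):
    """Multiply A (m x n) by B (n x p) using the row-by-column definition."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError(f"inner dimensions differ: {n} vs {n_b}")
    C = np.zeros((m, p), dtype=np.result_type(A, B))
    for i in range(m):
        for j in range(p):
            # C[i, j] is the dot product of row i of A and column j of B
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    return C

A = np.array([[1, 2, 3], [4, 5, 6]])        # (2, 3)
B = np.array([[7, 8], [9, 10], [11, 12]])   # (3, 2)
C = matmul_naive(A, B)                      # (2, 2)
print(C)                                    # [[58, 64], [139, 154]]
```

For example, C[0,0] = 1*7 + 2*9 + 3*11 = 58, which matches what A @ B returns.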

Why it matters

  • Linear layers in neural nets: output = input @ weights + bias
  • Transformers: attention is built from matrix multiplications of queries, keys, and values (Q @ K.T gives the scores; the softmaxed scores are then multiplied by V)
  • Batch processing: multiplying a batch of inputs by weights in one operation
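
As a rough sketch of the attention bullet, here is single-head scaled dot-product attention in NumPy. The sizes (seq_len, d_k, d_v) and the random inputs are assumptions chosen for illustration; real models add learned projections, masking, and multiple heads.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_k, d_v = 4, 8, 8   # hypothetical sizes

Q = rng.standard_normal((seq_len, d_k))   # queries
K = rng.standard_normal((seq_len, d_k))   # keys
V = rng.standard_normal((seq_len, d_v))   # values

# First matmul: similarity of every query with every key
scores = Q @ K.T / np.sqrt(d_k)                # (seq_len, seq_len)

# Row-wise softmax (subtract the row max for numerical stability)
scores = scores - scores.max(axis=-1, keepdims=True)
weights = np.exp(scores)
weights = weights / weights.sum(axis=-1, keepdims=True)

# Second matmul: weighted average of the value vectors
attended = weights @ V                         # (seq_len, d_v)
print(attended.shape)
```

Both heavy steps are plain matrix products; only the softmax in between is not.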

Matrix multiplication is one of the most common operations in ML, and it often accounts for most of the compute in deep neural networks.

Key ideas

  • Shape rule: (m, n) @ (n, p) → (m, p) — inner dims must match
  • Not commutative: A @ B ≠ B @ A in general
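  • Associative: (A @ B) @ C = A @ (B @ C) whenever the shapes allow both products (in floating point the two sides can differ by rounding error)
  • Identity matrix: I @ A = A @ I = A for square A; for a non-square (m, n) matrix, use the m×m identity on the left and the n×n identity on the right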
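
The four rules above can be checked on small examples. This is a sketch with arbitrarily chosen matrices; integer entries are used so the equality checks are exact.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
C = np.array([[2, 0], [0, 2]])
I = np.eye(2, dtype=int)

# Shape rule: (2, 3) @ (3, 4) -> (2, 4)
X = np.ones((2, 3))
Y = np.ones((3, 4))
result_shape = (X @ Y).shape

# Mismatched inner dims, (3, 4) @ (2, 3), raise ValueError
try:
    Y @ X
    mismatch_raises = False
except ValueError:
    mismatch_raises = True

# Not commutative: A @ B and B @ A differ for this pair
commutes = np.array_equal(A @ B, B @ A)

# Associative: grouping does not change the result
associative = np.array_equal((A @ B) @ C, A @ (B @ C))

# Identity: multiplying by I on either side returns A
identity_ok = np.array_equal(I @ A, A) and np.array_equal(A @ I, A)

print(result_shape, mismatch_raises, commutes, associative, identity_ok)
```

Here A @ B is [[2, 1], [4, 3]] while B @ A is [[3, 4], [1, 2]], so the order of the operands matters.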

In NumPy

import numpy as np

A = np.array([[1, 2], [3, 4]])    # (2, 2)
B = np.array([[5, 6], [7, 8]])    # (2, 2)

C = A @ B          # matrix multiply (preferred syntax): [[19, 22], [43, 50]]
C = np.dot(A, B)   # same result for 2-D arrays
C = A.dot(B)       # same result for 2-D arrays

# For a neural net layer:
X = np.random.randn(32, 784)   # batch of 32 images, 784 features
W = np.random.randn(784, 128)  # weights: 784 inputs → 128 outputs
out = X @ W                     # (32, 128) — 32 samples, 128 outputs
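
The layer formula output = input @ weights + bias adds one step to the example above. A minimal sketch, assuming a bias vector b with one entry per output unit (b and the seeded generator rng are not part of the original example):

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.standard_normal((32, 784))    # batch of 32 inputs, 784 features
W = rng.standard_normal((784, 128))   # weights: 784 inputs -> 128 outputs
b = rng.standard_normal(128)          # assumed bias, one value per output

# X @ W has shape (32, 128); b has shape (128,) and is broadcast
# across the 32 rows, so every sample gets the same bias added.
out = X @ W + b
print(out.shape)                      # (32, 128)
```

Each row of out equals the corresponding single-sample result X[i] @ W + b, which is what makes the batched form a drop-in replacement for a loop over samples.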