NumPy Essentials

What

NumPy (Numerical Python) is the foundation of most of the Python ML stack. It provides fast n-dimensional arrays and vectorized operations on them.

Why it matters

  • Pandas and scikit-learn are built on NumPy arrays; PyTorch tensors follow the same array concepts
  • Vectorized ops are often 10-100x faster than equivalent Python loops, because the loop runs in compiled code
  • Understanding array shapes is essential for debugging ML code
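
A minimal sketch of the loop-versus-vectorized gap, assuming a made-up workload (squaring 200,000 floats). The function names are hypothetical, and the measured ratio depends on the machine and array size, so no timing is shown as expected output.

```python
import time

import numpy as np

# Hypothetical workload: square 200,000 float64 values.
values = np.arange(200_000, dtype=np.float64)


def square_with_loop(arr):
    """Square each element with a plain Python loop."""
    out = np.empty_like(arr)
    for i in range(arr.size):
        out[i] = arr[i] * arr[i]
    return out


def square_vectorized(arr):
    """Square every element in a single NumPy expression."""
    return arr * arr


start = time.perf_counter()
loop_result = square_with_loop(values)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
vec_result = square_vectorized(values)
vec_seconds = time.perf_counter() - start

# Both approaches must agree before the timings mean anything.
print(np.array_equal(loop_result, vec_result))  # True
print(f"loop: {loop_seconds:.4f}s  vectorized: {vec_seconds:.4f}s")
```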

Key concepts

Creating arrays

import numpy as np
 
np.array([1, 2, 3])              # from list
np.zeros((3, 4))                  # 3×4 matrix of zeros
np.ones((2, 3))                   # 2×3 matrix of ones
np.random.randn(5, 3)             # 5×3 samples from the standard normal distribution
np.arange(0, 10, 2)               # [0, 2, 4, 6, 8] (stop value is excluded)
np.linspace(0, 1, 5)              # 5 evenly spaced points, endpoints included: [0, 0.25, 0.5, 0.75, 1]

Shape operations

a = np.random.randn(3, 4, 5)
 
a.shape          # (3, 4, 5)
a.reshape(3, 20) # shape (3, 20); total element count must stay 60
a.T              # transpose: reverses all axes, shape (5, 4, 3)
a.flatten()      # 1D copy, shape (60,)
a[np.newaxis, :] # add a leading dimension: shape (1, 3, 4, 5)
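
To check the resulting shapes, here is a small sketch. It swaps the random array for np.arange (an assumption made only so the output is deterministic) and adds two things not shown above: -1 as an inferred dimension, and the error raised when the element count does not match.

```python
import numpy as np

# Deterministic stand-in for the random array: 3 * 4 * 5 = 60 elements.
a = np.arange(60).reshape(3, 4, 5)

reshaped = a.reshape(3, 20)
transposed = a.T                 # reverses all axes
flat = a.flatten()               # always returns a copy
expanded = a[np.newaxis, :]      # new leading axis of length 1

print(reshaped.shape)            # (3, 20)
print(transposed.shape)          # (5, 4, 3)
print(flat.shape)                # (60,)
print(expanded.shape)            # (1, 3, 4, 5)

# -1 asks NumPy to infer that dimension from the total element count.
inferred = a.reshape(-1, 5)
print(inferred.shape)            # (12, 5)

# A target shape with a different element count raises ValueError.
reshape_failed = False
try:
    a.reshape(7, 9)              # 63 != 60
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```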

Indexing and slicing

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
 
a[0]           # first row: [1, 2, 3]
a[:, 1]        # second column: [2, 5, 8]
a[1:, :2]      # rows 1+, first 2 cols: [[4, 5], [7, 8]]
a[a > 5]       # boolean indexing: [6, 7, 8, 9]
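
One detail worth knowing when debugging: basic slices return views that share memory with the original, while boolean indexing returns a copy. A short sketch using the same array (the variable names col and big are only for illustration):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

col = a[:, 1]                     # basic slice: a view onto a's memory
big = a[a > 5]                    # boolean index: a fresh copy

print(np.shares_memory(a, col))   # True
print(np.shares_memory(a, big))   # False

col[0] = 20                       # writes through to a[0, 1]
big[0] = -1                       # does not touch a
print(a[0, 1])                    # 20
print(a[1, 2])                    # 6

# Assigning through a mask, by contrast, modifies the original in place.
a[a > 5] = 0
print(a.tolist())                 # [[1, 0, 3], [4, 5, 0], [0, 0, 0]]
```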

Vectorized operations

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
 
a + b          # [5, 7, 9]
a * b          # [4, 10, 18] element-wise
a @ b          # 32 (dot product)
np.sum(a)      # 6
np.mean(a)     # 2.0
np.std(a)      # 0.816... (population std, ddof=0 by default)
np.exp(a)      # element-wise exponential
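
Two of these results are easy to misread, so here is a quick check under the same a and b: the dot product is the sum of the element-wise product, and np.std divides by n unless ddof=1 is passed to get the sample standard deviation.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# a @ b on 1D arrays is the sum of the element-wise products.
dot = a @ b
print(dot == np.sum(a * b))        # True

# Default is the population standard deviation (divide by n).
population_std = np.std(a)
# ddof=1 gives the sample standard deviation (divide by n - 1).
sample_std = np.std(a, ddof=1)

print(round(float(population_std), 3))  # 0.816
print(float(sample_std))                # 1.0
```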

Broadcasting

When shapes differ but are compatible, NumPy stretches the smaller array to match the larger one:

M = np.ones((3, 4))          # shape (3, 4)
v = np.array([1, 2, 3, 4])   # shape (4,)
M + v                        # v is broadcast to (3, 4) and added to every row

Rules: align shapes right-to-left. Each pair of dimensions must be equal, or one of them must be 1. Missing leading dimensions are treated as 1. Anything else raises a ValueError.
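
A short sketch of the rules in action, including a failing case and the usual fix. The col array is a hypothetical per-row offset added for illustration, and np.broadcast_shapes assumes NumPy 1.20 or newer.

```python
import numpy as np

M = np.ones((3, 4))
row = np.array([1, 2, 3, 4])     # shape (4,)
col = np.array([10, 20, 30])     # shape (3,), hypothetical per-row offsets

# (3, 4) with (4,): trailing dims match, so row is added to every row of M.
print((M + row).shape)           # (3, 4)

# (3, 4) with (3,): trailing dims are 4 and 3, and neither is 1.
broadcast_failed = False
try:
    M + col
except ValueError as err:
    broadcast_failed = True
    print("broadcast failed:", err)

# Reshape col to (3, 1) so its length-1 axis stretches across the columns.
col_2d = col[:, np.newaxis]
shifted = M + col_2d
print(shifted.shape)             # (3, 4)
print(shifted[1].tolist())       # [21.0, 21.0, 21.0, 21.0]

# Ask NumPy for the broadcast result shape without building any arrays.
print(np.broadcast_shapes((3, 1), (4,)))  # (3, 4)
```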