Optical Flow

What

Estimate pixel-level motion between consecutive video frames. For each pixel (or selected points), compute the 2D displacement vector (dx, dy) showing where that pixel moved.

Optical flow gives you a motion field — a dense or sparse vector field over the image. This is different from object tracking (Multi-Object Tracking): flow operates at the pixel level, tracking operates at the object level.

Dense vs sparse flow

| Type   | What                           | Speed | Use case                                 |
| ------ | ------------------------------ | ----- | ---------------------------------------- |
| Sparse | Track selected feature points  | Fast  | Feature tracking, visual odometry        |
| Dense  | Compute motion for every pixel | Slow  | Action recognition, full motion analysis |

Assumptions

Optical flow relies on the brightness constancy assumption: a pixel’s intensity doesn’t change between frames; it only moves.

I(x, y, t) = I(x + dx, y + dy, t + dt)

This breaks under lighting changes, reflections, transparent objects, very fast motion, and large displacements.
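A first-order Taylor expansion of the right-hand side, followed by dividing through by dt, turns brightness constancy into the optical flow constraint equation:

```latex
I_x u + I_y v + I_t = 0, \qquad u = \frac{dx}{dt}, \quad v = \frac{dy}{dt}
```

This is one equation in two unknowns (u, v) per pixel, so every method needs an extra assumption to solve it: local constancy in Lucas-Kanade, global smoothness in variational methods.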

The aperture problem

Through a small window (aperture), you can only detect motion perpendicular to an edge. Motion along the edge is ambiguous. This is a fundamental limitation of local methods.

Sparse flow: Lucas-Kanade (LK)

Assumes motion is constant within a small window around each feature point. Solves a least-squares system per point.
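Concretely, each pixel p_i in the window contributes one linearized brightness-constancy equation I_x(p_i)u + I_y(p_i)v = -I_t(p_i); stacking n of them gives an overdetermined system with the usual least-squares solution:

```latex
\underbrace{\begin{bmatrix} I_x(p_1) & I_y(p_1) \\ \vdots & \vdots \\ I_x(p_n) & I_y(p_n) \end{bmatrix}}_{A}
\begin{bmatrix} u \\ v \end{bmatrix}
=
\underbrace{-\begin{bmatrix} I_t(p_1) \\ \vdots \\ I_t(p_n) \end{bmatrix}}_{b},
\qquad
\begin{bmatrix} u \\ v \end{bmatrix} = (A^\top A)^{-1} A^\top b
```

A^T A is the structure tensor: it is well-conditioned only where the image has strong gradients in two directions, i.e. at corners, which is why the tracker starts from corner features.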

The KLT (Kanade-Lucas-Tomasi) tracker is the practical implementation: detect good features to track (corners), then track them with LK.

import cv2
import numpy as np
 
# Load video
cap = cv2.VideoCapture("video.mp4")
ret, prev_frame = cap.read()
prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
 
# Detect good features to track (Shi-Tomasi corners)
feature_params = dict(
    maxCorners=200,
    qualityLevel=0.01,
    minDistance=10,
    blockSize=7,
)
prev_pts = cv2.goodFeaturesToTrack(prev_gray, mask=None, **feature_params)
 
# LK optical flow parameters
lk_params = dict(
    winSize=(21, 21),      # search window size
    maxLevel=3,            # pyramid levels (handle larger motions)
    criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01),
)
 
# Track colors for visualization
colors = np.random.randint(0, 255, (200, 3))
mask = np.zeros_like(prev_frame)  # drawing layer
 
while True:
    ret, frame = cap.read()
    if not ret:
        break
 
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Re-detect features if all tracks have been lost;
    # calcOpticalFlowPyrLK raises on an empty point set
    if prev_pts is None or len(prev_pts) == 0:
        prev_pts = cv2.goodFeaturesToTrack(gray, mask=None, **feature_params)
        prev_gray = gray.copy()
        continue

    # Compute sparse optical flow
    next_pts, status, error = cv2.calcOpticalFlowPyrLK(
        prev_gray, gray, prev_pts, None, **lk_params
    )
 
    # Select good points (status == 1)
    if next_pts is not None:
        good_new = next_pts[status.flatten() == 1]
        good_old = prev_pts[status.flatten() == 1]
 
        # Draw tracks
        for i, (new, old) in enumerate(zip(good_new, good_old)):
            a, b = new.ravel().astype(int)
            c, d = old.ravel().astype(int)
            mask = cv2.line(mask, (a, b), (c, d), colors[i % 200].tolist(), 2)
            frame = cv2.circle(frame, (a, b), 3, colors[i % 200].tolist(), -1)
 
        output = cv2.add(frame, mask)
        cv2.imshow("Sparse Flow", output)
 
        prev_pts = good_new.reshape(-1, 1, 2)
 
    prev_gray = gray.copy()
 
    if cv2.waitKey(30) & 0xFF == ord("q"):
        break
 
cap.release()
cv2.destroyAllWindows()

Dense flow: Farneback

Compute a motion vector for every pixel. Slower, but gives complete motion information.

import cv2
import numpy as np
 
def compute_dense_flow(prev_gray, curr_gray):
    """Compute dense optical flow using Farneback method.
    Returns: flow array of shape (H, W, 2) -- dx, dy per pixel.
    """
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, curr_gray,
        flow=None,
        pyr_scale=0.5,   # pyramid scale
        levels=3,         # pyramid levels
        winsize=15,        # averaging window
        iterations=3,
        poly_n=5,          # polynomial expansion neighborhood
        poly_sigma=1.2,
        flags=0,
    )
    return flow
 
def flow_to_color(flow):
    """Visualize flow as HSV color map.
    Hue = direction, Saturation = 255, Value = magnitude.
    """
    h, w = flow.shape[:2]
    magnitude, angle = cv2.cartToPolar(flow[:, :, 0], flow[:, :, 1])
 
    hsv = np.zeros((h, w, 3), dtype=np.uint8)
    hsv[:, :, 0] = angle * 180 / np.pi / 2   # hue: direction
    hsv[:, :, 1] = 255                         # saturation: full
    hsv[:, :, 2] = cv2.normalize(              # value: magnitude
        magnitude, None, 0, 255, cv2.NORM_MINMAX
    )
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
 
# Process video
cap = cv2.VideoCapture("video.mp4")
ret, prev_frame = cap.read()
prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
 
while True:
    ret, frame = cap.read()
    if not ret:
        break
 
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    flow = compute_dense_flow(prev_gray, gray)
    color_flow = flow_to_color(flow)
 
    cv2.imshow("Dense Flow", color_flow)
    prev_gray = gray.copy()
 
    if cv2.waitKey(30) & 0xFF == ord("q"):
        break
 
cap.release()
cv2.destroyAllWindows()

Reading the color map

In HSV flow visualization:

  • Color (hue) = direction of motion (red = right, cyan = left, green = down, etc.)
  • Brightness (value) = speed of motion (brighter = faster, black = no motion)

Deep learning optical flow

Classical methods struggle with large displacements and textureless regions. Deep learning methods dominate modern benchmarks.

RAFT (Recurrent All-Pairs Field Transforms, 2020)

State of the art on release, and still the standard strong baseline. Key ideas:

  1. Extract features from both images with shared CNN
  2. Build 4D correlation volume: all pairs of feature points
  3. Iteratively update the flow estimate with a GRU (recurrent refinement)

import torch
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights
 
# Load pretrained RAFT
weights = Raft_Large_Weights.DEFAULT
transforms = weights.transforms()
model = raft_large(weights=weights).eval()
 
# Prepare two frames (as tensors)
# img1, img2: (3, H, W) float tensors in [0, 1]
img1_batch, img2_batch = transforms(img1, img2)
img1_batch = img1_batch.unsqueeze(0)  # add batch dim
img2_batch = img2_batch.unsqueeze(0)
 
with torch.no_grad():
    # Returns list of flow predictions (iterative refinement)
    flow_predictions = model(img1_batch, img2_batch)
    flow = flow_predictions[-1]  # final prediction, shape (1, 2, H, W)
    # flow[0, 0] = horizontal displacement, flow[0, 1] = vertical displacement

FlowFormer (2022)

Transformer-based, achieves even better accuracy by using attention to handle long-range correspondences.

Motion detection from flow

One of the most practical applications: detect what’s moving in the scene.

import cv2
import numpy as np
 
def detect_motion(flow, threshold=2.0, min_area=500):
    """Detect moving regions from optical flow.
    Returns: list of bounding boxes [x, y, w, h] for moving objects.
    """
    magnitude = np.sqrt(flow[:, :, 0]**2 + flow[:, :, 1]**2)
 
    # Threshold: pixels moving faster than threshold
    motion_mask = (magnitude > threshold).astype(np.uint8) * 255
 
    # Clean up: morphological operations
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    motion_mask = cv2.morphologyEx(motion_mask, cv2.MORPH_CLOSE, kernel)
    motion_mask = cv2.morphologyEx(motion_mask, cv2.MORPH_OPEN, kernel)
 
    # Find contours
    contours, _ = cv2.findContours(
        motion_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )
 
    boxes = []
    for cnt in contours:
        area = cv2.contourArea(cnt)
        if area > min_area:
            boxes.append(cv2.boundingRect(cnt))
 
    return boxes, motion_mask

Applications

  • Video stabilization: estimate camera motion from flow, then compensate
  • Motion detection: find moving objects in surveillance video (see above)
  • Action recognition: flow as input to temporal models (two-stream networks, see Video Understanding)
  • Drone ego-motion estimation: compute how the drone is moving from visual flow
  • Visual odometry: estimate camera trajectory from flow between frames (see Tutorial - Visual SLAM Concepts)
  • Moving target detection from drone: separate target motion from camera motion using flow compensation

Flow compensation for moving cameras

When the camera moves (e.g., mounted on a drone), the entire flow field contains camera motion. To isolate object motion:

  1. Estimate dominant motion (affine or homography from feature correspondences)
  2. Warp previous frame to align with current frame
  3. Compute flow on the compensated pair
  4. Remaining flow = independent object motion

Self-test questions

  1. What is the brightness constancy assumption, and when does it fail?
  2. Explain the aperture problem in one sentence.
  3. What is the difference between sparse and dense optical flow?
  4. Why does RAFT use iterative refinement rather than predicting flow in one shot?
  5. How would you separate object motion from camera motion in drone footage?

Exercises

  1. Sparse flow: Implement KLT tracking on a video with cv2.calcOpticalFlowPyrLK. Visualize feature point trajectories. Count how many features survive 100 frames.
  2. Dense flow: Compute Farneback flow on a video, visualize as color map. Identify which directions of motion correspond to which colors.
  3. Motion detector: Build a simple motion detector using the flow magnitude thresholding approach above. Test on a surveillance-style video with a static camera. Draw bounding boxes around moving objects.