Optical Flow
What
Estimate pixel-level motion between consecutive video frames. For each pixel (or selected points), compute the 2D displacement vector (dx, dy) showing where that pixel moved.
Optical flow gives you a motion field — a dense or sparse vector field over the image. This is different from object tracking (Multi-Object Tracking): flow operates at the pixel level, tracking operates at the object level.
Dense vs sparse flow
| Type | What | Speed | Use case |
|---|---|---|---|
| Sparse | Track selected feature points | Fast | Feature tracking, visual odometry |
| Dense | Compute motion for every pixel | Slow | Action recognition, full motion analysis |
Assumptions
Optical flow relies on the brightness constancy assumption: a pixel's intensity doesn't change between frames, it only moves.

I(x, y, t) = I(x + dx, y + dy, t + dt)

A first-order Taylor expansion of the right-hand side yields the optical flow constraint equation, I_x·dx + I_y·dy + I_t·dt = 0: one equation in two unknowns (dx, dy) per pixel, which is why extra assumptions are needed to solve for flow.

The brightness constancy assumption breaks under lighting changes, reflections, transparent objects, very fast motion, and large displacements.
The aperture problem
Through a small window (aperture), you can only detect motion perpendicular to an edge. Motion along the edge is ambiguous. This is a fundamental limitation of local methods.
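This can be illustrated numerically with the gradient structure tensor of a window (the matrix that local methods like Lucas-Kanade must invert). A minimal sketch with arbitrary synthetic patches: on a straight edge the tensor is rank-deficient, so only the motion component normal to the edge is recoverable; at a corner both eigenvalues are positive and full 2D motion can be solved.

```python
import numpy as np

def structure_tensor_eigs(patch):
    """Eigenvalues (ascending) of the gradient structure tensor of a patch."""
    gy, gx = np.gradient(patch)  # np.gradient: axis 0 = rows (y), axis 1 = cols (x)
    A = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                  [np.sum(gx * gy), np.sum(gy * gy)]])
    return np.linalg.eigvalsh(A)

edge = np.zeros((32, 32)); edge[:, 16:] = 1.0       # a single vertical edge
corner = np.zeros((32, 32)); corner[16:, 16:] = 1.0  # two edges meeting

print(structure_tensor_eigs(edge))    # smallest eigenvalue is 0: motion along the edge is unobservable
print(structure_tensor_eigs(corner))  # both eigenvalues > 0: full 2D motion recoverable
```

This is the same criterion Shi-Tomasi "good features to track" uses: keep points whose smaller eigenvalue is large.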
Sparse flow: Lucas-Kanade (LK)
Assumes motion is constant within a small window around each feature point. Solves a least-squares system per point.
The KLT (Kanade-Lucas-Tomasi) tracker is the practical implementation: detect good features to track (corners), then track them with LK.
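The least-squares step can be sketched in plain NumPy (illustrative only, not OpenCV's pyramidal implementation; the synthetic blob, window location, and sizes are arbitrary choices):

```python
import numpy as np

# Synthetic pair: a Gaussian blob that moves 1 px to the right.
x = np.arange(128, dtype=float)
X, Y = np.meshgrid(x, x)
frame1 = np.exp(-((X - 64) ** 2 + (Y - 64) ** 2) / 200.0)
frame2 = np.roll(frame1, 1, axis=1)  # pattern shifted +1 in x

Iy, Ix = np.gradient(frame1)  # np.gradient: axis 0 = rows (y), axis 1 = cols (x)
It = frame2 - frame1

# One window near the blob, where gradients are non-zero in both directions.
win = np.s_[55:70, 45:60]
A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)
b = -It[win].ravel()

# Least-squares solve of I_x*u + I_y*v = -I_t over the window.
(u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
print(u, v)  # close to (1, 0)
```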
```python
import cv2
import numpy as np

# Load video
cap = cv2.VideoCapture("video.mp4")
ret, prev_frame = cap.read()
prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)

# Detect good features to track (Shi-Tomasi corners)
feature_params = dict(
    maxCorners=200,
    qualityLevel=0.01,
    minDistance=10,
    blockSize=7,
)
prev_pts = cv2.goodFeaturesToTrack(prev_gray, mask=None, **feature_params)

# LK optical flow parameters
lk_params = dict(
    winSize=(21, 21),  # search window size
    maxLevel=3,        # pyramid levels (handle larger motions)
    criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01),
)

# Track colors for visualization
colors = np.random.randint(0, 255, (200, 3))
mask = np.zeros_like(prev_frame)  # drawing layer

while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Compute sparse optical flow
    next_pts, status, error = cv2.calcOpticalFlowPyrLK(
        prev_gray, gray, prev_pts, None, **lk_params
    )

    # Select good points (status == 1)
    if next_pts is not None:
        good_new = next_pts[status.flatten() == 1]
        good_old = prev_pts[status.flatten() == 1]

        # Draw tracks
        for i, (new, old) in enumerate(zip(good_new, good_old)):
            a, b = new.ravel().astype(int)
            c, d = old.ravel().astype(int)
            mask = cv2.line(mask, (a, b), (c, d), colors[i % 200].tolist(), 2)
            frame = cv2.circle(frame, (a, b), 3, colors[i % 200].tolist(), -1)

        # Surviving points become next frame's inputs (features dwindle
        # over time; re-detect periodically in a real application)
        prev_pts = good_new.reshape(-1, 1, 2)

    output = cv2.add(frame, mask)
    cv2.imshow("Sparse Flow", output)
    prev_gray = gray.copy()
    if cv2.waitKey(30) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```

Dense flow: Farneback
Compute motion vector for every pixel. Slower but gives complete motion information.
```python
import cv2
import numpy as np

def compute_dense_flow(prev_gray, curr_gray):
    """Compute dense optical flow using Farneback method.

    Returns: flow array of shape (H, W, 2) -- dx, dy per pixel.
    """
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, curr_gray,
        flow=None,
        pyr_scale=0.5,  # pyramid scale
        levels=3,       # pyramid levels
        winsize=15,     # averaging window
        iterations=3,
        poly_n=5,       # polynomial expansion neighborhood
        poly_sigma=1.2,
        flags=0,
    )
    return flow

def flow_to_color(flow):
    """Visualize flow as HSV color map.

    Hue = direction, Saturation = 255, Value = magnitude.
    """
    h, w = flow.shape[:2]
    magnitude, angle = cv2.cartToPolar(flow[:, :, 0], flow[:, :, 1])
    hsv = np.zeros((h, w, 3), dtype=np.uint8)
    hsv[:, :, 0] = angle * 180 / np.pi / 2  # hue: direction
    hsv[:, :, 1] = 255                      # saturation: full
    hsv[:, :, 2] = cv2.normalize(           # value: magnitude
        magnitude, None, 0, 255, cv2.NORM_MINMAX
    )
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

# Process video
cap = cv2.VideoCapture("video.mp4")
ret, prev_frame = cap.read()
prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    flow = compute_dense_flow(prev_gray, gray)
    color_flow = flow_to_color(flow)
    cv2.imshow("Dense Flow", color_flow)
    prev_gray = gray.copy()
    if cv2.waitKey(30) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```

Reading the color map
In HSV flow visualization:
- Color (hue) = direction of motion (red = right, cyan = left, green = down, etc.)
- Brightness (value) = speed of motion (brighter = faster, black = no motion)
Deep learning optical flow
Classical methods struggle with large displacements and textureless regions. Deep learning methods dominate modern benchmarks.
RAFT (Recurrent All-Pairs Field Transforms, 2020)
State of the art on release, and still the standard baseline for learned flow. Key ideas:
- Extract features from both images with shared CNN
- Build 4D correlation volume: all pairs of feature points
- Iteratively update flow estimate using GRU (recurrent refinement)
```python
import torch
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights

# Load pretrained RAFT
weights = Raft_Large_Weights.DEFAULT
transforms = weights.transforms()
model = raft_large(weights=weights).eval()

# Prepare two frames (as tensors)
# img1, img2: (3, H, W) float tensors in [0, 1]
img1_batch, img2_batch = transforms(img1, img2)
img1_batch = img1_batch.unsqueeze(0)  # add batch dim
img2_batch = img2_batch.unsqueeze(0)

with torch.no_grad():
    # Returns list of flow predictions (iterative refinement)
    flow_predictions = model(img1_batch, img2_batch)

flow = flow_predictions[-1]  # final prediction, shape (1, 2, H, W)
# flow[0, 0] = horizontal displacement, flow[0, 1] = vertical displacement
```

FlowFormer (2022)
Transformer-based, achieves even better accuracy by using attention to handle long-range correspondences.
Motion detection from flow
One of the most practical applications: detect what’s moving in the scene.
```python
import cv2
import numpy as np

def detect_motion(flow, threshold=2.0, min_area=500):
    """Detect moving regions from optical flow.

    Returns: (boxes, motion_mask) -- bounding boxes [x, y, w, h]
    of moving objects, plus the binary motion mask.
    """
    magnitude = np.sqrt(flow[:, :, 0]**2 + flow[:, :, 1]**2)

    # Threshold: pixels moving faster than threshold
    motion_mask = (magnitude > threshold).astype(np.uint8) * 255

    # Clean up: morphological operations
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    motion_mask = cv2.morphologyEx(motion_mask, cv2.MORPH_CLOSE, kernel)
    motion_mask = cv2.morphologyEx(motion_mask, cv2.MORPH_OPEN, kernel)

    # Find contours
    contours, _ = cv2.findContours(
        motion_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )
    boxes = []
    for cnt in contours:
        area = cv2.contourArea(cnt)
        if area > min_area:
            boxes.append(cv2.boundingRect(cnt))
    return boxes, motion_mask
```

Applications
- Video stabilization: estimate camera motion from flow, then compensate
- Motion detection: find moving objects in surveillance video (see above)
- Action recognition: flow as input to temporal models (two-stream networks, see Video Understanding)
- Drone ego-motion estimation: compute how the drone is moving from visual flow
- Visual odometry: estimate camera trajectory from flow between frames (see Tutorial - Visual SLAM Concepts)
- Moving target detection from drone: separate target motion from camera motion using flow compensation
Flow compensation for moving cameras
When the camera moves (e.g., mounted on a drone), the entire flow field contains camera motion. To isolate object motion:
- Estimate dominant motion (affine or homography from feature correspondences)
- Warp previous frame to align with current frame
- Compute flow on the compensated pair
- Remaining flow = independent object motion
Self-test questions
- What is the brightness constancy assumption, and when does it fail?
- Explain the aperture problem in one sentence.
- What is the difference between sparse and dense optical flow?
- Why does RAFT use iterative refinement rather than predicting flow in one shot?
- How would you separate object motion from camera motion in drone footage?
Exercises
- Sparse flow: Implement KLT tracking on a video with cv2.calcOpticalFlowPyrLK. Visualize feature point trajectories. Count how many features survive 100 frames.
- Dense flow: Compute Farneback flow on a video, visualize as color map. Identify which directions of motion correspond to which colors.
- Motion detector: Build a simple motion detector using the flow magnitude thresholding approach above. Test on a surveillance-style video with a static camera. Draw bounding boxes around moving objects.
Links
- Multi-Object Tracking — flow provides motion cues for tracking
- Video Understanding — flow is a key input for temporal models
- 3D Vision and Depth — flow relates to depth through motion parallax
- Tutorial - Visual SLAM Concepts — visual odometry uses flow