Model Monitoring

What

Track model performance in production. Models degrade over time as the real world changes. If you’re not monitoring, you’re guessing.

What to monitor

  • Data drift: input distribution shifts from training data
  • Concept drift: the relationship between inputs and outputs changes
  • Performance metrics: accuracy, latency, error rates
  • Prediction distribution: are outputs still reasonable?
  • Infrastructure: memory usage, GPU utilization, request throughput
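A minimal sketch of collecting these signals in-process with a rolling window (class and field names are illustrative, not from any particular library):

```python
from collections import deque

class RequestMonitor:
    """Rolling window of per-request signals: latency, errors, predictions."""

    def __init__(self, window=1000):
        self.latencies = deque(maxlen=window)
        self.errors = deque(maxlen=window)
        self.predictions = deque(maxlen=window)

    def record(self, latency_ms, prediction, error=False):
        self.latencies.append(latency_ms)
        self.errors.append(1 if error else 0)
        self.predictions.append(prediction)

    def snapshot(self):
        """Summary stats over the current window, ready to export."""
        n = len(self.latencies)
        return {
            "p95_latency_ms": sorted(self.latencies)[int(0.95 * (n - 1))],
            "error_rate": sum(self.errors) / n,
            "mean_prediction": sum(self.predictions) / n,
        }
```

In practice you would export `snapshot()` to your metrics backend on a timer instead of computing it per request.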

Drift detection

Statistical tests to catch distribution shifts before they tank your model:

| Test | What it does | Best for |
| --- | --- | --- |
| KS test (Kolmogorov-Smirnov) | Compares two distributions | Numerical features, univariate |
| PSI (Population Stability Index) | Measures shift in binned distributions | Categorical/binned features |
| Chi-squared | Tests independence of categorical distributions | Categorical features |
| MMD (Maximum Mean Discrepancy) | Kernel-based multivariate test | High-dimensional data |

Practical approach: compute reference statistics on your training set. For each batch of production data, run KS/PSI against the reference. Alert when the KS p-value drops below your significance threshold or PSI > 0.2.
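A sketch of the PSI half of that check, assuming numeric features and decile bins taken from the reference data (the 1e-6 floor and bin count are illustrative choices):

```python
import numpy as np

def psi(reference, production, bins=10):
    """Population Stability Index; bin edges come from reference quantiles."""
    edges = np.quantile(reference, np.linspace(0.0, 1.0, bins + 1))
    # Clip production values into the reference range so nothing falls outside
    production = np.clip(production, edges[0], edges[-1])
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    prod_frac = np.histogram(production, bins=edges)[0] / len(production)
    ref_frac = np.clip(ref_frac, 1e-6, None)    # avoid log(0) on empty bins
    prod_frac = np.clip(prod_frac, 1e-6, None)
    return float(np.sum((prod_frac - ref_frac) * np.log(prod_frac / ref_frac)))

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 10_000)
print(psi(reference, rng.normal(0.0, 1.0, 10_000)))  # near zero: no drift
print(psi(reference, rng.normal(0.5, 1.0, 10_000)))  # large; compare to the 0.2 rule
```

For the KS side, `scipy.stats.ks_2samp(reference, production)` gives the p-value directly.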

Deployment strategies

  • Shadow deployment: new model runs alongside production, receives same traffic, but predictions are not served. Compare outputs to catch issues before they hit users
  • Canary deployment: route a small % of traffic (e.g., 5%) to the new model. Monitor metrics, then gradually increase if healthy
  • A/B testing: split traffic between models, measure business metrics (click-through, revenue), decide with statistical significance
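One way to implement the canary split is deterministic hash-based routing; a sketch (the `route` helper is hypothetical):

```python
import hashlib

def route(request_id: str, canary_fraction: float = 0.05) -> str:
    """Deterministically route a request to 'canary' or 'stable'."""
    h = int(hashlib.sha256(request_id.encode()).hexdigest(), 16)
    bucket = (h % 10_000) / 10_000  # roughly uniform in [0, 1)
    return "canary" if bucket < canary_fraction else "stable"
```

Hashing a stable id (user id, session id) instead of rolling a random number keeps each user on the same model across requests, which makes metric comparisons cleaner. Ramping the canary is just raising `canary_fraction`.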

Alerting strategies

Not every drift needs a page at 3am. Tier your alerts:

  • P1 (immediate): model returning errors, latency spikes, prediction distribution collapses
  • P2 (same day): significant data drift detected, performance below threshold
  • P3 (weekly review): gradual drift trends, feature importance shifts
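The tiers above can be encoded as a small triage function; thresholds here are illustrative defaults, not recommendations from the text:

```python
def alert_tier(error_rate, p95_latency_ms, psi, accuracy,
               latency_slo_ms=200, accuracy_floor=0.90):
    """Map monitoring signals to an alert tier; None means no alert."""
    if error_rate > 0.05 or p95_latency_ms > 5 * latency_slo_ms:
        return "P1"  # page immediately
    if psi > 0.2 or accuracy < accuracy_floor:
        return "P2"  # same-day investigation
    if psi > 0.1:
        return "P3"  # fold into the weekly drift review
    return None
```

Checking P1 conditions first matters: a model that is both erroring and drifting should page, not wait for the weekly review.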

When to retrain

  • Performance drops below a threshold
  • Significant data drift detected (KS/PSI alerts)
  • On a regular schedule (weekly, monthly)
  • When new labeled data becomes available
  • After a known external event (policy change, seasonality)
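These triggers compose naturally into one decision function; a sketch with illustrative thresholds (the function name and defaults are not from any standard tool):

```python
from datetime import datetime, timedelta

def retrain_reasons(now, last_trained, accuracy, psi, new_labels,
                    accuracy_floor=0.90, psi_alert=0.2,
                    max_age=timedelta(days=30), min_new_labels=5_000,
                    external_event=False):
    """Return the list of retrain triggers that fired; empty means keep the model."""
    reasons = []
    if accuracy < accuracy_floor:
        reasons.append("performance below threshold")
    if psi > psi_alert:
        reasons.append("data drift")
    if now - last_trained > max_age:
        reasons.append("scheduled retrain")
    if new_labels >= min_new_labels:
        reasons.append("new labeled data")
    if external_event:
        reasons.append("external event")
    return reasons
```

Returning the reasons rather than a bare boolean makes the retrain decision auditable: log them alongside the new model version.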

Monitoring stack

Tools to consider: Evidently AI (open-source drift detection), Prometheus + Grafana (infra metrics), custom dashboards for prediction distributions. The simplest version: log predictions to a database, run daily drift checks as a cron job.
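That simplest version can be sketched end to end with the standard library; here the daily check is a crude z-test on the day's mean prediction, standing in for the fuller KS/PSI checks (schema and thresholds are illustrative):

```python
import sqlite3
import statistics

def init_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS predictions (ts TEXT, prediction REAL)")
    return conn

def log_prediction(conn, ts, prediction):
    """Called from the serving path for every prediction."""
    conn.execute("INSERT INTO predictions VALUES (?, ?)", (ts, prediction))

def daily_drift_check(conn, day, ref_mean, ref_std, z_threshold=3.0):
    """Run from cron: compare one day's mean prediction to reference stats."""
    rows = conn.execute(
        "SELECT prediction FROM predictions WHERE ts LIKE ?", (day + "%",)
    ).fetchall()
    if not rows:
        return None
    day_mean = statistics.fmean(r[0] for r in rows)
    z = abs(day_mean - ref_mean) / (ref_std / len(rows) ** 0.5)
    return {"day": day, "mean": day_mean, "z": z, "alert": z > z_threshold}
```

Swap SQLite for your warehouse and the z-test for the KS/PSI checks above, and this is a serviceable first monitoring stack.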