Feature Stores
What
A central repository for ML features — the computed values your models actually train on and serve from. Think of it as a data warehouse specifically designed for ML workflows.
Why you need one
The core problem: features computed during training must match features computed during serving. Without a feature store:
- Training uses a Pandas pipeline, serving uses a different SQL query — subtle bugs
- Teams recompute the same features independently — wasted work
- No record of how a feature was computed — debugging nightmares
Feature stores solve training-serving skew, the silent killer of ML in production.
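The fix for skew is a single feature definition shared by both paths. A minimal stdlib sketch (the purchase records, user IDs, and the `avg_purchase_last_30d` helper are all hypothetical):

```python
from datetime import datetime, timedelta

# Hypothetical purchase records: (user_id, amount, timestamp).
PURCHASES = [
    ("u1", 30.0, datetime(2024, 1, 5)),
    ("u1", 50.0, datetime(2024, 1, 20)),
    ("u1", 10.0, datetime(2023, 11, 1)),  # outside the 30-day window
]

def avg_purchase_last_30d(purchases, user_id, as_of):
    """One definition, called by BOTH the training pipeline and the serving path."""
    window_start = as_of - timedelta(days=30)
    amounts = [amt for uid, amt, ts in purchases
               if uid == user_id and window_start <= ts <= as_of]
    return sum(amounts) / len(amounts) if amounts else 0.0

# Training and serving invoke the same function, so the values cannot drift apart.
train_value = avg_purchase_last_30d(PURCHASES, "u1", datetime(2024, 1, 31))
serve_value = avg_purchase_last_30d(PURCHASES, "u1", datetime(2024, 1, 31))
assert train_value == serve_value == 40.0  # Jan 5 and Jan 20 purchases, averaged
```

This is exactly the guarantee a feature store gives you at scale: the Pandas pipeline and the SQL query collapse into one registered definition.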
Architecture
| Component | Purpose | Backed by |
|---|---|---|
| Offline store | Batch features for training | Data warehouse (BigQuery, Snowflake, Parquet) |
| Online store | Low-latency features for serving | Redis, DynamoDB, Cassandra |
| Feature registry | Metadata, lineage, documentation | Catalog database |
| Transformation engine | Compute features from raw data | Spark, SQL, Python |
The flow: raw data → transformations → offline store (for training) → materialized to online store (for serving). Same feature definition, two storage backends.
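Materialization is just "copy the latest value per entity key from the offline store into a key-value online store." A minimal sketch with hypothetical rows, using a dict where production systems would use Redis or DynamoDB:

```python
from datetime import datetime

# Hypothetical offline store: full history, one row per (entity, timestamp).
offline_store = [
    {"user_id": "u1", "avg_purchase_30d": 38.0, "ts": datetime(2024, 1, 1)},
    {"user_id": "u1", "avg_purchase_30d": 42.5, "ts": datetime(2024, 2, 1)},
    {"user_id": "u2", "avg_purchase_30d": 12.0, "ts": datetime(2024, 2, 1)},
]

def materialize(offline_rows):
    """Copy the latest value per entity into the online store.
    Sorting by timestamp means later rows overwrite earlier ones."""
    online = {}
    for row in sorted(offline_rows, key=lambda r: r["ts"]):
        online[row["user_id"]] = row["avg_purchase_30d"]
    return online

online_store = materialize(offline_store)
assert online_store == {"u1": 42.5, "u2": 12.0}
```

Training reads the full `offline_store` history; serving does a single key lookup in `online_store` — same definition, two backends.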
What to store
- Features: the computed values (user_avg_purchase_last_30d, text_embedding_v2)
- Metadata: data type, owner, description, freshness requirements
- Lineage: which raw data sources and transforms produced this feature
- Timestamps: point-in-time correctness to avoid future data leaking into training
Key tools
| Tool | Notes |
|---|---|
| Feast | Open-source, lightweight, Python-native. Good starting point |
| Tecton | Managed, production-grade service; its team also stewards the Feast project |
| Hopsworks | Open-source platform with built-in feature store |
| Vertex AI Feature Store | GCP-native, integrates with Vertex pipelines |
| SageMaker Feature Store | AWS-native |
For learning and small projects, Feast is a solid default. For production at scale, evaluate the managed options.
When you don’t need one
If you have one model, one team, and batch-only serving — a feature store is overkill. A well-organized Parquet file and a documented transformation script will do. Feature stores pay off when you have multiple models sharing features, or real-time serving requirements.
Links
- ML Pipelines — pipelines that produce and consume features
- Model Serving — where online features get consumed
- MLOps Roadmap — the bigger picture