Feature Stores

What

A central repository for ML features — the computed values your models actually train on and serve from. Think of it as a data warehouse specifically designed for ML workflows.

Why you need one

The core problem: features computed during training must match features computed during serving. Without a feature store:

  • Training uses a Pandas pipeline, serving uses a different SQL query — subtle bugs
  • Teams recompute the same features independently — wasted work
  • No record of how a feature was computed — debugging nightmares

Feature stores address training-serving skew, the silent killer of ML in production, by having both training and serving read values produced from the same feature definitions.
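In code, the fix amounts to defining each feature once and calling that one definition from both paths. Below is a minimal plain-Python sketch. It assumes raw purchases are (user_id, amount, timestamp) tuples. The function name mirrors the example feature user_avg_purchase_last_30d, and every helper here is hypothetical rather than any feature store's API:

```python
from datetime import datetime, timedelta


# Single source of truth for the feature (hypothetical helper, not a library API).
def user_avg_purchase_last_30d(purchases, user_id, as_of):
    """Mean purchase amount for user_id over the 30 days before as_of.

    purchases: iterable of (user_id, amount, timestamp) tuples.
    Returns 0.0 when the user has no purchases in the window (an assumption).
    """
    window_start = as_of - timedelta(days=30)
    amounts = [
        amount
        for uid, amount, ts in purchases
        if uid == user_id and window_start <= ts < as_of
    ]
    return sum(amounts) / len(amounts) if amounts else 0.0


purchases = [
    ("u1", 20.0, datetime(2024, 1, 5)),
    ("u1", 40.0, datetime(2024, 1, 20)),
    ("u1", 90.0, datetime(2024, 3, 1)),  # outside the 30-day window used below
]


def build_training_row(user_id, label_time, label):
    # Training path: feature computed as of the label's timestamp.
    value = user_avg_purchase_last_30d(purchases, user_id, label_time)
    return {"user_id": user_id, "user_avg_purchase_last_30d": value, "label": label}


def serve_features(user_id, request_time):
    # Serving path: calls the shared definition instead of reimplementing it (e.g. in SQL).
    value = user_avg_purchase_last_30d(purchases, user_id, request_time)
    return {"user_avg_purchase_last_30d": value}


row = build_training_row("u1", datetime(2024, 1, 31), label=1)
served = serve_features("u1", datetime(2024, 1, 31))
print(row["user_avg_purchase_last_30d"], served["user_avg_purchase_last_30d"])  # 30.0 30.0
```

Returning 0.0 for users with no purchases in the window is an arbitrary choice made for this sketch. In a real system, serving would read a precomputed value from the online store rather than rescan raw events. The point is that both numbers come from one definition.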

Architecture

Component               Purpose                              Backed by
Offline store           Batch features for training          Data warehouse or lake (BigQuery, Snowflake, Parquet files)
Online store            Low-latency features for serving     Redis, DynamoDB, Cassandra
Feature registry        Metadata, lineage, documentation     Catalog database
Transformation engine   Compute features from raw data       Spark, SQL, Python

The flow: raw data → transformations → offline store (for training) → materialized to online store (for serving). Same feature definition, two storage backends.
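The offline-to-online flow can be sketched with in-memory dicts standing in for both backends. This is a toy model, not how any particular product works. The FeatureDefinition fields, the materialize and get_online_feature helpers, and the owner value are all hypothetical. In this sketch, materialization means copying the latest value per entity from the full offline history into a key-value online store:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FeatureDefinition:
    """Registry entry: metadata describing a feature (hypothetical schema)."""
    name: str
    dtype: type
    owner: str
    description: str


# Offline store: full history of (entity_id, event_timestamp, value) rows.
# Stands in for a warehouse table or Parquet files.
offline_store = {
    "user_avg_purchase_last_30d": [
        ("u1", datetime(2024, 1, 1), 25.0),
        ("u1", datetime(2024, 2, 1), 30.0),
        ("u2", datetime(2024, 1, 15), 12.5),
    ]
}

# Online store: only the latest value per (feature, entity) key.
# Stands in for Redis, DynamoDB, Cassandra, etc.
online_store = {}


def materialize(feature):
    """Copy the most recent value per entity from offline to online storage."""
    latest = {}
    for entity_id, ts, value in offline_store[feature.name]:
        if entity_id not in latest or ts > latest[entity_id][0]:
            latest[entity_id] = (ts, value)
    for entity_id, (_, value) in latest.items():
        online_store[(feature.name, entity_id)] = value


def get_online_feature(feature, entity_id):
    """Serving path: a single key lookup, no recomputation; None if missing."""
    return online_store.get((feature.name, entity_id))


avg_purchase = FeatureDefinition(
    name="user_avg_purchase_last_30d",
    dtype=float,
    owner="growth-team",  # hypothetical owner
    description="Mean purchase amount over the trailing 30 days",
)

materialize(avg_purchase)
print(get_online_feature(avg_purchase, "u1"))  # 30.0
```

The offline store keeps every timestamped row so training sets can be rebuilt as of any past moment. The online store keeps only the latest value per key, which is what makes serving a cheap lookup.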

What to store

  • Features: the computed values (user_avg_purchase_last_30d, text_embedding_v2)
  • Metadata: data type, owner, description, freshness requirements
  • Lineage: which raw data sources and transforms produced this feature
  • Timestamps: event times that enable point-in-time correct joins, so no data from after a training example's timestamp leaks into its features
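Point-in-time correctness means each training example only sees feature values whose timestamps are at or before that example's own timestamp. Below is a stdlib-only sketch of such a join, sometimes called an as-of join. It assumes feature history is stored as a sorted list of (timestamp, value) pairs per entity. The data and the helper name are hypothetical. pandas users would typically reach for pandas.merge_asof with its by= argument instead:

```python
from bisect import bisect_right
from datetime import datetime

# Feature history per entity, sorted by timestamp: (event_timestamp, value).
feature_history = {
    "u1": [
        (datetime(2024, 1, 1), 25.0),
        (datetime(2024, 2, 1), 30.0),
        (datetime(2024, 3, 1), 45.0),
    ],
}

# Training labels: (entity_id, label_timestamp, label).
labels = [
    ("u1", datetime(2024, 1, 15), 0),
    ("u1", datetime(2024, 2, 20), 1),
    ("u1", datetime(2023, 12, 1), 0),  # before any feature value exists
]


def point_in_time_value(entity_id, as_of):
    """Latest feature value with timestamp <= as_of, or None if there is none."""
    history = feature_history.get(entity_id, [])
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)  # number of rows with ts <= as_of
    return history[idx - 1][1] if idx > 0 else None


training_set = [
    (entity_id, ts, point_in_time_value(entity_id, ts), label)
    for entity_id, ts, label in labels
]
for row in training_set:
    print(row)
```

A naive join that attaches each user's latest value would give every row above 45.0, the March value, leaking future information into the January and February examples.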

Key tools

Tool                      Notes
Feast                     Open-source, lightweight, Python-native. Good starting point
Tecton                    Managed commercial platform and a long-time backer of Feast. Production-grade
Hopsworks                 Open-source platform built around a feature store
Vertex AI Feature Store   GCP-native, integrates with Vertex AI pipelines
SageMaker Feature Store   AWS-native

For learning and small projects, Feast is a sensible default. For production at scale, evaluate managed options.

When you don’t need one

If you have one model, one team, and batch-only serving — a feature store is overkill. A well-organized Parquet file and a documented transformation script will do. Feature stores pay off when you have multiple models sharing features, or real-time serving requirements.