Supervised vs Unsupervised Learning

Supervised Learning

You have inputs (features) AND correct outputs (labels). The model learns to map input → output.

Task           | Input    | Output   | Examples
Classification | Features | Category | Spam detection, image recognition
Regression     | Features | Number   | Price prediction, temperature forecasting
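
A minimal sketch of both supervised tasks in pure Python, to make the "map input → output" idea concrete. The data, feature choices, and function names (nearest_neighbor_predict, fit_line) are made up for illustration. In practice a library such as scikit-learn would normally be used instead of hand-written code.

```python
# Toy supervised learning: every training example comes with a label.

def nearest_neighbor_predict(train_x, train_y, query):
    """Classification: return the label of the closest training point."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    best = min(range(len(train_x)), key=lambda i: sq_dist(train_x[i], query))
    return train_y[best]

def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical features: (number of links, number of exclamation marks)
emails = [(0, 0), (1, 0), (7, 5), (9, 8)]
labels = ["ham", "ham", "spam", "spam"]
print(nearest_neighbor_predict(emails, labels, (8, 6)))  # prints "spam"

# Hypothetical data: floor area in square meters -> price in thousands
areas = [50, 80, 100, 120]
prices = [150, 240, 300, 360]
slope, intercept = fit_line(areas, prices)
print(slope * 90 + intercept)  # predicted price for 90 square meters
```

The classifier returns a category and the fitted line returns a number, which is the only real difference between the two rows of the table.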

Unsupervised Learning

You have inputs but NO labels. The model finds structure on its own.

Task                     | What it does       | Examples
Clustering               | Group similar data | Customer segmentation, topic discovery
Dimensionality reduction | Compress features  | PCA, t-SNE for visualization
Anomaly detection        | Find outliers      | Fraud detection, defect detection
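
A rough sketch of two of these tasks without any labels: a bare-bones k-means for clustering and a z-score rule for anomaly detection. The customer and transaction data are invented, the starting centroids are picked by hand to keep the run deterministic, and the threshold of 2.0 is an arbitrary assumption. Real projects would usually rely on a tested library implementation.

```python
import statistics

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points. No labels are used."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    assignments = []
    for _ in range(iterations):
        assignments = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old centroid
        centroids = new_centroids
    return assignments, centroids

def zscore_outliers(values, threshold=2.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []
    return [v for v in values if abs(v - mean) / sd > threshold]

# Hypothetical customers: (visits per month, average spend)
customers = [(1, 10), (2, 12), (1, 11), (9, 95), (10, 100), (8, 90)]
assignments, centroids = kmeans(customers, [customers[0], customers[3]])
print(assignments)  # cluster index for each customer

# Hypothetical transaction amounts with one suspicious value
amounts = [20, 22, 19, 21, 20, 23, 500]
print(zscore_outliers(amounts))
```

Neither function is ever told which group a customer belongs to or which transaction is fraudulent; the structure comes from the data alone.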

Semi-supervised, Self-supervised, and Reinforcement Learning

  • Semi-supervised: a few labels + lots of unlabeled data (the common case in practice, since labeling is expensive)
  • Self-supervised: create labels from data itself (e.g., mask a word, predict it → BERT)
  • Reinforcement learning: learn from rewards/penalties in an environment (see Reinforcement Learning Roadmap)
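
To illustrate how self-supervised learning creates labels from the data itself, here is a small sketch that turns a raw sentence into (masked input, target word) training pairs. It only shows the label-creation step, not a model. The function name make_masked_examples is hypothetical, and masking every position in turn is a simplification: BERT masks a random subset of tokens rather than one word at a time.

```python
def make_masked_examples(sentence, mask_token="[MASK]"):
    """Build (masked sentence, hidden word) pairs from unlabeled text.
    The hidden word acts as the label, so no human annotation is needed."""
    words = sentence.split()
    examples = []
    for i, target in enumerate(words):
        masked = words[:i] + [mask_token] + words[i + 1:]
        examples.append((" ".join(masked), target))
    return examples

examples = make_masked_examples("the cat sat")
for masked_text, target in examples:
    print(masked_text, "->", target)
```

Each pair can then be used exactly like a supervised example, which is why self-supervised methods scale to large unlabeled text collections.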

Which to use?

  • Have labels? → supervised (classification for categories, regression for numbers)
  • No labels, want groups? → clustering
  • No labels, want to reduce features? → dimensionality reduction
  • Want to generate data? → generative models (unsupervised/self-supervised)
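
The checklist above can be written as a small lookup function. This is only a sketch: the function name suggest_approach and the goal strings are assumptions made for illustration, and real projects often mix several of these approaches.

```python
def suggest_approach(has_labels, goal):
    """Mirror the checklist. `goal` is one of the hypothetical strings
    "predict", "groups", "reduce_features", or "generate"."""
    if goal == "generate":
        return "generative models"
    if has_labels:
        return "supervised"
    if goal == "groups":
        return "clustering"
    if goal == "reduce_features":
        return "dimensionality reduction"
    return "not covered by this checklist"

print(suggest_approach(True, "predict"))           # prints "supervised"
print(suggest_approach(False, "groups"))           # prints "clustering"
print(suggest_approach(False, "reduce_features"))  # prints "dimensionality reduction"
```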