Supervised vs Unsupervised Learning
Supervised Learning
You have inputs (features) AND correct outputs (labels). The model learns to map input → output.
| Task | Input | Output | Examples |
|---|---|---|---|
| Classification | Features | Category | Spam detection, image recognition |
| Regression | Features | Number | Price prediction, temperature forecasting |
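A minimal sketch of both supervised tasks, using only the standard library. The toy data, the choice of 1-nearest-neighbour for classification, and the helper names `predict_1nn` and `fit_line` are assumptions for illustration; the notes above do not prescribe any particular algorithm.

```python
import math

# Toy labeled data (invented for illustration):
# features = (hours_studied, hours_slept), label = exam result.
train_X = [(1.0, 4.0), (2.0, 5.0), (8.0, 7.0), (9.0, 8.0)]
train_y = ["fail", "fail", "pass", "pass"]


def predict_1nn(x, X, y):
    """Classification: return the label of the closest training point."""
    distances = [math.dist(x, xi) for xi in X]
    return y[distances.index(min(distances))]


def fit_line(xs, ys):
    """Regression: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


# Features -> category
label = predict_1nn((7.5, 6.5), train_X, train_y)

# Features -> number (the data lies exactly on y = 2x + 1)
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```

In both cases the labels (`train_y`, `ys`) are what make the problem supervised: the model is fitted against known correct outputs.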
Unsupervised Learning
You have inputs but NO labels. The model finds structure on its own.
| Task | What it does | Examples |
|---|---|---|
| Clustering | Group similar data | Customer segmentation, topic discovery |
| Dimensionality reduction | Compress features | PCA, t-SNE for visualization |
| Anomaly detection | Find outliers | Fraud detection, defect detection |
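As one illustration of "finding structure" without labels, here is a hedged sketch of k-means clustering in plain Python. The points are invented, and seeding the centroids with the first `k` points is a simplification chosen for determinism; practical implementations usually use random or k-means++ initialization.

```python
import math


def kmeans(points, k, iterations=10):
    """Minimal k-means; the first k points seed the centroids (an assumption)."""
    centroids = list(points[:k])
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids


# Unlabeled toy data: two visually obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 1.5), (9.5, 8.0)]
assignments, centroids = kmeans(points, k=2)
```

No labels are passed in; the cluster indices in `assignments` come purely from distances between the points themselves.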
Semi-supervised, Self-supervised, and Reinforcement Learning
- Semi-supervised: a few labels + lots of unlabeled data (the common case in practice, since labeling is expensive)
- Self-supervised: create labels from the data itself (e.g., mask a word, predict it → BERT)
- Reinforcement learning: an agent learns from rewards/penalties while acting in an environment (see Reinforcement Learning Roadmap)
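The "mask a word, predict it" idea can be sketched as follows. The function name `make_masked_examples` is hypothetical, and masking every position in turn is a simplification: actual masked-language-model training such as BERT's masks a random subset of tokens.

```python
def make_masked_examples(sentence, mask_token="[MASK]"):
    """Turn one unlabeled sentence into (masked_input, target_word) pairs."""
    words = sentence.split()
    examples = []
    for i, target in enumerate(words):
        # Hide one word; the hidden word itself becomes the label.
        masked = words[:i] + [mask_token] + words[i + 1:]
        examples.append((" ".join(masked), target))
    return examples


examples = make_masked_examples("the cat sat")
# examples[1] == ("the [MASK] sat", "cat")
```

The resulting pairs look like supervised training data, but no human annotated anything: the labels were cut out of the raw text.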
Which to use?
- Have labels? → supervised (classification for categories, regression for numbers)
- No labels, want groups? → clustering
- No labels, want to reduce features? → dimensionality reduction
- Want to generate data? → generative models (unsupervised/self-supervised)
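The checklist above can be written as a small lookup function. This is only a sketch: `suggest_approach` and the goal strings `"groups"`, `"reduce_features"`, and `"generate"` are made-up names, and the rules are checked in the same order as the list.

```python
def suggest_approach(has_labels, goal=None):
    """Map the checklist questions to a family of methods (illustrative only)."""
    if has_labels:
        return "supervised"
    if goal == "groups":
        return "clustering"
    if goal == "reduce_features":
        return "dimensionality reduction"
    if goal == "generate":
        return "generative models"
    return "no rule in the checklist applies"


choice = suggest_approach(has_labels=False, goal="groups")
```

Real projects often mix these, for example reducing dimensionality first and then training a supervised model on the compressed features.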