Modern AI Techniques
State-of-the-art methods driving the current AI wave. These build on everything in the previous sections.
Generative Models
- Diffusion Models — how Stable Diffusion, DALL-E, and Midjourney work
- Generative Adversarial Networks — generator vs discriminator
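The core of a diffusion model is a forward process that gradually noises data and a learned reverse process that denoises it. A minimal sketch of the forward (noising) step in NumPy, with an illustrative linear beta schedule (real systems tune the schedule and learn the reverse step with a neural network):

```python
import numpy as np

def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    noise = rng.standard_normal(x0.shape)
    ab = alpha_bars[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * noise

# Illustrative linear beta schedule; production models use tuned schedules.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))  # stand-in for an image
x_noisy = forward_diffuse(x0, T - 1, alpha_bars, rng)
# By the final step alpha_bar is near zero, so x_T is almost pure noise.
```

Generation runs this in reverse: starting from pure noise, a trained network predicts and removes the noise one step at a time.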
Scaling & Architecture
- Mixture of Experts — sparse models that scale efficiently (Mixtral; GPT-4 is widely reported to use MoE)
- State Space Models — Mamba, linear-time alternatives to attention
- Multimodal Models — vision + language in a single model (GPT-4V, LLaVA)
- Scaling Laws — power-law relationships between compute, data, and performance
- Vision Transformers — ViT, DeiT, and Swin; transformers replacing CNNs for vision
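To make the MoE idea concrete, here is a toy top-2 routing layer in NumPy: a router scores every expert, but each token only runs through its two best experts, so compute stays roughly constant as you add experts. The linear "experts" and sizes are illustrative, not any particular model's architecture:

```python
import numpy as np

def top2_moe(x, gate_w, experts):
    """Sparse MoE layer: route each token to its top-2 experts only.

    x        : (tokens, d) inputs
    gate_w   : (d, n_experts) router weights
    experts  : list of (d, d) matrices standing in for expert networks
    """
    logits = x @ gate_w                         # (tokens, n_experts)
    top2 = np.argsort(logits, axis=-1)[:, -2:]  # indices of the 2 best experts
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        sel = top2[i]
        # Softmax over only the selected experts' gate scores.
        w = np.exp(logits[i, sel] - logits[i, sel].max())
        w /= w.sum()
        for weight, e in zip(w, sel):
            out[i] += weight * (x[i] @ experts[e])
    return out

rng = np.random.default_rng(0)
d, n_experts, tokens = 8, 4, 3
x = rng.standard_normal((tokens, d))
gate_w = rng.standard_normal((d, n_experts))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
y = top2_moe(x, gate_w, experts)  # each token touched only 2 of the 4 experts
```

Real MoE layers add load-balancing losses and batched expert dispatch; this sketch only shows the routing logic.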
Alignment & Training
- RLHF and Alignment — how LLMs learn to be helpful, harmless, and honest
- Constitutional AI — rule-based self-improvement
- Instruction Tuning — teaching models to follow instructions
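The reward-modeling stage of RLHF is commonly trained on pairwise human preferences using a Bradley-Terry model: the probability that response A beats response B is the sigmoid of the reward gap. A minimal sketch with illustrative reward values:

```python
import math

def preference_prob(reward_a, reward_b):
    """Bradley-Terry model: P(A preferred over B) = sigmoid(r_A - r_B)."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))

def pairwise_loss(reward_chosen, reward_rejected):
    """Negative log-likelihood minimized when training the reward model:
    pushes r(chosen) above r(rejected)."""
    return -math.log(preference_prob(reward_chosen, reward_rejected))

# Equal rewards give a 50/50 preference; a large gap gives near-certainty.
p_equal = preference_prob(1.0, 1.0)   # 0.5
p_gap = preference_prob(2.0, 0.5)     # well above 0.5
```

The resulting reward model then scores the policy's outputs during RL fine-tuning (e.g. with PPO).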
Inference & Efficiency
- Quantization — shrink model weights from 16-bit to 4-bit precision with minimal quality loss
- Knowledge Distillation — train a small model to mimic a large one
- Speculative Decoding — speed up inference by predicting ahead
- LoRA and PEFT — fine-tune large models with tiny parameter budgets
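A minimal sketch of the quantization idea, using simple symmetric per-tensor scaling in NumPy. Production 4-bit schemes (e.g. GPTQ, AWQ, NF4) are more sophisticated, with groupwise scales and calibration, but the round-to-int-and-rescale core is the same:

```python
import numpy as np

def quantize_symmetric(w, bits=4):
    """Symmetric per-tensor quantization: map floats to signed integers.

    Returns the integer codes and the scale needed to dequantize.
    """
    qmax = 2 ** (bits - 1) - 1                  # e.g. 7 for 4-bit
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
q, scale = quantize_symmetric(w, bits=4)
w_hat = dequantize(q, scale)
err = np.abs(w - w_hat).max()  # worst-case rounding error is about scale / 2
```

With 4 bits the codes fit in the range [-7, 7], so storage drops 4x versus 16-bit at the cost of bounded rounding error.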
Agents & Applications
- AI Agents — LLMs with tools, memory, and planning
- Tool Use and Function Calling — models that use APIs and code
- Retrieval Augmented Generation — grounding with external knowledge
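The core loop behind tool use is simple: the model emits a structured function call, the runtime dispatches it, and the result is fed back into context. A toy dispatcher with a hypothetical two-tool registry (no actual LLM involved; the JSON call stands in for model output):

```python
import json

# Hypothetical tool registry; a real agent framework maps model-emitted
# call names to actual API or code functions the same way.
TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda text: text.upper(),
}

def dispatch(call_json):
    """Execute one model-emitted call such as
    {"name": "add", "arguments": {"a": 2, "b": 3}}."""
    call = json.loads(call_json)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

result = dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}')
# The agent appends `result` to the conversation and lets the model continue.
```

Memory and planning layer on top of this loop: the agent keeps past results in context and decides which tool, if any, to call next.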
Links
- Key Papers — the research behind these techniques
- Deep Learning Roadmap — foundations you need first
- Training Projects — hands-on practice