ImageNet Classification with Deep CNNs (AlexNet)

Krizhevsky, Sutskever, Hinton (2012)

Why It Matters

Won ILSVRC-2012 by a wide margin: 15.3% top-5 test error versus 26.2% for the second-best entry. Proved that deep CNNs trained on GPUs could outperform hand-engineered features. Widely credited with triggering the deep learning revolution.

Key Ideas

  1. Show that deep convolutional networks trained on large labeled datasets with GPUs can drastically outperform hand-engineered vision systems.
  2. Combine ReLUs, dropout, data augmentation, and GPU training into a recipe that made large-scale deep vision practical.
  3. Demonstrate that scale in model size, data, and compute can unlock representations traditional pipelines could not learn.
  4. Mark the turning point where deep learning became the default path in computer vision.
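
The ingredients named in idea 2 can be sketched in a few lines. This is a minimal, pure-Python illustration and not the authors' implementation: the function names (`relu`, `dropout`, `random_crop_and_flip`) and the list-based image layout are assumptions made for this example. The dropout variant follows the scheme described in the paper (drop units with probability 0.5 during training, scale outputs at test time), and the augmentation mirrors the paper's random crops plus horizontal reflections.

```python
import random


def relu(xs):
    """Rectified linear unit: max(0, x) applied to each activation."""
    return [max(0.0, x) for x in xs]


def dropout(xs, p=0.5, training=True, rng=random):
    """Dropout in the style described in the paper (illustrative only).

    Training: zero each unit independently with probability p.
    Test: keep every unit and scale its output by (1 - p).
    """
    if training:
        return [0.0 if rng.random() < p else x for x in xs]
    return [x * (1.0 - p) for x in xs]


def random_crop_and_flip(image, crop=224, rng=random):
    """Take a random crop-by-crop window and mirror it half the time.

    `image` is assumed to be a list of rows (height x width).
    """
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop)    # randint is inclusive at both ends
    left = rng.randint(0, width - crop)
    patch = [row[left:left + crop] for row in image[top:top + crop]]
    if rng.random() < 0.5:
        patch = [row[::-1] for row in patch]  # horizontal reflection
    return patch
```

Passing a seeded `random.Random` instance as `rng` makes the sampling reproducible. Real training code would apply these operations to batched tensors on the GPU; the lists here only keep the sketch dependency-free.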

Notes

  • AlexNet was both an algorithmic and a systems breakthrough: the network was split across two GPUs so that a model of that size could be trained at all.
  • Its historical importance is less about the exact architecture and more about proving the deep-learning recipe at ImageNet scale.