# Neurons and Activation Functions
## The artificial neuron
inputs × weights → sum + bias → activation function → output
output = activation(w₁x₁ + w₂x₂ + ... + wₙxₙ + b)
A single neuron computes a dot product of inputs and weights, adds a bias, and passes the result through a nonlinear activation function.
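The formula above can be sketched in a few lines of NumPy (the input, weight, and bias values here are hypothetical, chosen only for illustration):

```python
import numpy as np

def neuron(x, w, b, activation):
    # output = activation(w1*x1 + w2*x2 + ... + wn*xn + b) = activation(w . x + b)
    return activation(np.dot(w, x) + b)

def relu(z):
    return np.maximum(0.0, z)

# Hypothetical inputs, weights, and bias
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -1.0, 0.25])
b = 0.5
out = neuron(x, w, b, relu)  # relu(0.5 - 2.0 + 0.75 + 0.5) = relu(-0.25) = 0.0
```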
## Why activation functions?
Without nonlinearity, stacking layers accomplishes nothing: a composition of linear functions is still a single linear function. Activation functions are what let neural nets learn complex, nonlinear patterns.
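A quick sketch of that collapse: two stacked linear layers with random (hypothetical) weights compute exactly the same map as one linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)  # layer 1
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)  # layer 2

def two_linear_layers(x):
    # no activation between the layers
    return W2 @ (W1 @ x + b1) + b2

# The stack collapses to a single linear map W x + b:
W, b = W2 @ W1, W2 @ b1 + b2

x = rng.normal(size=3)
collapsed = W @ x + b  # identical output to the two-layer version
```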
## Common activations
| Function | Formula | Range | Use case |
|---|---|---|---|
| ReLU | max(0, x) | [0, ∞) | Default for hidden layers |
| Sigmoid | 1/(1+e⁻ˣ) | (0, 1) | Binary output, gates |
| Tanh | (eˣ-e⁻ˣ)/(eˣ+e⁻ˣ) | (-1, 1) | When centered output needed |
| Softmax | exp(xᵢ) / Σⱼ exp(xⱼ) | (0, 1), sums to 1 | Multi-class output layer |
| GELU | x·Φ(x) | ≈ (-0.17, ∞) | Transformers |
| Leaky ReLU | max(0.01x, x) | (-∞, ∞) | Fix dying ReLU problem |
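The table above maps directly to code. A sketch in NumPy; note that GELU here uses the common tanh approximation of x·Φ(x) rather than the exact Gaussian CDF:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

def gelu(x):
    # tanh approximation of x * Phi(x)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)

z = np.array([-2.0, 0.0, 3.0])
p = softmax(z)  # probabilities in (0, 1) that sum to 1
```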
## Why ReLU dominates
- Simple: max(0, x) — fast to compute
- Sparse: many neurons output 0 → efficient
- No vanishing gradient for positive values
- Problem: “dying ReLU” — a neuron whose pre-activation is always negative outputs 0 and receives zero gradient, so it can never recover → use Leaky ReLU
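The last point can be seen directly in the gradients (a sketch with hypothetical pre-activation values): ReLU's gradient is exactly 0 for negative inputs, so a stuck neuron gets no weight updates, while Leaky ReLU keeps a small nonzero slope.

```python
import numpy as np

def relu_grad(z):
    # d/dz max(0, z): 1 for z > 0, else 0
    return np.where(z > 0, 1.0, 0.0)

def leaky_relu_grad(z, alpha=0.01):
    # d/dz max(alpha*z, z): 1 for z > 0, else alpha
    return np.where(z > 0, 1.0, alpha)

z = np.array([-2.0, -0.5, 0.5, 3.0])  # hypothetical pre-activations
dead = relu_grad(z)         # zeros on the negative side: no learning there
alive = leaky_relu_grad(z)  # small alpha on the negative side: still trainable
```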
## Layers
- Input layer: your data features
- Hidden layers: learn intermediate representations
- Output layer: final prediction (sigmoid for binary, softmax for multi-class, linear for regression)
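Putting the pieces together, a minimal forward pass through one hidden layer (the layer sizes and random weights are hypothetical: 3 input features, 5 hidden units, 4 output classes):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(5, 3)), np.zeros(5)  # input -> hidden
W2, b2 = rng.normal(size=(4, 5)), np.zeros(4)  # hidden -> output

def forward(x):
    h = relu(W1 @ x + b1)        # hidden layer: intermediate representation
    return softmax(W2 @ h + b2)  # output layer: class probabilities

probs = forward(np.array([0.2, -1.0, 0.5]))
```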