Neurons and Activation Functions

The artificial neuron

inputs × weights → sum + bias → activation function → output

output = activation(w₁x₁ + w₂x₂ + ... + wₙxₙ + b)

A single neuron = the dot product of inputs and weights, plus a bias, passed through a nonlinear function.
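The formula above maps directly to code. A minimal sketch using only the standard library, with sigmoid as the example activation (the weights and inputs are made-up illustrative values):

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: dot product of inputs and weights,
    plus bias, passed through a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 / (1 + math.exp(-z))  # sigmoid squashes z into (0, 1)

# z = 0.2*1.0 + (-0.5)*2.0 + 0.1*3.0 + 0.4 = -0.1
out = neuron([1.0, 2.0, 3.0], [0.2, -0.5, 0.1], bias=0.4)
```

Swapping the last line of `neuron` for a different activation changes the neuron's behavior without touching the weighted sum.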

Why activation functions?

Without nonlinearity, stacking layers does nothing — it’s still a linear function. Activation functions let neural nets learn complex, nonlinear patterns.
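This collapse is easy to verify: composing two linear layers (hypothetical weights chosen for illustration) always yields a single equivalent linear layer.

```python
# Two linear "layers" (no activation) applied in sequence...
def layer1(x): return 2 * x + 1   # w1=2, b1=1
def layer2(x): return 3 * x - 4   # w2=3, b2=-4

# ...equal one linear layer with w = w2*w1 = 6 and b = w2*b1 + b2 = -1
def collapsed(x): return 6 * x - 1

assert all(layer2(layer1(x)) == collapsed(x) for x in range(-5, 6))
```

No matter how many linear layers you stack, the result is one straight line; the nonlinearity between layers is what breaks this.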

Common activations

| Function   | Formula                  | Range             | Use case                  |
|------------|--------------------------|-------------------|---------------------------|
| ReLU       | max(0, x)                | [0, ∞)            | Default for hidden layers |
| Sigmoid    | 1/(1+e⁻ˣ)                | (0, 1)            | Binary output, gates      |
| Tanh       | (eˣ-e⁻ˣ)/(eˣ+e⁻ˣ)        | (-1, 1)           | When centered output needed |
| Softmax    | exp(xᵢ) / Σⱼ exp(xⱼ)     | (0, 1), sums to 1 | Multi-class output layer  |
| GELU       | x·Φ(x)                   | ≈(-0.17, ∞)       | Transformers              |
| Leaky ReLU | max(0.01x, x)            | (-∞, ∞)           | Fixes dying ReLU problem  |

ReLU dominates because

  • Simple: max(0, x) — fast to compute
  • Sparse: many neurons output 0 → efficient
  • No vanishing gradient for positive values
  • Problem: “dying ReLU” — neurons stuck at 0 forever → use Leaky ReLU
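The dying-ReLU point comes down to gradients: for negative pre-activations, ReLU's gradient is exactly zero, so the neuron's weights stop updating; Leaky ReLU keeps a small gradient flowing. A sketch of the two derivatives:

```python
def relu_grad(x):       return 1.0 if x > 0 else 0.0    # zero gradient: no learning
def leaky_relu_grad(x): return 1.0 if x > 0 else 0.01   # small gradient: recovery possible

# A neuron whose pre-activation is always negative is "dead" under ReLU:
# every weight update is scaled by relu_grad(x) = 0.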

Layers

  • Input layer: your data features
  • Hidden layers: learn intermediate representations
  • Output layer: final prediction (sigmoid for binary, softmax for multi-class, linear for regression)
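The three layer roles above can be wired together in a short forward pass. A minimal sketch for a 2-feature input, one ReLU hidden layer, and a softmax output for two classes (all weights here are made-up illustrative numbers, not trained values):

```python
import math

def dense(inputs, weights, biases, activation):
    """One fully connected layer: each row of `weights` is one neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

relu = lambda x: max(0.0, x)
identity = lambda x: x

x = [0.5, -1.2]                                              # input layer: 2 features
h = dense(x, [[1.0, -0.5], [0.3, 0.8]], [0.1, 0.0], relu)    # hidden layer: 2 neurons
logits = dense(h, [[0.7, -0.2], [-0.4, 0.9]], [0.0, 0.0], identity)  # output layer

# Softmax turns the raw logits into class probabilities
m = max(logits)
exps = [math.exp(v - m) for v in logits]
probs = [e / sum(exps) for e in exps]
```

For binary classification the output layer would instead be a single sigmoid neuron; for regression, a single linear (identity) neuron.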