# Neurons and Activation Functions
## The artificial neuron
inputs × weights → sum + bias → activation function → output
output = activation(w₁x₁ + w₂x₂ + ... + wₙxₙ + b)
A single neuron computes a dot product of inputs and weights, adds a bias, and passes the result through a nonlinear activation function.
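The formula above can be sketched in a few lines of NumPy (the input, weight, and bias values here are hypothetical, chosen only for illustration):

```python
import numpy as np

def neuron(x, w, b, activation):
    # output = activation(w1*x1 + w2*x2 + ... + wn*xn + b) = activation(w . x + b)
    return activation(np.dot(w, x) + b)

def relu(z):
    return np.maximum(0.0, z)

# Hypothetical inputs, weights, and bias
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -1.0, 0.25])
b = 0.5
out = neuron(x, w, b, relu)  # relu(0.5 - 2.0 + 0.75 + 0.5) = relu(-0.25) = 0.0
```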
## Why activation functions?
Without nonlinearity, stacking layers accomplishes nothing: a composition of linear functions is still a single linear function. Activation functions are what let neural nets learn complex, nonlinear patterns.
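A quick sketch of that collapse: two stacked linear layers with random (hypothetical) weights compute exactly the same map as one linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)  # layer 1
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)  # layer 2

def two_linear_layers(x):
    # no activation between the layers
    return W2 @ (W1 @ x + b1) + b2

# The stack collapses to a single linear map W x + b:
W, b = W2 @ W1, W2 @ b1 + b2

x = rng.normal(size=3)
collapsed = W @ x + b  # identical output to the two-layer version
```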
## Common activations
| Function | Formula | Range | Use case |
|---|---|---|---|
| ReLU | max(0, x) | [0, ∞) | Default for hidden layers |
| Sigmoid | 1/(1+e⁻ˣ) | (0, 1) | Binary output, gates |
| Tanh | (eˣ-e⁻ˣ)/(eˣ+e⁻ˣ) | (-1, 1) | When centered output needed |
| Softmax | exp(xᵢ) / Σⱼ exp(xⱼ) | (0, 1), sums to 1 | Multi-class output layer |
| GELU | x·Φ(x) | ≈ (-0.17, ∞) | Transformers |
| Leaky ReLU | max(0.01x, x) | (-∞, ∞) | Fix dying ReLU problem |
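The table above maps directly to code. A sketch in NumPy; note that GELU here uses the common tanh approximation of x·Φ(x) rather than the exact Gaussian CDF:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

def gelu(x):
    # tanh approximation of x * Phi(x)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)

z = np.array([-2.0, 0.0, 3.0])
p = softmax(z)  # probabilities in (0, 1) that sum to 1
```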
## Why ReLU dominates
- Simple: max(0, x) — fast to compute
- Sparse: many neurons output 0 → efficient
- No vanishing gradient for positive values
- Problem: “dying ReLU” — a neuron whose pre-activation is always negative outputs 0 and receives zero gradient, so it can never recover → use Leaky ReLU
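The last point can be seen directly in the gradients (a sketch with hypothetical pre-activation values): ReLU's gradient is exactly 0 for negative inputs, so a stuck neuron gets no weight updates, while Leaky ReLU keeps a small nonzero slope.

```python
import numpy as np

def relu_grad(z):
    # d/dz max(0, z): 1 for z > 0, else 0
    return np.where(z > 0, 1.0, 0.0)

def leaky_relu_grad(z, alpha=0.01):
    # d/dz max(alpha*z, z): 1 for z > 0, else alpha
    return np.where(z > 0, 1.0, alpha)

z = np.array([-2.0, -0.5, 0.5, 3.0])  # hypothetical pre-activations
dead = relu_grad(z)         # zeros on the negative side: no learning there
alive = leaky_relu_grad(z)  # small alpha on the negative side: still trainable
```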
## Layers
- Input layer: your data features
- Hidden layers: learn intermediate representations
- Output layer: final prediction (sigmoid for binary, softmax for multi-class, linear for regression)
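Putting the pieces together, a minimal forward pass through one hidden layer (the layer sizes and random weights are hypothetical: 3 input features, 5 hidden units, 4 output classes):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(5, 3)), np.zeros(5)  # input -> hidden
W2, b2 = rng.normal(size=(4, 5)), np.zeros(4)  # hidden -> output

def forward(x):
    h = relu(W1 @ x + b1)        # hidden layer: intermediate representation
    return softmax(W2 @ h + b2)  # output layer: class probabilities

probs = forward(np.array([0.2, -1.0, 0.5]))
```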