Recurrent Neural Networks
What
Neural nets with a loop: the hidden state at each step feeds back as an input to the next step. Designed for sequential data (text, time series).
h_t = activation(W_h × h_{t-1} + W_x × x_t + b)
Each hidden state h_t is a function of the current input AND the previous hidden state → memory of past inputs.
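The recurrence above can be sketched as a plain NumPy loop (dimensions and the tanh activation are illustrative choices, not fixed by the formula):

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 3, 4, 5

# The same weights are shared across every time step
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
W_x = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
b = np.zeros(hidden_dim)

def rnn_forward(xs, h0):
    """h_t = tanh(W_h @ h_{t-1} + W_x @ x_t + b); returns all hidden states."""
    h, hs = h0, []
    for x_t in xs:                       # strictly sequential: h_t needs h_{t-1}
        h = np.tanh(W_h @ h + W_x @ x_t + b)
        hs.append(h)
    return hs

xs = rng.normal(size=(seq_len, input_dim))
hs = rnn_forward(xs, np.zeros(hidden_dim))
print(len(hs), hs[-1].shape)  # 5 (4,)
```

Note the loop body depends on the previous iteration's result, which is exactly why this computation cannot be parallelized across time steps.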
Variants
LSTM (Long Short-Term Memory)
Adds gates (forget, input, output) to control what to remember and forget. Mitigates the vanishing gradient problem for long sequences.
GRU (Gated Recurrent Unit)
Simplified LSTM with fewer gates. Similar performance, fewer parameters.
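The "fewer parameters" claim follows directly from the gate counts: an LSTM cell has four weight blocks (three gates plus the cell candidate), a GRU has three (two gates plus the candidate). A quick sketch, assuming one bias vector per block (libraries such as PyTorch use two, so exact counts differ):

```python
def gated_rnn_params(input_dim, hidden_dim, n_blocks):
    # Each gate/candidate block: input matrix + recurrent matrix + bias
    per_block = hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim
    return n_blocks * per_block

x, h = 128, 256
lstm = gated_rnn_params(x, h, 4)  # forget, input, output gates + cell candidate
gru = gated_rnn_params(x, h, 3)   # reset, update gates + candidate
print(lstm, gru, gru / lstm)      # 394240 295680 0.75
```

So a GRU layer is always exactly 3/4 the size of the matching LSTM layer under this counting.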
Mostly historical now
Transformers have replaced RNNs for most tasks because:
- RNNs process sequentially → can’t parallelize → slow
- Even LSTMs struggle with very long sequences
- Transformers process all positions in parallel with Attention Mechanism
Still useful for
- Small sequential problems where transformers are overkill
- On-device inference with strict memory constraints
- Understanding the history of NLP/sequence modeling
Links
- Transformers — the successor
- Attention Mechanism
- Vanishing and Exploding Gradients
- NLP Roadmap