Recurrent Neural Networks

What

Neural nets with a loop: the hidden state at each step feeds back as input to the next step. Designed for sequential data (text, time series).

h_t = activation(W_h × h_{t-1} + W_x × x_t + b)

Each hidden state h_t is a function of the current input AND the previous hidden state → memory of past inputs.
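The recurrence above can be sketched in a few lines of NumPy. Sizes and weight initialization here are illustrative assumptions, not part of the original notes:

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 3, 4, 5  # illustrative sizes

# Weight matrices from the formula: W_h (recurrent), W_x (input), bias b
W_h = rng.normal(size=(hidden_size, hidden_size)) * 0.1
W_x = rng.normal(size=(hidden_size, input_size)) * 0.1
b = np.zeros(hidden_size)

h = np.zeros(hidden_size)                   # h_0: initial hidden state
xs = rng.normal(size=(seq_len, input_size)) # a toy input sequence

for x_t in xs:
    # h_t is a function of the current input AND the previous hidden state
    h = np.tanh(W_h @ h + W_x @ x_t + b)

print(h.shape)
```

Because `h` is reused across iterations, information from earlier inputs persists into later steps; that is the "memory" the notes refer to.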

Variants

LSTM (Long Short-Term Memory)

Adds gates (forget, input, output) that control what to remember and what to discard. Mitigates the vanishing-gradient problem, letting gradients flow across long sequences.
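A minimal sketch of one LSTM step, assuming the four gate weight blocks are stacked into single `W`, `U`, `b` arrays (the `f, i, o, g` ordering is my own convention here; real libraries order gates differently):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, X), U: (4H, H), b: (4H,),
    split into forget / input / output / candidate blocks."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[0:H])       # forget gate: what to drop from the cell state
    i = sigmoid(z[H:2*H])     # input gate: how much new info to write
    o = sigmoid(z[2*H:3*H])   # output gate: what to expose as h_t
    g = np.tanh(z[3*H:4*H])   # candidate values
    c = f * c_prev + i * g    # additive cell update -> gradients flow better
    h = o * np.tanh(c)
    return h, c

# Toy usage with random weights
rng = np.random.default_rng(0)
X, H = 3, 4
W = rng.normal(size=(4*H, X)) * 0.1
U = rng.normal(size=(4*H, H)) * 0.1
b = np.zeros(4*H)
h, c = lstm_step(rng.normal(size=X), np.zeros(H), np.zeros(H), W, U, b)
```

The additive update to `c` (rather than repeated matrix multiplication, as in the vanilla recurrence) is what lets gradients survive over long spans.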

GRU (Gated Recurrent Unit)

Simplified LSTM with fewer gates. Similar performance, fewer parameters.
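The "fewer parameters" claim can be made concrete with a back-of-envelope count: an LSTM layer has 4 gate/candidate blocks, a GRU has 3, each block holding a recurrent matrix, an input matrix, and a bias. Sizes below are illustrative, and I count one bias vector per block (libraries sometimes use two):

```python
# Illustrative layer sizes
input_size, hidden_size = 128, 256

# Per-block parameters: U (H x H) + W (H x X) + bias (H)
per_block = hidden_size * (hidden_size + input_size) + hidden_size

lstm_params = 4 * per_block  # forget, input, output, candidate
gru_params = 3 * per_block   # update, reset, candidate

print(lstm_params, gru_params)
```

Under this count a GRU layer is exactly 25% smaller than the matching LSTM layer.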

Mostly historical now

Transformers have replaced RNNs for most tasks because:

  • RNNs process tokens one at a time → can't parallelize across the sequence → slow to train
  • Even LSTMs struggle with very long-range dependencies
  • Transformers process all positions in parallel via self-attention

Still useful for

  • Small sequential problems where transformers are overkill
  • On-device inference with strict memory constraints
  • Understanding the history of NLP/sequence modeling