RNN & LSTM
Learn sequence modeling with Recurrent Neural Networks and Long Short-Term Memory networks.
Why RNNs?#
Standard feed-forward networks have no memory: each input is processed independently, so they can't model order in a sequence. RNNs process data step by step, carrying a hidden state from one time step to the next.
Key Insight
RNNs have loops that allow information to persist across time steps.
RNN Architecture#
         ┌─────┐    ┌─────┐    ┌─────┐
x1 ────▶ │ RNN │ ─▶ │ RNN │ ─▶ │ RNN │ ─▶ output
         └──┬──┘    └──┬──┘    └──┬──┘
            h1         h2         h3
         (hidden state flows forward)
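To make the recurrence concrete, here is a minimal hand-rolled sketch of a single RNN step, h_t = tanh(x_t·W_xh + h_{t-1}·W_hh + b_h). The weight names (W_xh, W_hh, b_h) are illustrative, not a library API.
import torch

# Hand-rolled RNN step (illustrative names, not PyTorch internals):
# h_t = tanh(x_t @ W_xh + h_{t-1} @ W_hh + b_h)
input_size, hidden_size = 10, 64
W_xh = torch.randn(input_size, hidden_size) * 0.1   # input-to-hidden weights
W_hh = torch.randn(hidden_size, hidden_size) * 0.1  # hidden-to-hidden weights
b_h = torch.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    # One time step: combine the current input with the previous hidden state
    return torch.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Unroll over a toy sequence of 5 steps for a batch of 3
h = torch.zeros(3, hidden_size)
for t in range(5):
    x_t = torch.randn(3, input_size)
    h = rnn_step(x_t, h)  # the same weights are reused at every step
print(h.shape)  # torch.Size([3, 64])
Note how the loop reuses the same weights at every step; that weight sharing is what the "loops" in the diagram refer to.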
Simple RNN in PyTorch#
import torch
import torch.nn as nn

rnn = nn.RNN(
    input_size=10,    # Features per time step
    hidden_size=64,   # Size of hidden state
    num_layers=2,     # Stacked RNN layers
    batch_first=True  # Expect input as (batch, seq, features)
)

# Input shape: (batch, sequence_length, features)
x = torch.randn(32, 5, 10)
output, hidden = rnn(x)
print(output.shape)  # (32, 5, 64): hidden state at every time step
print(hidden.shape)  # (2, 32, 64): final hidden state of each layer
The Vanishing Gradient Problem#
Plain RNNs struggle with long sequences: during backpropagation through time, gradients are multiplied by the same recurrent weights at every step, so they shrink (or explode) exponentially and the network stops learning long-range dependencies.
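One rough way to observe this (a sketch, not a benchmark): backpropagate from only the last time step of an nn.RNN and look at how much gradient reaches the earliest inputs.
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=10, hidden_size=64, batch_first=True)
x = torch.randn(1, 200, 10, requires_grad=True)  # one long sequence

output, _ = rnn(x)
output[:, -1].sum().backward()  # loss depends only on the final step

grad_per_step = x.grad.norm(dim=-1).squeeze(0)  # gradient magnitude per time step
print(grad_per_step[:5])   # early steps: typically tiny values
print(grad_per_step[-5:])  # late steps: much larger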
LSTM: The Solution#
Long Short-Term Memory networks keep a separate cell state and use gates to control what information is discarded, stored, and exposed:
lstm = nn.LSTM(
    input_size=10,
    hidden_size=64,
    num_layers=2,
    batch_first=True
)

# LSTMs return a cell state in addition to the hidden state
output, (hidden, cell) = lstm(x)  # x: (batch, sequence_length, features)
| Gate | Purpose | Formula |
|---|---|---|
| Forget | What to discard | f = σ(W_f·[h, x]) |
| Input | What to store | i = σ(W_i·[h, x]) |
| Output | What to output | o = σ(W_o·[h, x]) |
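To connect the table to code, here is a minimal sketch of one LSTM step with the gates written out explicitly. The weight layout and names are illustrative, not PyTorch's internal implementation.
import torch

# One LSTM step with explicit gates (illustrative, not a library API).
def lstm_step(x_t, h_prev, c_prev, W, b):
    # W projects [h_prev, x_t] to 4 * hidden_size: three gates plus the
    # candidate cell values g.
    z = torch.cat([h_prev, x_t], dim=-1) @ W + b
    i, f, o, g = z.chunk(4, dim=-1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)  # input / forget / output gates
    g = torch.tanh(g)                                               # candidate values
    c_t = f * c_prev + i * g      # forget old memory, store new information
    h_t = o * torch.tanh(c_t)     # expose a gated view of the cell state
    return h_t, c_t

hidden_size, input_size = 64, 10
W = torch.randn(hidden_size + input_size, 4 * hidden_size) * 0.1
b = torch.zeros(4 * hidden_size)
h = c = torch.zeros(3, hidden_size)
x_t = torch.randn(3, input_size)
h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)  # torch.Size([3, 64]) torch.Size([3, 64])
Because the cell state is updated additively (f * c_prev + i * g), gradients flow through it more easily than through a plain RNN's repeated matrix multiplications.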
GRU: Simplified LSTM#
Gated Recurrent Units merge the forget and input gates into a single update gate and drop the separate cell state, so they are lighter than LSTMs and often perform comparably:
gru = nn.GRU(
    input_size=10,
    hidden_size=64,
    num_layers=2,
    batch_first=True
)

output, hidden = gru(x)  # no separate cell state, unlike the LSTM
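Since a GRU has one fewer gate than an LSTM and no cell state, it also has fewer parameters for the same sizes. A quick sanity check (sizes are arbitrary):
import torch.nn as nn

# Compare parameter counts for the same input/hidden sizes.
def count_params(module):
    return sum(p.numel() for p in module.parameters())

for name, cls in [("RNN", nn.RNN), ("GRU", nn.GRU), ("LSTM", nn.LSTM)]:
    m = cls(input_size=10, hidden_size=64, num_layers=2, batch_first=True)
    print(name, count_params(m))
# Expect roughly LSTM > GRU > RNN: per layer, an LSTM has 4 weight blocks,
# a GRU has 3, and a vanilla RNN has 1.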
Use Cases#
Text Generation
Predict next character/word in sequence.
Time Series
Stock prices, weather, sensor data; a minimal forecasting sketch follows after this list.
Machine Translation
Seq2seq models for language pairs.
Speech Recognition
Audio to text transcription.
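As an example of the time-series case, here is a minimal sketch of a many-to-one LSTM that reads a window of values and predicts the next one. The toy sine-wave data, window length, and layer sizes are illustrative choices, not prescriptions.
import torch
import torch.nn as nn

# Many-to-one LSTM forecaster: read a window of values, predict the next one.
class Forecaster(nn.Module):
    def __init__(self, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                # x: (batch, window, 1)
        output, _ = self.lstm(x)
        return self.head(output[:, -1])  # use only the last time step

# Toy sine-wave data: windows of 20 points, target is the 21st.
t = torch.arange(0, 200, dtype=torch.float32) * 0.1
series = torch.sin(t)
windows = torch.stack([series[i:i + 20] for i in range(len(series) - 20)]).unsqueeze(-1)
targets = torch.stack([series[i + 20] for i in range(len(series) - 20)]).unsqueeze(-1)

model = Forecaster()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
for epoch in range(200):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(windows), targets)
    loss.backward()
    optimizer.step()
print(loss.item())  # should fall well below the variance of sin(t)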
Key Takeaways#
Remember
Prefer LSTM or GRU over a vanilla RNN: their gates mitigate vanishing gradients. For most modern NLP tasks, Transformers have replaced RNNs, but RNNs remain useful for time series and smaller datasets.