RNN & LSTM
Learn sequence modeling with Recurrent Neural Networks and Long Short-Term Memory networks.
Why RNNs?#
Standard feed-forward networks have no memory: each input is processed independently, so they can't model order in a sequence. RNNs process data step by step, carrying a hidden state from one time step to the next.
Key Insight
RNNs have loops that allow information to persist across time steps.
RNN Architecture#
         ┌─────┐    ┌─────┐    ┌─────┐
x1 ────▶ │ RNN │ ─▶ │ RNN │ ─▶ │ RNN │ ─▶ output
         └──┬──┘    └──┬──┘    └──┬──┘
            h1         h2         h3
         (hidden state flows forward)
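To make the recurrence concrete, here is a minimal hand-rolled sketch of a single RNN step, h_t = tanh(x_t·W_xh + h_{t-1}·W_hh + b_h). The weight names (W_xh, W_hh, b_h) are illustrative, not a library API.
import torch

# Hand-rolled RNN step (illustrative names, not PyTorch internals):
# h_t = tanh(x_t @ W_xh + h_{t-1} @ W_hh + b_h)
input_size, hidden_size = 10, 64
W_xh = torch.randn(input_size, hidden_size) * 0.1   # input-to-hidden weights
W_hh = torch.randn(hidden_size, hidden_size) * 0.1  # hidden-to-hidden weights
b_h = torch.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    # One time step: combine the current input with the previous hidden state
    return torch.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Unroll over a toy sequence of 5 steps for a batch of 3
h = torch.zeros(3, hidden_size)
for t in range(5):
    x_t = torch.randn(3, input_size)
    h = rnn_step(x_t, h)  # the same weights are reused at every step
print(h.shape)  # torch.Size([3, 64])
Note how the loop reuses the same weights at every step; that weight sharing is what the "loops" in the diagram refer to.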
Simple RNN in PyTorch#
import torch
import torch.nn as nn

rnn = nn.RNN(
    input_size=10,    # Features per time step
    hidden_size=64,   # Size of hidden state
    num_layers=2,     # Stacked RNN layers
    batch_first=True  # Expect input as (batch, seq, features)
)

# Input shape: (batch, sequence_length, features)
x = torch.randn(32, 5, 10)
output, hidden = rnn(x)
print(output.shape)  # (32, 5, 64): hidden state at every time step
print(hidden.shape)  # (2, 32, 64): final hidden state of each layer
The Vanishing Gradient Problem#
Plain RNNs struggle with long sequences: during backpropagation through time, gradients are multiplied by the same recurrent weights at every step, so they shrink (or explode) exponentially and the network stops learning long-range dependencies.
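One rough way to observe this (a sketch, not a benchmark): backpropagate from only the last time step of an nn.RNN and look at how much gradient reaches the earliest inputs.
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=10, hidden_size=64, batch_first=True)
x = torch.randn(1, 200, 10, requires_grad=True)  # one long sequence

output, _ = rnn(x)
output[:, -1].sum().backward()  # loss depends only on the final step

grad_per_step = x.grad.norm(dim=-1).squeeze(0)  # gradient magnitude per time step
print(grad_per_step[:5])   # early steps: typically tiny values
print(grad_per_step[-5:])  # late steps: much larger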
LSTM: The Solution#
Long Short-Term Memory networks keep a separate cell state and use gates to control what information is discarded, stored, and exposed:
lstm = nn.LSTM(
    input_size=10,
    hidden_size=64,
    num_layers=2,
    batch_first=True
)

# LSTMs return a cell state in addition to the hidden state
output, (hidden, cell) = lstm(x)  # x: (batch, sequence_length, features)
| Gate | Purpose | Formula |
|---|---|---|
| Forget | What to discard | f = σ(W_f·[h, x]) |
| Input | What to store | i = σ(W_i·[h, x]) |
| Output | What to output | o = σ(W_o·[h, x]) |
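To connect the table to code, here is a minimal sketch of one LSTM step with the gates written out explicitly. The weight layout and names are illustrative, not PyTorch's internal implementation.
import torch

# One LSTM step with explicit gates (illustrative, not a library API).
def lstm_step(x_t, h_prev, c_prev, W, b):
    # W projects [h_prev, x_t] to 4 * hidden_size: three gates plus the
    # candidate cell values g.
    z = torch.cat([h_prev, x_t], dim=-1) @ W + b
    i, f, o, g = z.chunk(4, dim=-1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)  # input / forget / output gates
    g = torch.tanh(g)                                               # candidate values
    c_t = f * c_prev + i * g      # forget old memory, store new information
    h_t = o * torch.tanh(c_t)     # expose a gated view of the cell state
    return h_t, c_t

hidden_size, input_size = 64, 10
W = torch.randn(hidden_size + input_size, 4 * hidden_size) * 0.1
b = torch.zeros(4 * hidden_size)
h = c = torch.zeros(3, hidden_size)
x_t = torch.randn(3, input_size)
h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)  # torch.Size([3, 64]) torch.Size([3, 64])
Because the cell state is updated additively (f * c_prev + i * g), gradients flow through it more easily than through a plain RNN's repeated matrix multiplications.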
GRU: Simplified LSTM#
Gated Recurrent Units merge the forget and input gates into a single update gate and drop the separate cell state, so they are lighter than LSTMs and often perform comparably:
gru = nn.GRU(
    input_size=10,
    hidden_size=64,
    num_layers=2,
    batch_first=True
)

output, hidden = gru(x)  # no separate cell state, unlike the LSTM
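Since a GRU has one fewer gate than an LSTM and no cell state, it also has fewer parameters for the same sizes. A quick sanity check (sizes are arbitrary):
import torch.nn as nn

# Compare parameter counts for the same input/hidden sizes.
def count_params(module):
    return sum(p.numel() for p in module.parameters())

for name, cls in [("RNN", nn.RNN), ("GRU", nn.GRU), ("LSTM", nn.LSTM)]:
    m = cls(input_size=10, hidden_size=64, num_layers=2, batch_first=True)
    print(name, count_params(m))
# Expect roughly LSTM > GRU > RNN: per layer, an LSTM has 4 weight blocks,
# a GRU has 3, and a vanilla RNN has 1.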
Use Cases#
Text Generation
Predict next character/word in sequence.
Time Series
Stock prices, weather, sensor data; a minimal forecasting sketch follows after this list.
Machine Translation
Seq2seq models for language pairs.
Speech Recognition
Audio to text transcription.
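As an example of the time-series case, here is a minimal sketch of a many-to-one LSTM that reads a window of values and predicts the next one. The toy sine-wave data, window length, and layer sizes are illustrative choices, not prescriptions.
import torch
import torch.nn as nn

# Many-to-one LSTM forecaster: read a window of values, predict the next one.
class Forecaster(nn.Module):
    def __init__(self, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                # x: (batch, window, 1)
        output, _ = self.lstm(x)
        return self.head(output[:, -1])  # use only the last time step

# Toy sine-wave data: windows of 20 points, target is the 21st.
t = torch.arange(0, 200, dtype=torch.float32) * 0.1
series = torch.sin(t)
windows = torch.stack([series[i:i + 20] for i in range(len(series) - 20)]).unsqueeze(-1)
targets = torch.stack([series[i + 20] for i in range(len(series) - 20)]).unsqueeze(-1)

model = Forecaster()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
for epoch in range(200):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(windows), targets)
    loss.backward()
    optimizer.step()
print(loss.item())  # should fall well below the variance of sin(t)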
Key Takeaways#
Remember
Prefer LSTM or GRU over a vanilla RNN: their gates mitigate vanishing gradients. For most modern NLP tasks, Transformers have replaced RNNs, but RNNs remain useful for time series and smaller datasets.