PyTorch Study Notes (8): Sequence Models (RNN/LSTM)
2025-12-08 · 12 min read
#PyTorch #Deep Learning #RNN #LSTM
Characteristics of Sequence Data
- The data has a temporal order
- The input at the current time step may depend on information from earlier steps
Common kinds of sequence data:
- Text (sequences of words)
- Speech (sequences of waveform samples)
- Stock prices (time series)
- Video (sequences of frames)
RNN vs LSTM
- RNN: a simple recurrent neural network; prone to vanishing gradients
- LSTM: long short-term memory network; designed to mitigate the vanishing-gradient problem
Core Idea of the RNN
```text
h_t = tanh(W_h · h_{t-1} + W_x · x_t + b)
y_t = W_y · h_t + c
```
- Introduces a "hidden state" h_t
- Carries information from the previous step into the current one
- Problem: vanishing gradients make long-range dependencies hard to learn
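The recurrence above can be sketched directly in PyTorch. This is a minimal manual unroll with illustrative random weights (the names `W_h`, `W_x`, `b` mirror the formula, not any library API):

```python
import torch

torch.manual_seed(0)
hidden_size, input_size = 4, 3

# Illustrative parameters matching h_t = tanh(W_h · h_{t-1} + W_x · x_t + b)
W_h = torch.randn(hidden_size, hidden_size)
W_x = torch.randn(hidden_size, input_size)
b = torch.zeros(hidden_size)

def rnn_step(h_prev, x_t):
    # One step of the recurrence: mix previous state and current input
    return torch.tanh(W_h @ h_prev + W_x @ x_t + b)

h = torch.zeros(hidden_size)          # initial hidden state
for t in range(5):                    # unroll over 5 time steps
    x_t = torch.randn(input_size)
    h = rnn_step(h, x_t)

print(h.shape)  # torch.Size([4])
```

Because the same `W_h` is multiplied in at every step, gradients flowing back through many steps repeatedly pass through it, which is exactly where the vanishing-gradient problem comes from.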
LSTM Improvements
- Introduces a gating mechanism (forget gate, input gate, output gate)
- Can selectively keep or discard information
- Able to learn long-range dependencies
Creating Sequence Data
```python
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

# Generate a noisy sine wave
def create_sine_wave(seq_length=1000):
    t = np.linspace(0, 4 * np.pi, seq_length)
    data = np.sin(t) + 0.1 * np.random.randn(seq_length)
    return torch.tensor(data, dtype=torch.float32)

data = create_sine_wave(1000)

# Prepare training data
def create_sequences(data, seq_len, pred_len):
    """Slice a long series into (input window, prediction window) pairs."""
    X, y = [], []
    for i in range(len(data) - seq_len - pred_len):
        X.append(data[i:i+seq_len])
        y.append(data[i+seq_len:i+seq_len+pred_len])
    return torch.stack(X), torch.stack(y)

# Use the previous 50 time steps to predict the next 10
seq_input_len = 50
seq_output_len = 10
X, y = create_sequences(data, seq_input_len, seq_output_len)

# Train/test split
train_size = int(0.8 * len(X))
X_train, y_train = X[:train_size], y[:train_size]
X_test, y_test = X[train_size:], y[train_size:]
```
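A quick shape check confirms what the windowing produces. The snippet below is standalone, so it substitutes a dummy series for the sine wave; the sizes follow from seq_length=1000, seq_input_len=50, seq_output_len=10:

```python
import torch

data = torch.arange(1000, dtype=torch.float32)  # stand-in for the sine wave

def create_sequences(data, seq_len, pred_len):
    # Same windowing as above: 1000 - 50 - 10 = 940 samples
    X = [data[i:i+seq_len] for i in range(len(data) - seq_len - pred_len)]
    y = [data[i+seq_len:i+seq_len+pred_len] for i in range(len(data) - seq_len - pred_len)]
    return torch.stack(X), torch.stack(y)

X, y = create_sequences(data, 50, 10)
print(X.shape, y.shape)   # torch.Size([940, 50]) torch.Size([940, 10])

train_size = int(0.8 * len(X))
print(train_size)         # 752 training samples, 188 for testing
```

Note that consecutive windows overlap heavily (stride 1), so adjacent training samples are highly correlated; this is fine for a toy example but worth remembering when splitting real data.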
Defining an RNN Model
```python
class SimpleRNN(nn.Module):
    """A plain RNN followed by a linear head."""
    def __init__(self, input_size=1, hidden_size=32, output_size=10):
        super().__init__()
        self.hidden_size = hidden_size
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # x shape: (batch, seq_len, input_size)
        out, h_n = self.rnn(x)
        # Use the hidden state of the last time step
        out = self.fc(out[:, -1, :])
        return out
```
Defining an LSTM Model
```python
class LSTMModel(nn.Module):
    """An LSTM followed by a linear head."""
    def __init__(self, input_size=1, hidden_size=32, output_size=10):
        super().__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, (h_n, c_n) = self.lstm(x)
        out = self.fc(out[:, -1, :])
        return out
```
Bidirectional LSTM
```python
class BidirectionalLSTM(nn.Module):
    """Bidirectional LSTM: reads the sequence forwards and backwards."""
    def __init__(self, input_size=1, hidden_size=32, output_size=10):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size,
                            batch_first=True,
                            bidirectional=True)  # two directions
        # Bidirectional output dimension is 2 * hidden_size
        self.fc = nn.Linear(hidden_size * 2, output_size)

    def forward(self, x):
        out, (h_n, c_n) = self.lstm(x)
        out = self.fc(out[:, -1, :])
        return out
```
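A quick sanity check on the bidirectional output sizes, using `nn.LSTM` directly with the same dimensions as the model above, shows why the linear head needs `hidden_size * 2` inputs:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, seq_len, input_size, hidden_size = 8, 50, 1, 32

x = torch.randn(batch, seq_len, input_size)
lstm = nn.LSTM(input_size, hidden_size, batch_first=True, bidirectional=True)
out, (h_n, c_n) = lstm(x)

print(out.shape)   # torch.Size([8, 50, 64]) -- last dim is 2 * hidden_size
print(h_n.shape)   # torch.Size([2, 8, 32]) -- one final state per direction
```

The forward and backward outputs are concatenated along the feature dimension at every time step, which is why `out[:, -1, :]` has 64 features here.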
Training the Models
```python
def train_model(model, model_name, X_train, y_train, X_test, y_test, epochs=100):
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=0.01)
    train_losses = []
    test_losses = []
    for epoch in range(epochs):
        model.train()
        optimizer.zero_grad()
        y_pred = model(X_train.unsqueeze(-1))  # add a feature dimension
        loss = criterion(y_pred, y_train)
        loss.backward()
        optimizer.step()
        train_losses.append(loss.item())
        # Evaluation
        model.eval()
        with torch.no_grad():
            test_loss = criterion(model(X_test.unsqueeze(-1)), y_test).item()
            test_losses.append(test_loss)
    return train_losses, test_losses

# Train all three models
models = {
    'RNN': SimpleRNN(),
    'LSTM': LSTMModel(),
    'BiLSTM': BidirectionalLSTM()
}
for name, model in models.items():
    train_losses, test_losses = train_model(
        model, name, X_train, y_train, X_test, y_test, epochs=100
    )
```
RNN vs LSTM Comparison
| Property | RNN | LSTM |
|---|---|---|
| Structural complexity | Simple | Complex (gated) |
| Parameter count | Small | Large |
| Training speed | Fast | Slower |
| Long-range dependencies | Poor (vanishing gradients) | Good |
| Typical use | Short sequences | Long sequences |
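The parameter-count row can be verified directly: with the same sizes, an LSTM layer has exactly 4x the parameters of a plain RNN layer, because it computes four transforms (input, forget, output gates plus the candidate state) where the RNN computes one:

```python
import torch.nn as nn

def n_params(m):
    # Total number of learnable scalars in a module
    return sum(p.numel() for p in m.parameters())

rnn = nn.RNN(input_size=1, hidden_size=32, batch_first=True)
lstm = nn.LSTM(input_size=1, hidden_size=32, batch_first=True)

print(n_params(rnn), n_params(lstm))  # 1120 4480 -- exactly a 4x ratio
```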
LSTM Gating Mechanism
- Forget gate: decides what information to discard
  ```text
  f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
  ```
- Input gate: decides what new information to store
  ```text
  i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
  ```
- Output gate: decides what information to emit
  ```text
  o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
  ```
The gates interact through the cell state: c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t, where c̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c) is the candidate state, and the hidden state is h_t = o_t ⊙ tanh(c_t).
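One manual step of these equations can be written out in a few lines. This is an illustrative sketch with random stand-in parameters (the `W_*` and `b_*` names mirror the formulas, not any library API), operating on the concatenation [h_{t-1}, x_t]:

```python
import torch

torch.manual_seed(0)
hidden_size, input_size = 4, 3
concat = hidden_size + input_size

# Stand-in parameters for the three gates and the candidate state
W_f, W_i, W_o = (torch.randn(hidden_size, concat) for _ in range(3))
W_c = torch.randn(hidden_size, concat)
b_f = b_i = b_o = b_c = torch.zeros(hidden_size)

h_prev = torch.zeros(hidden_size)
c_prev = torch.zeros(hidden_size)
x_t = torch.randn(input_size)
hx = torch.cat([h_prev, x_t])          # [h_{t-1}, x_t]

f_t = torch.sigmoid(W_f @ hx + b_f)    # forget gate, values in (0, 1)
i_t = torch.sigmoid(W_i @ hx + b_i)    # input gate
o_t = torch.sigmoid(W_o @ hx + b_o)    # output gate
c_tilde = torch.tanh(W_c @ hx + b_c)   # candidate cell state
c_t = f_t * c_prev + i_t * c_tilde     # selectively forget / write
h_t = o_t * torch.tanh(c_t)            # gated hidden state
```

Because the cell state update is additive (scaled by f_t rather than squashed through a repeated matrix multiply), gradients can flow through many steps without vanishing as quickly as in a plain RNN.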
Practical Recommendations
- Short sequences (under 20 steps): a plain RNN is usually enough
- Medium sequences (20-100 steps): LSTM
- Very long sequences (over 100 steps): LSTM plus an attention mechanism
- When both past and future context matter: bidirectional LSTM
Common Hyperparameter Settings
| Parameter | Recommended values |
|---|---|
| hidden_size | 32, 64, 128 |
| num_layers | 1-3 |
| dropout | 0.1-0.3 (multi-layer only) |
| bidirectional | True/False |
Code Template
```python
class MyLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=2,
            dropout=0.2,           # only applied between stacked layers
            bidirectional=False,
            batch_first=True
        )
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, (h_n, c_n) = self.lstm(x)
        out = self.fc(out[:, -1, :])  # last time step
        return out
```
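Instantiating the template on the sine-wave task looks like this (sizes assumed from the earlier sections; the class is reproduced so the snippet runs standalone):

```python
import torch
import torch.nn as nn

class MyLSTM(nn.Module):
    """Same template as above, repeated here for a self-contained example."""
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers=2,
                            dropout=0.2, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])

model = MyLSTM(input_size=1, hidden_size=64, output_size=10)
x = torch.randn(16, 50, 1)      # (batch, seq_len, features)
print(model(x).shape)           # torch.Size([16, 10])
```

Each input feature here is a single scalar per time step (input_size=1), matching the univariate sine-wave data; for multivariate series, raise input_size accordingly.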
Summary
Sequence models are the core technique for working with temporal data:
- RNN: simple, but prone to vanishing gradients
- LSTM: addresses the long-range dependency problem
- Bidirectional LSTM: uses both past and future context