PyTorch Study Notes (8): Sequence Models (RNN/LSTM)
2025-12-08 · 12 min read
#PyTorch #Deep Learning #RNN #LSTM
Characteristics of Sequence Data
- The data has a temporal order
- The input at the current time step may depend on information from earlier steps
Common kinds of sequence data:
- Text (sequences of words)
- Speech (sequences of waveform samples)
- Stock prices (time series)
- Video (sequences of frames)
RNN vs LSTM
- RNN: a simple recurrent neural network; prone to vanishing gradients
- LSTM: long short-term memory network; designed to mitigate the vanishing-gradient problem
Core Idea of the RNN
```text
h_t = tanh(W_h · h_{t-1} + W_x · x_t + b)
y_t = W_y · h_t + c
```
- Introduces a "hidden state" h_t
- Carries information from the previous step into the current one
- Problem: vanishing gradients make long-range dependencies hard to learn
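The recurrence above can be sketched directly in PyTorch. This is a minimal manual unroll with illustrative random weights (the names `W_h`, `W_x`, `b` mirror the formula, not any library API):

```python
import torch

torch.manual_seed(0)
hidden_size, input_size = 4, 3

# Illustrative parameters matching h_t = tanh(W_h · h_{t-1} + W_x · x_t + b)
W_h = torch.randn(hidden_size, hidden_size)
W_x = torch.randn(hidden_size, input_size)
b = torch.zeros(hidden_size)

def rnn_step(h_prev, x_t):
    # One step of the recurrence: mix previous state and current input
    return torch.tanh(W_h @ h_prev + W_x @ x_t + b)

h = torch.zeros(hidden_size)          # initial hidden state
for t in range(5):                    # unroll over 5 time steps
    x_t = torch.randn(input_size)
    h = rnn_step(h, x_t)

print(h.shape)  # torch.Size([4])
```

Because the same `W_h` is multiplied in at every step, gradients flowing back through many steps repeatedly pass through it, which is exactly where the vanishing-gradient problem comes from.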
LSTM Improvements
- Introduces a gating mechanism (forget gate, input gate, output gate)
- Can selectively keep or discard information
- Able to learn long-range dependencies
Creating Sequence Data
```python
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

# Generate a noisy sine wave
def create_sine_wave(seq_length=1000):
    t = np.linspace(0, 4 * np.pi, seq_length)
    data = np.sin(t) + 0.1 * np.random.randn(seq_length)
    return torch.tensor(data, dtype=torch.float32)

data = create_sine_wave(1000)

# Prepare training data
def create_sequences(data, seq_len, pred_len):
    """Slice a long series into (input window, prediction window) pairs."""
    X, y = [], []
    for i in range(len(data) - seq_len - pred_len):
        X.append(data[i:i+seq_len])
        y.append(data[i+seq_len:i+seq_len+pred_len])
    return torch.stack(X), torch.stack(y)

# Use the previous 50 time steps to predict the next 10
seq_input_len = 50
seq_output_len = 10
X, y = create_sequences(data, seq_input_len, seq_output_len)

# Train/test split
train_size = int(0.8 * len(X))
X_train, y_train = X[:train_size], y[:train_size]
X_test, y_test = X[train_size:], y[train_size:]
```
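A quick shape check confirms what the windowing produces. The snippet below is standalone, so it substitutes a dummy series for the sine wave; the sizes follow from seq_length=1000, seq_input_len=50, seq_output_len=10:

```python
import torch

data = torch.arange(1000, dtype=torch.float32)  # stand-in for the sine wave

def create_sequences(data, seq_len, pred_len):
    # Same windowing as above: 1000 - 50 - 10 = 940 samples
    X = [data[i:i+seq_len] for i in range(len(data) - seq_len - pred_len)]
    y = [data[i+seq_len:i+seq_len+pred_len] for i in range(len(data) - seq_len - pred_len)]
    return torch.stack(X), torch.stack(y)

X, y = create_sequences(data, 50, 10)
print(X.shape, y.shape)   # torch.Size([940, 50]) torch.Size([940, 10])

train_size = int(0.8 * len(X))
print(train_size)         # 752 training samples, 188 for testing
```

Note that consecutive windows overlap heavily (stride 1), so adjacent training samples are highly correlated; this is fine for a toy example but worth remembering when splitting real data.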
Defining an RNN Model
```python
class SimpleRNN(nn.Module):
    """A plain RNN followed by a linear head."""
    def __init__(self, input_size=1, hidden_size=32, output_size=10):
        super().__init__()
        self.hidden_size = hidden_size
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # x shape: (batch, seq_len, input_size)
        out, h_n = self.rnn(x)
        # Use the hidden state of the last time step
        out = self.fc(out[:, -1, :])
        return out
```
Defining an LSTM Model
```python
class LSTMModel(nn.Module):
    """An LSTM followed by a linear head."""
    def __init__(self, input_size=1, hidden_size=32, output_size=10):
        super().__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, (h_n, c_n) = self.lstm(x)
        out = self.fc(out[:, -1, :])
        return out
```
Bidirectional LSTM
```python
class BidirectionalLSTM(nn.Module):
    """Bidirectional LSTM: reads the sequence forwards and backwards."""
    def __init__(self, input_size=1, hidden_size=32, output_size=10):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size,
                            batch_first=True,
                            bidirectional=True)  # two directions
        # Bidirectional output dimension is 2 * hidden_size
        self.fc = nn.Linear(hidden_size * 2, output_size)

    def forward(self, x):
        out, (h_n, c_n) = self.lstm(x)
        out = self.fc(out[:, -1, :])
        return out
```
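A quick sanity check on the bidirectional output sizes, using `nn.LSTM` directly with the same dimensions as the model above, shows why the linear head needs `hidden_size * 2` inputs:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, seq_len, input_size, hidden_size = 8, 50, 1, 32

x = torch.randn(batch, seq_len, input_size)
lstm = nn.LSTM(input_size, hidden_size, batch_first=True, bidirectional=True)
out, (h_n, c_n) = lstm(x)

print(out.shape)   # torch.Size([8, 50, 64]) -- last dim is 2 * hidden_size
print(h_n.shape)   # torch.Size([2, 8, 32]) -- one final state per direction
```

The forward and backward outputs are concatenated along the feature dimension at every time step, which is why `out[:, -1, :]` has 64 features here.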
Training the Models
```python
def train_model(model, model_name, X_train, y_train, X_test, y_test, epochs=100):
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=0.01)
    train_losses = []
    test_losses = []
    for epoch in range(epochs):
        model.train()
        optimizer.zero_grad()
        y_pred = model(X_train.unsqueeze(-1))  # add a feature dimension
        loss = criterion(y_pred, y_train)
        loss.backward()
        optimizer.step()
        train_losses.append(loss.item())
        # Evaluation
        model.eval()
        with torch.no_grad():
            test_loss = criterion(model(X_test.unsqueeze(-1)), y_test).item()
            test_losses.append(test_loss)
    return train_losses, test_losses

# Train all three models
models = {
    'RNN': SimpleRNN(),
    'LSTM': LSTMModel(),
    'BiLSTM': BidirectionalLSTM()
}
for name, model in models.items():
    train_losses, test_losses = train_model(
        model, name, X_train, y_train, X_test, y_test, epochs=100
    )
```
RNN vs LSTM Comparison
| Property | RNN | LSTM |
|---|---|---|
| Structural complexity | Simple | Complex (gated) |
| Parameter count | Small | Large |
| Training speed | Fast | Slower |
| Long-range dependencies | Poor (vanishing gradients) | Good |
| Typical use | Short sequences | Long sequences |
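The parameter-count row can be verified directly: with the same sizes, an LSTM layer has exactly 4x the parameters of a plain RNN layer, because it computes four transforms (input, forget, output gates plus the candidate state) where the RNN computes one:

```python
import torch.nn as nn

def n_params(m):
    # Total number of learnable scalars in a module
    return sum(p.numel() for p in m.parameters())

rnn = nn.RNN(input_size=1, hidden_size=32, batch_first=True)
lstm = nn.LSTM(input_size=1, hidden_size=32, batch_first=True)

print(n_params(rnn), n_params(lstm))  # 1120 4480 -- exactly a 4x ratio
```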
LSTM Gating Mechanism
- Forget gate: decides what information to discard
  ```text
  f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
  ```
- Input gate: decides what new information to store
  ```text
  i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
  ```
- Output gate: decides what information to emit
  ```text
  o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
  ```
The gates interact through the cell state: c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t, where c̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c) is the candidate state, and the hidden state is h_t = o_t ⊙ tanh(c_t).
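One manual step of these equations can be written out in a few lines. This is an illustrative sketch with random stand-in parameters (the `W_*` and `b_*` names mirror the formulas, not any library API), operating on the concatenation [h_{t-1}, x_t]:

```python
import torch

torch.manual_seed(0)
hidden_size, input_size = 4, 3
concat = hidden_size + input_size

# Stand-in parameters for the three gates and the candidate state
W_f, W_i, W_o = (torch.randn(hidden_size, concat) for _ in range(3))
W_c = torch.randn(hidden_size, concat)
b_f = b_i = b_o = b_c = torch.zeros(hidden_size)

h_prev = torch.zeros(hidden_size)
c_prev = torch.zeros(hidden_size)
x_t = torch.randn(input_size)
hx = torch.cat([h_prev, x_t])          # [h_{t-1}, x_t]

f_t = torch.sigmoid(W_f @ hx + b_f)    # forget gate, values in (0, 1)
i_t = torch.sigmoid(W_i @ hx + b_i)    # input gate
o_t = torch.sigmoid(W_o @ hx + b_o)    # output gate
c_tilde = torch.tanh(W_c @ hx + b_c)   # candidate cell state
c_t = f_t * c_prev + i_t * c_tilde     # selectively forget / write
h_t = o_t * torch.tanh(c_t)            # gated hidden state
```

Because the cell state update is additive (scaled by f_t rather than squashed through a repeated matrix multiply), gradients can flow through many steps without vanishing as quickly as in a plain RNN.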
Practical Recommendations
- Short sequences (under 20 steps): a plain RNN is usually enough
- Medium sequences (20-100 steps): LSTM
- Very long sequences (over 100 steps): LSTM plus an attention mechanism
- When both past and future context matter: bidirectional LSTM
Common Hyperparameter Settings
| Parameter | Recommended values |
|---|---|
| hidden_size | 32, 64, 128 |
| num_layers | 1-3 |
| dropout | 0.1-0.3 (multi-layer only) |
| bidirectional | True/False |
Code Template
```python
class MyLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=2,
            dropout=0.2,           # only applied between stacked layers
            bidirectional=False,
            batch_first=True
        )
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, (h_n, c_n) = self.lstm(x)
        out = self.fc(out[:, -1, :])  # last time step
        return out
```
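Instantiating the template on the sine-wave task looks like this (sizes assumed from the earlier sections; the class is reproduced so the snippet runs standalone):

```python
import torch
import torch.nn as nn

class MyLSTM(nn.Module):
    """Same template as above, repeated here for a self-contained example."""
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers=2,
                            dropout=0.2, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])

model = MyLSTM(input_size=1, hidden_size=64, output_size=10)
x = torch.randn(16, 50, 1)      # (batch, seq_len, features)
print(model(x).shape)           # torch.Size([16, 10])
```

Each input feature here is a single scalar per time step (input_size=1), matching the univariate sine-wave data; for multivariate series, raise input_size accordingly.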
Summary
Sequence models are the core technique for working with temporal data:
- RNN: simple, but prone to vanishing gradients
- LSTM: addresses the long-range dependency problem
- Bidirectional LSTM: uses both past and future context