PyTorch Study Notes (5): Multi-Layer Neural Networks (MLP)

A multi-layer neural network (multi-layer perceptron, MLP) is also called a fully connected neural network.

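Why multiple layers?

A single neuron can only solve linearly separable problems.
Multiple layers combined with nonlinear activation functions can learn far more complex, nonlinear patterns.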

MLP structure:

text

Input layer → Hidden layer 1 → Hidden layer 2 → ... → Output layer
                   ↓ReLU            ↓ReLU                ↓(none/sigmoid/softmax)
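
The diagram above maps fairly directly onto `nn.Sequential`. The following is a minimal sketch, assuming 2 input features, two hidden layers of 8 units each, and a single output; these sizes are illustrative choices and are not fixed by the diagram.

```python
import torch
import torch.nn as nn

# Layer sizes are illustrative assumptions, not fixed by the diagram.
model = nn.Sequential(
    nn.Linear(2, 8),   # input layer -> hidden layer 1
    nn.ReLU(),
    nn.Linear(8, 8),   # hidden layer 1 -> hidden layer 2
    nn.ReLU(),
    nn.Linear(8, 1),   # hidden layer 2 -> output layer (raw score, no activation)
)

batch = torch.zeros(4, 2)   # 4 samples, 2 features each
out = model(batch)
n_params = sum(p.numel() for p in model.parameters())

print(out.shape)   # one output value per sample
print(n_params)    # weights and biases of the three linear layers
```

Each `nn.Linear(in, out)` holds `in * out` weights plus `out` biases, so the parameter count grows quickly with layer width.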

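Why are nonlinear activation functions needed?

The XOR problem

XOR is a classic problem that is not linearly separable, so a linear model cannot solve it: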

text

XOR truth table:
  Input (0,0) → output 0
  Input (0,1) → output 1
  Input (1,0) → output 1
  Input (1,1) → output 0
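
One way to see why the nonlinearity matters: without an activation in between, two stacked linear layers compute exactly the same function as a single linear layer, so adding depth alone gains nothing. The sketch below checks this numerically on the four XOR inputs; the layer sizes (2 → 8 → 1) and the seed are arbitrary assumptions made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two stacked linear layers with NO activation in between.
fc1 = nn.Linear(2, 8)
fc2 = nn.Linear(8, 1)

# Algebraically: fc2(fc1(x)) = x @ (W2 @ W1).T + (W2 @ b1 + b2)
W = fc2.weight @ fc1.weight             # shape (1, 2)
b = fc2.weight @ fc1.bias + fc2.bias    # shape (1,)

x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
stacked = fc2(fc1(x))       # output of the two-layer "network"
collapsed = x @ W.T + b     # output of the equivalent single linear map

print(torch.allclose(stacked, collapsed, atol=1e-6))
```

Because the two outputs agree, the stacked model is still a linear model and inherits the same limitation on XOR.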

A linear model fails

python

import torch
import torch.nn as nn
import torch.optim as optim

# XOR data: four input points and their labels
X_xor = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)
y_xor = torch.tensor([[0], [1], [1], [0]], dtype=torch.float32)

# Linear model (a single linear layer plus sigmoid; cannot solve XOR)
class LinearXOR(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 1)

    def forward(self, x):
        return torch.sigmoid(self.linear(x))

linear_model = LinearXOR()
criterion = nn.BCELoss()
optimizer = optim.SGD(linear_model.parameters(), lr=0.1)

for epoch in range(1000):
    optimizer.zero_grad()
    y_pred = linear_model(X_xor)
    loss = criterion(y_pred, y_xor)
    loss.backward()
    optimizer.step()

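# A linear decision boundary classifies at most 3 of the 4 XOR points
# correctly, so accuracy is capped at 75% no matter how long we train.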
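
The 75% ceiling can be checked without any training. The sketch below brute-forces a coarse grid of weights and biases for a linear classifier and records the best accuracy found. The grid range and step are arbitrary assumptions; the ceiling itself follows from XOR not being linearly separable, not from the grid.

```python
import itertools
import torch

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([0., 1., 1., 0.])

# Candidate values for w1, w2 and b (illustrative grid, step 0.25).
grid = torch.linspace(-2.0, 2.0, 17).tolist()

best_acc = 0.0
for w1, w2, b in itertools.product(grid, repeat=3):
    # Predict class 1 where the linear score is positive.
    pred = (X[:, 0] * w1 + X[:, 1] * w2 + b > 0).float()
    acc = (pred == y).float().mean().item()
    best_acc = max(best_acc, acc)

print(best_acc)
```

No combination on the grid gets all four points right, which matches the claim above.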

Using a multi-layer neural network

python

class MLP(nn.Module):
    """
    Multi-layer neural network.

    Structure:
        Input (2)
          ↓
        Hidden layer 1 (8 neurons) + ReLU
          ↓
        Hidden layer 2 (8 neurons) + ReLU
          ↓
        Output layer (1 neuron) + Sigmoid
    """

    def __init__(self, input_dim=2, hidden_dim=8, output_dim=1):
        super().__init__()

        # First layer: input -> hidden layer 1
        self.fc1 = nn.Linear(input_dim, hidden_dim)

        # Second layer: hidden layer 1 -> hidden layer 2
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)

        # Output layer: hidden layer 2 -> output
        self.fc3 = nn.Linear(hidden_dim, output_dim)

        # Activation functions
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # First hidden layer
        x = self.fc1(x)
        x = self.relu(x)

        # Second hidden layer
        x = self.fc2(x)
        x = self.relu(x)

        # Output layer: sigmoid maps the score to a probability in (0, 1)
        x = self.fc3(x)
        x = self.sigmoid(x)

        return x

# Create the MLP model
mlp = MLP(input_dim=2, hidden_dim=8, output_dim=1)

# Train the MLP
criterion = nn.BCELoss()
optimizer = optim.Adam(mlp.parameters(), lr=0.01)

epochs = 2000
for epoch in range(epochs):
    optimizer.zero_grad()
    y_pred = mlp(X_xor)
    loss = criterion(y_pred, y_xor)
    loss.backward()
    optimizer.step()

# With this setup the MLP typically reaches 100% accuracy on XOR;
# the exact result depends on the random initialization.
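
Training is one way to find weights that solve XOR. That such weights exist at all can also be checked by hand. The sketch below sets the weights of a tiny network (2 inputs, 2 ReLU hidden units, 1 linear output) manually; the specific values are an illustrative construction, not what training would necessarily find.

```python
import torch
import torch.nn as nn

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])

hidden = nn.Linear(2, 2)
output = nn.Linear(2, 1)

with torch.no_grad():
    # h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1)
    hidden.weight.copy_(torch.tensor([[1., 1.], [1., 1.]]))
    hidden.bias.copy_(torch.tensor([0., -1.]))
    # y = h1 - 2 * h2
    output.weight.copy_(torch.tensor([[1., -2.]]))
    output.bias.zero_()

    y = output(torch.relu(hidden(X)))

print(y.squeeze(1).tolist())  # [0.0, 1.0, 1.0, 0.0]
```

The second hidden unit only becomes active for the input (1,1), where it cancels the contribution of the first unit. Removing the ReLU would make this correction impossible.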

A harder classification problem: spiral data

python

import numpy as np

def generate_spiral_data(n_samples=300):
    """Generate a two-class, two-arm spiral dataset."""
    np.random.seed(42)
    half = n_samples // 2

    # Class 0: first spiral arm (radius grows with the angle, plus noise)
    theta1 = np.linspace(0, 4*np.pi, half)
    r1 = theta1 + np.random.randn(half) * 0.2
    x1 = r1 * np.cos(theta1)
    y1 = r1 * np.sin(theta1)

    # Class 1: second spiral arm (same shape, rotated by 180 degrees)
    theta2 = np.linspace(0, 4*np.pi, half)
    r2 = theta2 + np.random.randn(half) * 0.2
    x2 = -r2 * np.cos(theta2)
    y2 = -r2 * np.sin(theta2)

    # Stack both classes into one feature matrix and one label vector
    X = np.vstack([np.column_stack([x1, y1]), np.column_stack([x2, y2])])
    y = np.hstack([np.zeros(half), np.ones(half)])

    X_t = torch.tensor(X, dtype=torch.float32)
    y_t = torch.tensor(y, dtype=torch.float32).unsqueeze(1)  # shape (n, 1) for BCELoss
    return X_t, y_t

X_spiral, y_spiral = generate_spiral_data(300)

A deeper network

python

class DeepMLP(nn.Module):
    """A deeper network for a harder classification problem."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(2, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 64)
        self.fc4 = nn.Linear(64, 1)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.relu(self.fc3(x))
        x = self.sigmoid(self.fc4(x))
        return x

deep_mlp = DeepMLP()
criterion = nn.BCELoss()
optimizer = optim.Adam(deep_mlp.parameters(), lr=0.01)

epochs = 1000
for epoch in range(epochs):
    optimizer.zero_grad()
    y_pred = deep_mlp(X_spiral)
    loss = criterion(y_pred, y_spiral)
    loss.backward()
    optimizer.step()
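
The training loops in these notes minimize the loss but never report accuracy. A small helper for that might look like the sketch below; `binary_accuracy` is a hypothetical name introduced here, and the 0.5 threshold is an assumption that suits sigmoid outputs. The probabilities and labels are made up only to show the call.

```python
import torch

def binary_accuracy(probs: torch.Tensor, targets: torch.Tensor, threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded probability matches the 0/1 target."""
    preds = (probs >= threshold).float()
    return (preds == targets).float().mean().item()

# Made-up probabilities and labels, only to show the call.
probs = torch.tensor([[0.9], [0.2], [0.6], [0.4]])
targets = torch.tensor([[1.], [0.], [0.], [0.]])
acc = binary_accuracy(probs, targets)
print(acc)  # 0.75
```

With the model above, this could be called as `binary_accuracy(deep_mlp(X_spiral), y_spiral)` inside a `torch.no_grad()` block. Note that this measures accuracy on the training data, since these notes do not hold out a test set.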

Comparison of common activation functions

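Activation	Formula	Properties
Sigmoid	1/(1+e^(-x))	Output in (0,1); typically used in the output layer for binary classification
ReLU	max(0, x)	Most widely used in hidden layers; mitigates vanishing gradients
Tanh	(e^x-e^(-x))/(e^x+e^(-x))	Output in (-1,1); zero-centered
Softmax	exp(x_i)/Σ_j exp(x_j)	Outputs sum to 1; used in the output layer for multi-class classification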
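
The formulas in the table can be checked against the corresponding PyTorch functions. A short sketch, with the input vector chosen arbitrarily:

```python
import torch

x = torch.tensor([-2.0, 0.0, 2.0])

sig = torch.sigmoid(x)           # each value squashed into (0, 1)
relu = torch.relu(x)             # negatives clipped to 0
tanh = torch.tanh(x)             # each value squashed into (-1, 1)
soft = torch.softmax(x, dim=0)   # normalized over the vector, sums to 1

# Compare against the formula from the table for sigmoid.
sig_manual = 1 / (1 + torch.exp(-x))

print(sig, relu, tanh, soft)
print(torch.allclose(sig, sig_manual))
```

Sigmoid, ReLU and Tanh act on each element independently, while Softmax depends on the whole vector, which is why it needs a `dim` argument.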

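Network design guidelines

Number of hidden layers: 2 to 4 layers are usually enough for many problems.
Hidden layer width: common values are 32, 64, 128, and 256.
Output layer activation:
- Binary classification: Sigmoid
- Multi-class classification: Softmax
- Regression: no activation function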
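
In PyTorch, the choice of output activation interacts with the choice of loss function, because some losses apply the activation internally. The sketch below illustrates two pairings with random logits (seed and shapes are arbitrary assumptions): `nn.BCEWithLogitsLoss` on raw scores matches `nn.BCELoss` on sigmoid outputs, and `nn.CrossEntropyLoss` on raw scores matches `nn.NLLLoss` on log-softmax outputs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Binary classification: sigmoid + BCELoss vs. raw logits + BCEWithLogitsLoss
logits = torch.randn(4, 1)
targets = torch.tensor([[0.], [1.], [1.], [0.]])
loss_probs = nn.BCELoss()(torch.sigmoid(logits), targets)
loss_logits = nn.BCEWithLogitsLoss()(logits, targets)

# Multi-class: CrossEntropyLoss takes raw logits and integer class labels,
# and applies log-softmax internally.
mc_logits = torch.randn(4, 3)
mc_labels = torch.tensor([0, 2, 1, 2])
loss_ce = nn.CrossEntropyLoss()(mc_logits, mc_labels)
loss_manual = nn.NLLLoss()(torch.log_softmax(mc_logits, dim=1), mc_labels)

print(torch.allclose(loss_probs, loss_logits, atol=1e-6))
print(torch.allclose(loss_ce, loss_manual, atol=1e-6))
```

In practice this means a model trained with `nn.CrossEntropyLoss` should output raw scores, with Softmax applied only when probabilities are needed at prediction time.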

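Training tips

Use the Adam optimizer (it often converges faster than plain SGD).
Learning rate: 0.001 for Adam and 0.01 for SGD are common starting points.
Train for enough epochs that the loss converges.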
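
To judge whether the loss has converged, it helps to record it during training rather than only looking at the final value. A minimal sketch on the XOR data, assuming a small one-hidden-layer network, 500 epochs, and a fixed seed; all of these are illustrative choices:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

model = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

losses = []  # one loss value per epoch
for epoch in range(500):
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
    if (epoch + 1) % 100 == 0:
        print(f"epoch {epoch + 1}: loss = {loss.item():.4f}")
```

If the printed values are still dropping noticeably at the end, training for more epochs is likely to help; if they have flattened out, more epochs will change little.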

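Summary

Why multiple layers?
- A single-layer network is a linear model and can only solve linearly separable problems
- Multiple layers with nonlinear activations can approximate any continuous function on a compact domain, given enough hidden units (universal approximation theorem)
Key points:
- Nonlinear activation functions are what make this possible
- ReLU is the most commonly used activation function
- Deeper networks can learn more complex patterns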

Previous: PyTorch Study Notes (4): Logistic Regression

Next: PyTorch Study Notes (6): Convolutional Neural Networks (CNN)