Where does the flattened input size of the first fully connected layer (fc1) come from? (MNIST example)

Here is some convolutional neural network example code from the PyTorch examples repository on GitHub: https://github.com/pytorch/examples/blob/master/mnist/main.py

import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)    # 1 input channel -> 32 channels, 3x3 kernel, stride 1
        self.conv2 = nn.Conv2d(32, 64, 3, 1)   # 32 -> 64 channels, 3x3 kernel, stride 1
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout2d(0.5)
        self.fc1 = nn.Linear(9216, 128)        # 9216 flattened features -> 128
        self.fc2 = nn.Linear(128, 10)          # 128 -> 10 class scores

If I understand this correctly, we need to flatten the output of the last convolutional layer before we can pass it to the linear layer (fc1). So, looking at this code, we see that the input to the first fully connected layer is 9216.

Where does this number (9216) come from?

You also need to look at the forward method and the network's input shape in order to work out the input shape of the linear/fully-connected layer. For MNIST we have a single-channel 28x28 input image. Using the output-shape formula in the Conv2d documentation, you can compute the output shape of each convolution operation, and the max_pool2d operation follows the same input/output shape relationship as the convolution layers.

Since the shape of the input just before flattening is a 64-channel 12x12 feature map, the total number of features is 64 * 12 * 12 = 9216, as the sketch below confirms.
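As a quick check of that arithmetic, here is a minimal sketch of the docs formula applied to this network (the conv_out helper is just for illustration and is not part of the example code):

import math

def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    # Output-size formula from the Conv2d / MaxPool2d docs:
    # floor((size + 2*padding - dilation*(kernel - 1) - 1) / stride + 1)
    return math.floor((size + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1)

h = conv_out(28, kernel=3)            # conv1: 28 -> 26
h = conv_out(h, kernel=3)             # conv2: 26 -> 24
h = conv_out(h, kernel=2, stride=2)   # max_pool2d(2): 24 -> 12
print(h, 64 * h * h)                  # 12 9216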

Input/output shape relationship of the conv2d and max_pool2d operations

def forward(self, x):
    """ For each line which changes the feature shape additional comment
        indicates <input_shape> -> <output_shape> """
    x = self.conv1(x)                # [1, 28, 28] -> [32, 26, 26]
    x = F.relu(x)
    x = self.conv2(x)                # [32, 26, 26] -> [64, 24, 24]
    x = F.relu(x)
    x = F.max_pool2d(x, 2)           # [64, 24, 24] -> [64, 12, 12]
    x = self.dropout1(x)
    x = torch.flatten(x, 1)          # [64, 12, 12] -> [9216]
    x = self.fc1(x)                  # [9216] -> [128]
    x = F.relu(x)
    x = self.dropout2(x)
    x = self.fc2(x)                  # [128] -> [10]
    output = F.log_softmax(x, dim=1)
    return output
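
If you would rather verify this at runtime than compute it by hand, one option (a minimal sketch, assuming the Net class above is defined in the same script) is to attach a forward hook to fc1 and run a dummy MNIST-sized batch through the model:

import torch

model = Net()

# Print the shape of the tensor entering fc1; the last dimension should be 9216.
model.fc1.register_forward_hook(
    lambda module, inputs, output: print("fc1 input:", inputs[0].shape))

with torch.no_grad():
    model(torch.zeros(1, 1, 28, 28))   # prints: fc1 input: torch.Size([1, 9216])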