Understanding input and output size for Conv2d

Question

I am learning image classification with PyTorch (using the CIFAR-10 dataset) following this link.

I am trying to understand the input and output parameters of the given Conv2d code:

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

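My understanding of conv2d() (please correct me if anything is wrong or missing):

Since the images have 3 channels, the first parameter is 3. 6 is the number of filters (chosen arbitrarily)
5 is the kernel size (5, 5) (chosen arbitrarily)
We create the next layer the same way (the previous layer's output is this layer's input)
Now we create a fully connected layer using the linear function: self.fc1 = nn.Linear(16 * 5 * 5, 120)

16 * 5 * 5: here 16 is the output of the last conv2d layer, but what is the 5 * 5?

Is it the kernel size, or something else? How do we know whether to multiply by 5*5 or 4*4 or 3*3...?

I researched this and learned that since the image size is 32*32 and max pool(2) is applied twice, the image size would go 32 -> 16 -> 8, so we should multiply by last_output_size * 8 * 8. But in this link it is 5*5.

Can anyone explain?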

Answer 1

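These are the dimensions of the image itself (i.e. height x width) after the convolution and pooling layers.

Unpadded convolutions

Unless you pad your image with zeros, a convolutional filter will shrink the size of your output image by filter_size - 1 across the height and width: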


[Figure: a 3x3 filter takes a 5x5 image to a (5-(3-1)) x (5-(3-1)) = 3x3 image]
[Figure: zero padding preserves the image dimensions]

You can add padding in PyTorch by setting Conv2d(padding=...).
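As a rough sketch of the rule above, the output size along one dimension can be computed with the shape formula given in the nn.Conv2d documentation: floor((n + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1). The helper name conv_out_size below is made up for illustration; it is not a PyTorch function.

```python
def conv_out_size(n, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension of a convolution.

    Hypothetical helper (not part of PyTorch); it follows the shape
    formula from the nn.Conv2d documentation.
    """
    return (n + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Without padding, a 3-wide filter shrinks a 5-pixel side by 3 - 1 = 2.
print(conv_out_size(5, 3))             # 3
# With padding=1, the same filter keeps the side at 5 pixels.
print(conv_out_size(5, 3, padding=1))  # 5
# The 5x5 filter from the question on a 32-pixel side, no padding.
print(conv_out_size(32, 5))            # 28
```

With the defaults (stride 1, no padding, no dilation) this reduces to n - (kernel_size - 1), which is the shrinkage described above.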

Chain of transformations

The input has gone through:

Layer	Shape Transformation
one conv layer (without padding)	`(h, w) -> (h-4, w-4)`
a MaxPool	`-> ((h-4)//2, (w-4)//2)`
another conv layer (without padding)	`-> ((h-12)//2, (w-12)//2)`
another MaxPool	`-> ((h-12)//4, (w-12)//4)`
a Flatten	`-> 16 * ((h-12)//4) * ((w-12)//4)`

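We go from the original image size of (32,32) to (28,28) to (14,14) to (10,10) to (5,5). Flattening the 16 channels of 5x5 then gives 16 * 5 * 5 = 400 features, which is the input size of fc1.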
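The same chain can be traced step by step in plain Python. This is only a sketch for the network in the question (5x5 unpadded convolutions with stride 1, and MaxPool2d(2, 2)); the function names conv_out, pool_out and trace_sizes are invented for this example.

```python
def conv_out(n, kernel_size):
    # Unpadded, stride-1 convolution: the side shrinks by kernel_size - 1.
    return n - (kernel_size - 1)


def pool_out(n, window):
    # MaxPool2d(window, window): non-overlapping windows, rounded down.
    return n // window


def trace_sizes(n=32):
    """Return the side length after each layer of the question's Net."""
    sizes = [n]
    for layer in ("conv", "pool", "conv", "pool"):
        n = conv_out(n, 5) if layer == "conv" else pool_out(n, 2)
        sizes.append(n)
    return sizes


sizes = trace_sizes(32)
print(sizes)                  # [32, 28, 14, 10, 5]

# conv2 produces 16 channels, so the flattened vector has 16 * 5 * 5 entries.
flat_features = 16 * sizes[-1] * sizes[-1]
print(flat_features)          # 400
```

Pooling alone would give 32 -> 16 -> 8 as the question expects; the extra shrinkage comes from the two unpadded 5x5 convolutions.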

To visualise this, you can use the torchsummary package:

from torchsummary import summary

input_shape = (3,32,32)
summary(Net(), input_shape)

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1            [-1, 6, 28, 28]             456
         MaxPool2d-2            [-1, 6, 14, 14]               0
            Conv2d-3           [-1, 16, 10, 10]           2,416
         MaxPool2d-4             [-1, 16, 5, 5]               0
            Linear-5                  [-1, 120]          48,120
            Linear-6                   [-1, 84]          10,164
            Linear-7                   [-1, 10]             850
================================================================
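The Param # column can be cross-checked by hand. The sketch below assumes the default bias=True for every layer; conv2d_params and linear_params are hypothetical helper names, not torchsummary or PyTorch functions.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    # Each output channel has one kernel_size x kernel_size filter per
    # input channel, plus a single bias term.
    return out_channels * (in_channels * kernel_size * kernel_size) + out_channels


def linear_params(in_features, out_features):
    # A weight matrix of in_features x out_features plus one bias per output.
    return in_features * out_features + out_features


counts = {
    "conv1": conv2d_params(3, 6, 5),
    "conv2": conv2d_params(6, 16, 5),
    "fc1": linear_params(16 * 5 * 5, 120),
    "fc2": linear_params(120, 84),
    "fc3": linear_params(84, 10),
}
print(counts)
# {'conv1': 456, 'conv2': 2416, 'fc1': 48120, 'fc2': 10164, 'fc3': 850}
```

The fc1 count only matches the table when its input size is 16 * 5 * 5 = 400, which agrees with the (16, 5, 5) output shape of the last pooling layer.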
