RuntimeError: shape '[4, 98304]' is invalid for input of size 113216

I am learning to train a basic nn model for image classification, and I get this error when I feed image data into the model. I understand that I should input image data of the correct size. My images are 128*256 with 3 channels, there are 4 classes, and the batch size is 4. What I don't understand is where the size 113216 comes from. I checked all the related parameters and the image metadata, but found no clue. Here is my code:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(3*128*256, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(4, 3*128*256)  # <- this view raises the RuntimeError
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()
for epoch in range(2):  # loop over the dataset multiple times
    print('round start')
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        
        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        print(inputs.shape)
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

Thanks for your help!

Shapes

  • Conv2d changes the width and height of the image when no padding is used. Rule of thumb, if you want to keep the image size the same with stride=1 (the default): padding = kernel_size // 2
  • You are changing the number of channels with the convolutions, yet your linear layer assumes 3 channels (3*128*256) for some reason
  • If you want to know how the tensor data is transformed along the way, use print(x.shape) after each step (see the sketch after this list)
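
That also answers where 113216 comes from: after the second pooling the tensor is (4, 16, 29, 61), and 4 * 16 * 29 * 61 = 113216 total elements, which cannot be reshaped into [4, 98304] (that would need 393216 elements). A minimal sketch of the arithmetic, using the standard Conv2d/MaxPool2d output-size formula (the helper conv_out_size is my own name, not a PyTorch API):

def conv_out_size(size, kernel_size, stride=1, padding=0):
    # floor((size + 2*padding - kernel_size) / stride) + 1
    return (size + 2 * padding - kernel_size) // stride + 1

h, w = 128, 256
h, w = conv_out_size(h, 5), conv_out_size(w, 5)                      # conv1 -> 124, 252
h, w = conv_out_size(h, 2, stride=2), conv_out_size(w, 2, stride=2)  # pool  -> 62, 126
h, w = conv_out_size(h, 5), conv_out_size(w, 5)                      # conv2 -> 58, 122
h, w = conv_out_size(h, 2, stride=2), conv_out_size(w, 2, stride=2)  # pool  -> 29, 61
print(4 * 16 * h * w)  # 113216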

Annotated code

Here is the fixed code, with comments about the shape added after each step:

import torch
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = torch.nn.Conv2d(3, 6, 5)
        self.pool = torch.nn.MaxPool2d(2, 2)
        self.conv2 = torch.nn.Conv2d(6, 16, 5)
        # Output shape from the convolutions is the input shape to fc
        self.fc1 = torch.nn.Linear(16 * 29 * 61, 120)
        self.fc2 = torch.nn.Linear(120, 84)
        self.fc3 = torch.nn.Linear(84, 10)

    def forward(self, x):
        # In: (4, 3, 128, 256)
        x = F.relu(self.conv1(x))
        # (4, 6, 124, 252) because kernel_size=5 cuts 2 pixels off each border
        x = self.pool(x)
        # (4, 6, 62, 126) because pooling halves width and height
        x = F.relu(self.conv2(x))
        # (4, 16, 58, 122) same kernel_size=5 shrink as above
        x = self.pool(x)
        # (4, 16, 29, 61) because pooling halves width and height
        # Better: torch.flatten(x, start_dim=1), so you don't have to hardcode the size
        x = x.view(-1, 16 * 29 * 61)  # Use -1 to be batch-size independent
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
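
A quick sanity check with a dummy batch shaped like your data confirms the forward pass now works:

net = Net()
dummy = torch.randn(4, 3, 128, 256)  # batch of 4 RGB images, 128x256
print(net(dummy).shape)              # torch.Size([4, 10])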

Other things that may help

  • Try torch.nn.AdaptiveMaxPool2d(1) before the ReLU; it makes your network independent of the input width and height
  • Use flatten (or a torch.nn.Flatten() layer) after this pooling
  • If you do that, set num_channels of the last convolution as in_features for nn.Linear (see the sketch below)
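
For example, a minimal sketch of that variant (AdaptiveNet is my own name, keeping the layer sizes from your question; the last convolution has 16 channels, so in_features=16):

import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # Collapses any spatial size to 1x1, so input width/height no longer matter
        self.adaptive_pool = nn.AdaptiveMaxPool2d(1)
        self.flatten = nn.Flatten()
        # in_features equals the number of channels of the last convolution
        self.fc = nn.Linear(16, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.adaptive_pool(self.conv2(x))  # (N, 16, 1, 1)
        x = F.relu(self.flatten(x))            # (N, 16)
        return self.fc(x)

# Works for any input large enough for the convolutions:
print(AdaptiveNet()(torch.randn(4, 3, 128, 256)).shape)  # torch.Size([4, 10])
print(AdaptiveNet()(torch.randn(2, 3, 64, 64)).shape)    # torch.Size([2, 10])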