Batch size reduces accuracy of ensemble of pretrained CNNs

I am trying to implement basic softmax-based voting: I take several pretrained CNNs, apply softmax to their outputs, add the resulting probabilities together, and use the argmax as the final prediction.

So I loaded 4 different pretrained CNNs (vgg11, vgg13, vgg16, vgg19) from "chenyaofo/pytorch-cifar-models"; I did not train them myself.

Yet the accuracy I measure depends on the batch size I use for evaluation. How is that possible?

Here is the code:

# data.py
import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

torch.cuda.empty_cache()

model_names = [
        "cifar10_vgg11_bn",
        "cifar10_vgg13_bn",
        "cifar10_vgg16_bn",
        "cifar10_vgg19_bn",
        # "cifar10_resnet56",
]

batch_size = 2

test_transform = transforms.Compose([
                    transforms.ToTensor(),
])

def load_models():
    models = []
    for model_name in model_names:
        model = torch.hub.load("chenyaofo/pytorch-cifar-models", model_name, pretrained=True)
        models.append(model)
    return models

testset = datasets.CIFAR10(root='./data', train=False,
                           download=True, transform=test_transform)
testloader = DataLoader(testset, batch_size=batch_size, shuffle=False)

# EnsembleModule.py
import torch
import torch.nn as nn

class MyEnsemble(nn.Module):

    def __init__(self, modelA, modelB, modelC, modelD):
        super(MyEnsemble, self).__init__()
        self.modelA = modelA
        self.modelB = modelB
        self.modelC = modelC
        self.modelD = modelD
        # self.modelE = modelE

    def forward(self, x):
        out1 = self.modelA(x)
        out2 = self.modelB(x)
        out3 = self.modelC(x)
        out4 = self.modelD(x)
        # out5 = self.modelE(x)

        # print(out1.shape)

        out1 = torch.softmax(out1, dim=1)
        out2 = torch.softmax(out2, dim=1)
        out3 = torch.softmax(out3, dim=1)
        out4 = torch.softmax(out4, dim=1)

        out = out1 + out2 + out3 + out4

        return out

# main script: evaluate the ensemble on the CIFAR-10 test set
import torch
from tqdm import tqdm

from EnsembleModule import MyEnsemble
from data import load_models, testloader

device = 'cuda' if torch.cuda.is_available() else 'cpu'

models = load_models()

model = MyEnsemble(models[0], models[1], models[2], models[3])

model.to(device)

total = 0
correct = 0
with torch.no_grad():
    for images, labels in tqdm(testloader):
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predictions = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predictions == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))


You forgot to call model.eval():

# ...

model.to(device)
model.eval() # <<<<<<<<<<<<<

total = 0
correct = 0
with torch.no_grad():
    for images, labels in tqdm(testloader):
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)

# ...

Since your models contain BatchNorm layers, performance with a small batch size (e.g. batch_size=1) is particularly bad.
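To see why, here is a small standalone sketch (not part of the original code) showing that in training mode a BatchNorm layer normalizes with the statistics of the current batch, so the output for the very same image changes depending on which batch it lands in, while in eval mode it uses the fixed running statistics:

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
bn.running_mean.fill_(0.5)   # pretend these running stats were learned during training
bn.running_var.fill_(2.0)

x = torch.randn(1, 3, 8, 8)           # one fixed sample
other = torch.randn(7, 3, 8, 8) * 5   # batch mates with very different statistics

bn.train()
out_alone = bn(x)                              # normalized with the stats of a batch of 1
out_in_batch = bn(torch.cat([x, other]))[:1]   # normalized with the stats of a batch of 8
print(torch.allclose(out_alone, out_in_batch)) # False: the output depends on the batch

bn.eval()
out_alone = bn(x)
out_in_batch = bn(torch.cat([x, other]))[:1]
print(torch.allclose(out_alone, out_in_batch)) # True: the running statistics are used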

Preprocessing should also follow the preprocessing used for training. As you can see in the repository of the model's author, you should normalize using the following statistics:
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.4914, 0.4822, 0.4465), std=(0.2023, 0.1994, 0.2010))
])

You are using models that contain batchnorm layers (indicated by the _bn suffix in the model names).

This in turn means that the results depend on the statistics of the current batch, and those are different with batch_size=2 than with batch_size=128. When evaluating, you should always call the nn.Module.eval function. This makes the layers use the running statistics (those learned during training) instead of the batch's statistics. Read the nn.BatchNorm2d documentation for more information.

Note that calling eval will propagate recursively to all submodules, so you only need a single call directly on your ensemble module:

model = MyEnsemble(models[0], models[1], models[2], models[3])
model.eval()
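If you want to double-check that the call propagated, a quick sanity check (just a sketch, using the model variable from your script) is to walk the submodules and assert that no BatchNorm layer is still in training mode:

import torch.nn as nn

model.eval()
for name, module in model.named_modules():
    if isinstance(module, nn.BatchNorm2d):
        # after model.eval(), every submodule should report training == False
        assert not module.training, f"{name} is still in training mode"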

Once this is done, the batch size should no longer affect the model's performance.

When training, you will need to switch back to training mode with nn.Module.train, as sketched below.
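The usual pattern looks like this (a sketch only; evaluate_ensemble is a hypothetical helper standing in for your evaluation loop above):

model.train()          # training mode: batch statistics, dropout active
# ... run a training epoch here ...

model.eval()           # evaluation mode: running statistics, dropout off
with torch.no_grad():
    evaluate_ensemble(model, testloader)   # hypothetical helper wrapping the loop above

model.train()          # switch back before the next training epoch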


You also need to normalize the data with the dataset's statistics, which you can do in the torchvision preprocessing pipeline:

test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261))
])
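If you want to reproduce those numbers yourself, here is a small sketch (assuming the same ./data root as in your script) that computes the per-channel mean and standard deviation of the CIFAR-10 training set; it lands close to the values above, and both sets of statistics quoted in the answers circulate widely for CIFAR-10:

import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# compute per-channel mean/std of the CIFAR-10 training images
trainset = datasets.CIFAR10(root='./data', train=True, download=True,
                            transform=transforms.ToTensor())
loader = DataLoader(trainset, batch_size=1000, shuffle=False)

n = 0.0
channel_sum = torch.zeros(3)
channel_sq_sum = torch.zeros(3)
for images, _ in loader:
    # images: (B, 3, 32, 32) in [0, 1]; accumulate per-channel sums
    n += images.numel() / 3
    channel_sum += images.sum(dim=[0, 2, 3])
    channel_sq_sum += (images ** 2).sum(dim=[0, 2, 3])

mean = channel_sum / n
std = (channel_sq_sum / n - mean ** 2).sqrt()
print(mean)  # roughly tensor([0.4914, 0.4822, 0.4465])
print(std)   # roughly tensor([0.2470, 0.2435, 0.2616])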