Why does my CIFAR 100 CNN model mainly predict two classes?

I'm currently trying to get a decent score (> 40% accuracy) on CIFAR-100 with Keras. However, my CNN model behaves strangely: it predicts a few classes (2 to 5) far more often than the others:

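The pixel at position (i, j) holds the number of validation-set elements of class i that were predicted as class j. So the diagonal contains the correct classifications and everything else is an error. The two vertical bars show that the model often predicts those classes even when they are not the true class.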
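
The same matrix can be built without a Python loop, and the "vertical bars" can be found by looking for columns (predicted classes) whose total is far above the number of true samples per class. A minimal NumPy sketch; the labels are synthetic, and the helper names and the 3x threshold are hypothetical choices for illustration:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] counts samples of true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    # np.add.at accumulates correctly even when (i, j) pairs repeat
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def overpredicted_classes(cm, factor=3.0):
    """Predicted classes whose column sum exceeds `factor` times the
    average number of true samples per class (hypothetical threshold)."""
    expected = cm.sum() / cm.shape[0]
    col_sums = cm.sum(axis=0)
    return np.flatnonzero(col_sums > factor * expected)

# Synthetic example: 10 classes, 50 samples each; the "model" predicts
# class 7 for every sample of classes 0-4 and is correct otherwise.
y_true = np.repeat(np.arange(10), 50)
y_pred = np.where(y_true < 5, 7, y_true)
cm = confusion_matrix(y_true, y_pred, 10)

print(overpredicted_classes(cm))  # [7]
print(np.trace(cm) / cm.sum())    # 0.5
```

Column 7 collects 300 predictions against an expected 50 per class, which is exactly the kind of bright vertical bar seen in the plot.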

CIFAR-100 is perfectly balanced: each of the 100 classes has 500 training samples.
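
Balance is easy to check with `np.bincount`. Note that the script below holds out a random 20% of the training set, so after the split the per-class counts are only approximately equal. A sketch using synthetic labels shaped like CIFAR-100's `y` (50000 x 1, 500 per class) and a plain NumPy permutation as a stand-in for `train_test_split`:

```python
import numpy as np

# Synthetic labels shaped like cifar100.load_data()'s y: (50000, 1)
y = np.repeat(np.arange(100), 500).reshape(-1, 1)
counts = np.bincount(y.ravel(), minlength=100)
print(counts.min(), counts.max())  # 500 500

# Random, non-stratified 80/20 split (stand-in for train_test_split)
rng = np.random.default_rng(42)
perm = rng.permutation(len(y))
n_val = int(0.2 * len(y))
y_val, y_train = y[perm[:n_val]], y[perm[n_val:]]

val_counts = np.bincount(y_val.ravel(), minlength=100)
print(val_counts.sum(), val_counts.min(), val_counts.max())
```

If exact balance matters, `train_test_split` accepts `stratify=y`, which keeps the per-class counts equal (100 validation images per class here).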

Why does the model predict some classes more often than others? How can I fix this?

Code

Running this takes a while.

#!/usr/bin/env python

from __future__ import print_function
from keras.datasets import cifar100
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from sklearn.model_selection import train_test_split
import numpy as np

batch_size = 32
nb_classes = 100
nb_epoch = 50
data_augmentation = True

# input image dimensions
img_rows, img_cols = 32, 32
# The CIFAR-100 images are RGB.
img_channels = 3

# The data, shuffled and split between train and test sets:
(X, y), (X_test, y_test) = cifar100.load_data()
X_train, X_val, y_train, y_val = train_test_split(X, y,
                                                  test_size=0.20,
                                                  random_state=42)

# Shuffle training data
perm = np.arange(len(X_train))
np.random.shuffle(perm)
X_train = X_train[perm]
y_train = y_train[perm]

print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_val.shape[0], 'validation samples')
print(X_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices.
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
Y_val = np_utils.to_categorical(y_val, nb_classes)

model = Sequential()

model.add(Convolution2D(32, 3, 3, border_mode='same',
                        input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Convolution2D(64, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(1024))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

X_train = X_train.astype('float32')
X_val = X_val.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_val /= 255
X_test /= 255

if not data_augmentation:
    print('Not using data augmentation.')
    model.fit(X_train, Y_train,
              batch_size=batch_size,
              nb_epoch=nb_epoch,
              validation_data=(X_val, Y_val),  # one-hot labels, like Y_train
              shuffle=True)
else:
    print('Using real-time data augmentation.')
    # This will do preprocessing and realtime data augmentation:
    datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=True,  # randomly flip images
        vertical_flip=False)  # randomly flip images

    # Compute quantities required for featurewise normalization
    # (std, mean, and principal components if ZCA whitening is applied).
    datagen.fit(X_train)

    # Fit the model on the batches generated by datagen.flow().
    model.fit_generator(datagen.flow(X_train, Y_train,
                                     batch_size=batch_size),
                        samples_per_epoch=X_train.shape[0],
                        nb_epoch=nb_epoch,
                        validation_data=(X_val, Y_val))

# Save in both branches so the analysis script can load the model
model.save('cifar100.h5')

Visualization code

#!/usr/bin/env python


"""Analyze a cifar100 keras model."""

from keras.models import load_model
from keras.datasets import cifar100
from sklearn.model_selection import train_test_split
import numpy as np
import json
import io
import matplotlib.pyplot as plt
try:
    to_unicode = unicode
except NameError:
    to_unicode = str

n_classes = 100


def plot_cm(cm, zero_diagonal=False):
    """Plot a confusion matrix."""
    n = len(cm)
    size = int(n / 4.)
    fig = plt.figure(figsize=(size, size), dpi=80, )
    plt.clf()
    ax = fig.add_subplot(111)
    ax.set_aspect(1)
    res = ax.imshow(np.array(cm), cmap=plt.cm.viridis,
                    interpolation='nearest')
    width, height = cm.shape
    fig.colorbar(res)
    plt.savefig('confusion_matrix.png', format='png')

# Load model
model = load_model('cifar100.h5')

# Load validation data
(X, y), (X_test, y_test) = cifar100.load_data()

X_train, X_val, y_train, y_val = train_test_split(X, y,
                                                  test_size=0.20,
                                                  random_state=42)

# Calculate confusion matrix
y_val_i = y_val.flatten()
y_val_pred = model.predict(X_val)
y_val_pred_i = y_val_pred.argmax(1)
cm = np.zeros((n_classes, n_classes), dtype=int)  # np.int was removed in NumPy 1.24
for i, j in zip(y_val_i, y_val_pred_i):
    cm[i][j] += 1

acc = sum([cm[i][i] for i in range(100)]) / float(cm.sum())
print("Validation accuracy: %0.4f" % acc)

# Create plot
plot_cm(cm)

# Serialize confusion matrix
with io.open('cm.json', 'w', encoding='utf8') as outfile:
    str_ = json.dumps(cm.tolist(),
                      indent=4, sort_keys=True,
                      separators=(',', ':'), ensure_ascii=False)
    outfile.write(to_unicode(str_))

Red herrings

tanh

I replaced tanh with relu. The history CSV looks fine, but the visualization shows the same problem:

Also note that the validation accuracy here is only 3.44%.

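Dropout + tanh + border mode

I removed the dropout layers, replaced tanh with relu and set the border mode to same: history CSV

The accuracy computed by the visualization code (8.50% this time) is still much lower than the one reported by the Keras training code.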

Q&A

Summary of the comments and answers:

I have a bad feeling about this part of the code:

model.add(Dense(1024))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

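The rest of the model is all relus, but here there is a tanh.

tanh saturates at -1 and 1, where its gradient vanishes, and that might be what makes your two classes overly important.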
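
The saturation argument can be made concrete: the derivative of tanh is 1 - tanh(x)^2, which is already tiny for inputs of a few units, while the derivative of ReLU is 1 for every positive input. A small standard-library sketch (the helper names are illustrative):

```python
import math

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)**2."""
    return 1.0 - math.tanh(x) ** 2

def relu_grad(x):
    """Derivative of ReLU, taking 0 at x = 0."""
    return 1.0 if x > 0 else 0.0

for x in [0.0, 1.0, 3.0, 10.0]:
    print(f"x={x:5.1f}  tanh'={tanh_grad(x):.2e}  relu'={relu_grad(x):.1f}")
```

At x = 10 the tanh gradient is on the order of 1e-8, so a saturated unit barely learns, whereas the ReLU gradient stays at 1.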

The Keras CIFAR-10 example basically uses the same architecture (the dense-layer sizes may differ), but uses a relu there as well (no tanh at all). The same goes for this external Keras-based CIFAR-100 code.

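  1. I don't see you doing any mean centering, not even in the data generator. I suspect this is the main cause. To mean-center with ImageDataGenerator, set featurewise_center=True. Another option is to subtract the ImageNet mean from each RGB pixel; the mean vector to subtract is [103.939, 116.779, 123.68] (in BGR channel order, for pixel values in 0-255).

  2. Make all activations relu unless you have a specific reason for a tanh.

  3. Remove the two 0.25 dropouts and see what happens. If you want to apply dropout to convolutional layers, SpatialDropout2D is the better choice. It has somehow disappeared from the Keras online documentation, but you can find it in the source.

  4. You have two conv layers with border_mode='same' and two with 'valid'. There is nothing wrong with that, but it is simpler to keep all conv layers 'same' and control the size through max pooling only.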
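
Featurewise centering, as suggested in item 1, amounts to computing one mean per RGB channel on the training set and subtracting that same mean everywhere, including at prediction time. A NumPy sketch, assuming channels-last arrays of shape (N, 32, 32, 3); the random arrays stand in for CIFAR-100 images and the helper names are hypothetical. This is roughly what `featurewise_center=True` plus `datagen.fit` is meant to do, not Keras's exact code:

```python
import numpy as np

def fit_channel_mean(x_train):
    """Mean of each channel over all images, rows and columns.

    Accumulates in float64 for accuracy; returns float32, shape (1, 1, 1, C).
    """
    mean = x_train.mean(axis=(0, 1, 2), keepdims=True, dtype=np.float64)
    return mean.astype(np.float32)

def center(x, mean):
    """Cast to float32 and subtract a precomputed training-set mean."""
    return x.astype(np.float32) - mean

# Random uint8 arrays stand in for CIFAR-100 images (channels-last)
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)
x_val = rng.integers(0, 256, size=(20, 32, 32, 3), dtype=np.uint8)

mean = fit_channel_mean(x_train)
x_train_c = center(x_train, mean)
x_val_c = center(x_val, mean)  # reuse the training mean; never refit on val/test

# Alternative from item 1: the fixed ImageNet mean is in BGR order,
# so flip the RGB channel axis before subtracting it
imagenet_mean_bgr = np.array([103.939, 116.779, 123.68], dtype=np.float32)
x_val_bgr_c = x_val[..., ::-1].astype(np.float32) - imagenet_mean_bgr
```

One caveat with the generator route: ImageDataGenerator only transforms the batches it yields, so arrays passed as validation_data, or to model.predict in the analysis script, still need the same centering (ImageDataGenerator has a standardize method that applies the fitted statistics).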

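To illustrate item 3: ordinary dropout zeroes individual activations, while spatial dropout zeroes entire feature maps (channels). Neighbouring pixels in a conv feature map are strongly correlated, so dropping single pixels regularizes little. A NumPy sketch of both masks in inverted-dropout form (survivors scaled by 1/(1 - rate)), assuming channels-last tensors; it illustrates the idea and is not Keras's implementation:

```python
import numpy as np

def dropout(x, rate, rng):
    """Standard (inverted) dropout: independent mask per activation."""
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def spatial_dropout_2d(x, rate, rng):
    """Spatial dropout for (N, H, W, C): one mask entry per (sample, channel),
    broadcast over H and W, so whole feature maps are kept or dropped."""
    n, _, _, c = x.shape
    mask = rng.random((n, 1, 1, c)) >= rate
    return x * mask / (1.0 - rate)

rng = np.random.default_rng(0)
x = np.ones((2, 4, 4, 8), dtype=np.float32)
y = spatial_dropout_2d(x, 0.25, rng)

# Each (sample, channel) map is constant: all zeros or all 1 / 0.75
per_map = y.reshape(2, 16, 8)
print(np.all(per_map.min(axis=1) == per_map.max(axis=1)))  # True
```
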
A big part of the problem was my ~/.keras/keras.json:

{
    "image_dim_ordering": "th",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}
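
The `image_dim_ordering` key tells Keras 1 how to read the axes of image tensors: `"th"` means channels-first `(N, C, H, W)` and `"tf"` means channels-last `(N, H, W, C)` (Keras 2 renamed the key to `image_data_format`, with the values `"channels_first"` and `"channels_last"`). When data and setting disagree, a 32x32x3 image is read with the wrong axes without any error. A NumPy sketch of converting between the two layouts, using a dummy array:

```python
import numpy as np

# Dummy batch of 2 RGB images in channels-last layout: (N, H, W, C), i.e. "tf"
x_tf = np.arange(2 * 32 * 32 * 3).reshape(2, 32, 32, 3)

# Channels-first layout: (N, C, H, W), i.e. "th"
x_th = x_tf.transpose(0, 3, 1, 2)

# Converting back is the inverse permutation
back = x_th.transpose(0, 2, 3, 1)

# The same pixel, addressed in the two index orders
same_pixel = x_tf[0, 5, 7, 2] == x_th[0, 2, 5, 7]

# A reshape has the right shape but scrambles pixels across channels
wrong = x_tf.reshape(2, 3, 32, 32)

print(x_tf.shape, x_th.shape, same_pixel)  # (2, 32, 32, 3) (2, 3, 32, 32) True
print(np.array_equal(back, x_tf), np.array_equal(wrong, x_th))  # True False
```

The last line is the trap to watch for: `reshape` only reinterprets the memory, so it is not a layout conversion.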

So I had to change image_dim_ordering to tf. This resulted in an accuracy of 12.73%. Obviously there is still a problem, because the validation history reports 45.1% accuracy.

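If you get good accuracy during training and validation but not at test time, make sure you preprocess the dataset in exactly the same way in both cases. During training you have: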

X_train /= 255
X_val /= 255
X_test /= 255

But there is no such code where you predict for the confusion matrix. Adding this to the analysis script:

X_val = X_val.astype('float32') / 255.
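
One way to keep the two scripts from drifting apart is to put the preprocessing in a single function that both the training script and the analysis script call. A sketch (the name `preprocess` is hypothetical). It also avoids in-place `/=` on the raw uint8 arrays returned by `load_data()`, which current NumPy rejects because the float result cannot be cast back to uint8:

```python
import numpy as np

def preprocess(x):
    """Scale raw uint8 CIFAR pixels to float32 values in [0, 1]."""
    return x.astype('float32') / 255.0

raw = np.array([[0, 128, 255]], dtype=np.uint8)
print(preprocess(raw))

try:
    raw_copy = raw.copy()
    raw_copy /= 255  # in-place true division on a uint8 array
except TypeError as exc:
    print('in-place division failed:', type(exc).__name__)
```

In the analysis script this would become `y_val_pred = model.predict(preprocess(X_val))`, with the training script calling the same function on X_train, X_val and X_test.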

gives the following nice confusion matrix: