Cifar100 only has 16 training images and 16 training labels

Question

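I'm using TensorFlow in Python 3.7, and I'm trying to build an image classifier with CIFAR-100. I want to stay away from Keras as much as possible, because it only has a limited number of datasets available to me. Here is my code: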

import tensorflow as tf
import tensorflow_datasets as tfds
import numpy as np
import matplotlib.pyplot as plt
import PIL.Image as Image
from tensorflow import keras

tf.compat.v1.enable_eager_execution()

shape = (224, 224)

labels = '/home/pi/tf/cifar_labels.txt'
labels = np.array(open(labels).read().splitlines())

img = '/home/pi/tf/lobster.jpeg'
img = Image.open(img).resize(shape)
img = np.array(img)/255.0
img = np.reshape(img, (224, 224, 3))

train = tfds.load(name="cifar100", split="train")
test = tfds.load(name="cifar100", split="test")

train = train.shuffle(1024).batch(32).prefetch(tf.data.experimental.AUTOTUNE)
test = test.shuffle(1024).batch(32).prefetch(tf.data.experimental.AUTOTUNE)

for features in train:
    train_images, train_labels = features["image"], features["label"]

for features in test:
    test_images, test_labels = features["image"], features["label"]

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(32, 32, 3)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(100, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(train_images, train_labels, epochs=200, verbose=2)

test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=2)

print('\nTest accuracy:', test_acc)

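I suspect something is wrong with the for features in train loop. When I print the len of the training images/labels, I get 16. As a result, my model's training accuracy is 0% and its loss is 16.1181. Can anyone help?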
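Both numbers in the question can probably be accounted for with a little arithmetic. The plain-Python sketch below (no TensorFlow) assumes CIFAR-100's standard split of 50,000 training and 10,000 test images, and the default drop_remainder=False of Dataset.batch, which keeps the final partial batch. The for loop rebinds train_images on every iteration, so only the last batch survives it, and the last batch of 50,000 images taken 32 at a time holds 16 images. Separately, a loss of 16.1181 matches -ln(1e-7), which is the largest per-example cross-entropy Keras can report if it clips predicted probabilities to its default epsilon of 1e-7 (an assumption about the Keras version in use). That is consistent with the model giving essentially zero probability to the true class of every example. The helper names batch_sizes and size_left_after_loop are illustrative only.

```python
import math

def batch_sizes(num_examples, batch_size):
    """Sizes of the batches produced when num_examples items are batched
    with batch_size and the final partial batch is kept."""
    full, rest = divmod(num_examples, batch_size)
    return [batch_size] * full + ([rest] if rest else [])

def size_left_after_loop(num_examples, batch_size):
    """Mimic `for features in train: train_images = ...`: each iteration
    overwrites the variable, so only the final batch is left afterwards."""
    last = None
    for size in batch_sizes(num_examples, batch_size):
        last = size
    return last

# Largest per-example cross-entropy when probabilities are clipped to 1e-7
# (assumed default epsilon).
EPSILON = 1e-7
max_clipped_loss = -math.log(EPSILON)

print(size_left_after_loop(50_000, 32))  # train split: 16
print(size_left_after_loop(10_000, 32))  # test split: 16
print(round(max_clipped_loss, 4))        # 16.1181
```

Iterating over the whole dataset instead, as model.fit(train, ...) does, would visit all 1,563 training batches in each epoch rather than just the last one.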

Answer 1

To use CIFAR-100 directly in your Keras model, call the tfds.load function with the as_supervised=True argument. It will then return each example as an (image, label) pair instead of a dictionary. As you can see, the CIFAR-100 dataset contains three keys:

FeaturesDict({
    'coarse_label': ClassLabel(shape=(), dtype=tf.int64, num_classes=20),
    'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=100),
})

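Therefore it cannot be fed directly into model.fit(). With as_supervised set to True, each element of the returned dataset is an (image, label) tuple built from the 'image' and 'label' keys, and the coarse_label key is dropped.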
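Conceptually, as_supervised=True turns each feature dictionary into an (input, target) pair and discards the other keys. The plain-Python sketch below models that idea with made-up stand-in values; it is not how tfds implements it, and to_supervised and SUPERVISED_KEYS are hypothetical names.

```python
# Made-up stand-ins for two CIFAR-100 examples as returned without
# as_supervised: one dictionary per example.
examples = [
    {"image": "img0", "label": 3, "coarse_label": 7},
    {"image": "img1", "label": 42, "coarse_label": 1},
]

SUPERVISED_KEYS = ("image", "label")  # the (input, target) pair to keep

def to_supervised(example, keys=SUPERVISED_KEYS):
    """Turn one feature dictionary into an (input, target) tuple,
    dropping every other key (here, 'coarse_label')."""
    input_key, target_key = keys
    return example[input_key], example[target_key]

pairs = [to_supervised(ex) for ex in examples]
print(pairs)  # [('img0', 3), ('img1', 42)]
```

Inside TensorFlow, a similar reshaping of a dataset that was loaded as dictionaries can be written as train.map(lambda ex: (ex["image"], ex["label"])).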

In summary:

import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow import keras

# Only needed on TensorFlow 1.x; eager execution is the default in 2.x.
tf.compat.v1.enable_eager_execution()

# as_supervised=True yields (image, label) tuples that model.fit() accepts.
train = tfds.load(name="cifar100", split="train", as_supervised=True)
test = tfds.load(name="cifar100", split="test", as_supervised=True)

train = train.shuffle(1024).batch(32).prefetch(tf.data.experimental.AUTOTUNE)
test = test.shuffle(1024).batch(32).prefetch(tf.data.experimental.AUTOTUNE)

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(32, 32, 3)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(100, activation='softmax')  # one unit per fine label
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Pass the whole dataset: fit() iterates over every batch in each epoch.
history = model.fit(train, epochs=200, verbose=1)

test_loss, test_acc = model.evaluate(test, verbose=1)

print('\nTest accuracy:', test_acc)

Note: to use the dataset without setting as_supervised to True, you can use the model.train_on_batch function instead. For example:

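import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow import keras

# Only needed on TensorFlow 1.x; eager execution is the default in 2.x.
tf.compat.v1.enable_eager_execution()

# Without as_supervised, each element is a dict with 'image', 'label' and 'coarse_label'.
train = tfds.load(name="cifar100", split="train")
test = tfds.load(name="cifar100", split="test")

# No .repeat() here: the Python loop below already makes one pass per epoch.
train = train.shuffle(1024).batch(32).prefetch(tf.data.experimental.AUTOTUNE)
test = test.shuffle(1024).batch(32).prefetch(tf.data.experimental.AUTOTUNE)

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(32, 32, 3)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(100, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

for epoch in range(200):
    for features in train:
        image_batch, label_batch = features["image"], features["label"]
        loss, acc = model.train_on_batch(image_batch, label_batch)

# test_on_batch reports one batch at a time, so accumulate size-weighted totals.
total_loss, total_acc, total_seen = 0.0, 0.0, 0
for features in test:
    image_batch, label_batch = features["image"], features["label"]
    loss, acc = model.test_on_batch(image_batch, label_batch)
    batch_size = int(label_batch.shape[0])
    total_loss += loss * batch_size
    total_acc += acc * batch_size
    total_seen += batch_size

print('\nTest loss:', total_loss / total_seen)
print('Test accuracy:', total_acc / total_seen)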
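Because test_on_batch returns the loss and accuracy of a single batch, a figure for the whole test set has to be assembled from the per-batch results. A plain average would give the final 16-image batch as much say as a full batch of 32; weighting each result by its batch size recovers the overall mean. The sketch below uses made-up per-batch accuracies, and weighted_mean is a hypothetical helper, not a Keras function.

```python
def weighted_mean(values, weights):
    """Average per-batch metrics, weighting each by its batch size."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Made-up per-batch accuracies: two full batches of 32, one final batch of 16.
batch_accuracies = [0.25, 0.50, 1.00]
batch_sizes = [32, 32, 16]

print(weighted_mean(batch_accuracies, batch_sizes))    # 0.5
print(sum(batch_accuracies) / len(batch_accuracies))   # about 0.583, overweights the small batch
```

The weighted result equals the accuracy over all 80 examples (40 correct out of 80), which is what a single pass of model.evaluate over the same data would aim to report.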

