Loss and accuracy difference between .evaluate() and sklearn classification_report()
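When training a model in TensorFlow, there is a clear discrepancy between the .evaluate() metrics and sklearn's classification_report. During training, the history shows good accuracy, roughly the same as what .evaluate() reports, but the sklearn metrics are completely different.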
import tensorflow as tf
import tensorflow_datasets as tfds
from sklearn.metrics import classification_report
(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)
def normalize_img(image, label):
    """Normalizes images: `uint8` -> `float32`."""
    return tf.cast(image, tf.float32) / 255., label
ds_train = ds_train.map(
normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(128)
ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)
ds_test = ds_test.map(
normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_test = ds_test.batch(128)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics='accuracy',
)
model.fit(
    ds_train,
    epochs=6,
    validation_data=ds_test,
)
Epoch 1/6
469/469 [==============================] - 1s 3ms/step - loss: 0.3586 - accuracy: 0.9009 - val_loss: 0.1961 - val_accuracy: 0.9435
Epoch 2/6
469/469 [==============================] - 1s 2ms/step - loss: 0.1634 - accuracy: 0.9529 - val_loss: 0.1310 - val_accuracy: 0.9619
Epoch 3/6
469/469 [==============================] - 1s 2ms/step - loss: 0.1142 - accuracy: 0.9676 - val_loss: 0.1089 - val_accuracy: 0.9670
Epoch 4/6
469/469 [==============================] - 1s 2ms/step - loss: 0.0883 - accuracy: 0.9743 - val_loss: 0.0913 - val_accuracy: 0.9721
Epoch 5/6
469/469 [==============================] - 1s 2ms/step - loss: 0.0709 - accuracy: 0.9795 - val_loss: 0.0795 - val_accuracy: 0.9772
Epoch 6/6
469/469 [==============================] - 1s 2ms/step - loss: 0.0590 - accuracy: 0.9826 - val_loss: 0.0762 - val_accuracy: 0.9768
<tensorflow.python.keras.callbacks.History at 0x1a603d02070>
loss, accuracy = model.evaluate(ds_train)
print("Loss:", loss)
print("Accuracy:", accuracy)
469/469 [==============================] - 1s 1ms/step - loss: 0.0484 - accuracy: 0.9867
Loss: 0.04843668267130852
Accuracy: 0.9867166876792908
train_probs = model.predict(ds_train)
train_preds = tf.argmax(train_probs, axis=-1)
train_labels_ds = ds_train.map(lambda image, label: label).unbatch()
y_true = next(iter(train_labels_ds.batch(60000))).numpy()
print(classification_report(y_true, train_preds))
              precision    recall  f1-score   support

           0       0.10      0.10      0.10      5923
           1       0.11      0.11      0.11      6742
           2       0.10      0.10      0.10      5958
           3       0.10      0.10      0.10      6131
           4       0.09      0.09      0.09      5842
           5       0.09      0.09      0.09      5421
           6       0.10      0.10      0.10      5918
           7       0.11      0.11      0.11      6265
           8       0.11      0.10      0.10      5851
           9       0.11      0.10      0.11      5949

    accuracy                           0.10     60000
   macro avg       0.10      0.10      0.10     60000
weighted avg       0.10      0.10      0.10     60000
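As the output shows, the gap is huge, but I can't tell what the problem is. I also tried the metrics built into Keras and got the same results as with sklearn.
Note: this code comes from the official TensorFlow documentation tutorial.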
Try changing this line to:
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples, reshuffle_each_iteration=False)
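By default, reshuffle_each_iteration is set to True, so ds_train is put into a new random order every time it is iterated over. In the question, model.predict(ds_train) iterates it once and ds_train.map(lambda image, label: label) iterates it again, so the predictions and the labels come out in two different orders. Even though the model is trained properly, the two arrays no longer line up, which is why sklearn reports about 0.10, roughly chance level for 10 classes. .evaluate() is unaffected because it scores each batch of images against the labels from that same batch. Note that with reshuffle_each_iteration=False the training data is also no longer reshuffled between epochs. From the docs: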
reshuffle_each_iteration = A boolean, which if true indicates that
the dataset should be pseudorandomly reshuffled each time it is
iterated over. (Defaults to True.)
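A minimal, TensorFlow-free sketch of the effect, under stated assumptions: a plain Python list stands in for ds_train, random.Random(seed).shuffle stands in for one reshuffled pass, and the "model" is assumed to be perfect. The per-class supports are copied from the classification_report in the question. If the predictions and labels come from two independently shuffled passes, they should agree only about sum over classes of (n_c / N)^2 of the time, which for these supports is roughly the 0.10 that sklearn printed.

```python
import random

# Part 1: expected accuracy when predictions and labels are independently
# ordered. Supports are copied from the classification_report above.
supports = [5923, 6742, 5958, 6131, 5842, 5421, 5918, 6265, 5851, 5949]
n = sum(supports)
chance_accuracy = sum((s / n) ** 2 for s in supports)
print(f"expected accuracy under mismatch: {chance_accuracy:.4f}")  # 0.1003

# Part 2: toy simulation; a hypothetical stand-in for ds_train (no TensorFlow).
labels = [c for c, s in enumerate(supports) for _ in range(s)]

def one_pass(data, seed):
    """One pass over the data in a pseudorandom order, like shuffle()."""
    order = list(data)
    random.Random(seed).shuffle(order)
    return order

# Assumed perfect "model": its prediction for each example is the true label,
# so pass 1 yields the predictions in pass-1 order.
preds = one_pass(labels, seed=1)    # like model.predict(ds_train)
y_true = one_pass(labels, seed=2)   # like ds_train.map(...) for the labels
mismatched = sum(p == t for p, t in zip(preds, y_true)) / n

# Like reshuffle_each_iteration=False: every pass sees the same order.
y_true_fixed = one_pass(labels, seed=1)
fixed = sum(p == t for p, t in zip(preds, y_true_fixed)) / n

print(f"reshuffled each pass: {mismatched:.3f}")  # close to 0.10
print(f"fixed order:          {fixed:.3f}")       # 1.000
```

Even a perfect model scores near chance once the two orders diverge, which matches the pattern in the question's report.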
Edit: another approach is to iterate over the dataset once and collect the predictions and labels together:
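import numpy as np

train_preds = []
y_true = []
# One pass only: each batch yields images together with their own labels,
# so the shuffle order cannot differ between predictions and labels.
for x, y in ds_train:
    train_preds.append(np.argmax(model(x, training=False), axis=-1))
    y_true.append(y.numpy())
train_preds = np.concatenate(train_preds)
y_true = np.concatenate(y_true)
print(classification_report(y_true, train_preds))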
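The single-pass idea does not depend on TensorFlow. The sketch below is a hypothetical, framework-free illustration: shuffled_batches plays the role of a reshuffling ds_train, toy_model plays the role of the trained network, and collect_aligned is an illustrative helper name, not a library function. Because the predictions and labels for a batch are appended in the same loop iteration, they stay paired however the batches are ordered.

```python
import random

def collect_aligned(batches, predict_fn):
    """Gather predictions and labels from ONE pass so each stays paired."""
    preds, trues = [], []
    for xs, ys in batches:
        preds.extend(predict_fn(xs))  # predictions for this batch...
        trues.extend(ys)              # ...and the labels of the same batch
    return preds, trues

def shuffled_batches(data, batch_size, rng):
    """Hypothetical stand-in for ds_train: new random order on every call."""
    items = list(data)
    rng.shuffle(items)
    for i in range(0, len(items), batch_size):
        chunk = items[i:i + batch_size]
        yield [x for x, _ in chunk], [y for _, y in chunk]

# Toy data: feature x with label x % 10, and a "model" that always gets it right.
data = [(x, x % 10) for x in range(1000)]

def toy_model(xs):
    return [x % 10 for x in xs]

rng = random.Random(0)
preds, trues = collect_aligned(shuffled_batches(data, 128, rng), toy_model)
accuracy = sum(p == t for p, t in zip(preds, trues)) / len(trues)
print(accuracy)  # 1.0
```

Calling shuffled_batches twice, once for the predictions and once for the labels, would reproduce the misalignment from the question, because the shared rng produces a different order on the second call.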