Why is training-set accuracy during fit() different to accuracy calculated right after using predict on same data?

Question

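I have written a basic deep learning model in TensorFlow/Keras.

Why is the training-set accuracy reported at the end of training (0.4097) different from the accuracy computed immediately afterwards on the same training data with predict (or with evaluate, which gives the same number) = 0.6463?

The MWE is below, followed directly by its output.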

from extra_keras_datasets import kmnist
import tensorflow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.layers import BatchNormalization
import numpy as np


# Model configuration
no_classes = 10


# Load KMNIST dataset
(input_train, target_train), (input_test, target_test) = kmnist.load_data(type='kmnist')

# Shape of the input sets
input_train_shape = input_train.shape
input_test_shape = input_test.shape 

# Keras layer input shape
input_shape = (input_train_shape[1], input_train_shape[2], 1)



# Reshape the training data to include channels
input_train = input_train.reshape(input_train_shape[0], input_train_shape[1], input_train_shape[2], 1)
input_test = input_test.reshape(input_test_shape[0], input_test_shape[1], input_test_shape[2], 1)


# Parse numbers as floats
input_train = input_train.astype('float32')
input_test = input_test.astype('float32')

# Normalize input data
input_train = input_train / 255
input_test = input_test / 255


# Create the model
model = Sequential()
model.add(Flatten(input_shape=input_shape))
model.add(Dense(no_classes, activation='softmax'))


# Compile the model
model.compile(loss=tensorflow.keras.losses.sparse_categorical_crossentropy,
              optimizer=tensorflow.keras.optimizers.Adam(),
              metrics=['accuracy'])


# Fit data to model
history = model.fit(input_train, target_train,
            batch_size=2000,
            epochs=1,
            verbose=1)

prediction = model.predict(input_train)
print("Prediction accuracy = ", np.mean( np.argmax(prediction, axis=1) == target_train))

model.evaluate(input_train, target_train, verbose=2)

Last few lines of output:

30/30 [==============================] - 0s 3ms/step - loss: 1.8336 - accuracy: 0.4097
Prediction accuracy =  0.6463166666666667
1875/1875 - 1s - loss: 1.3406 - accuracy: 0.6463

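Edit.

The initial answers below solve my first problem by pointing out that batch size matters when you run only 1 epoch. When running smaller batches (or batch size = 1) or more epochs, you can push the post-fit prediction accuracy very close to the final accuracy reported by fit itself. Which is good!

I originally asked this question because I was having trouble with a more complex model.

I still cannot understand what is happening in that case (and yes, it involves batch normalization). To get my MWE, replace everything below 'Create the model' above with the code below, which implements a few fully connected layers with batch normalization.

When you run two epochs of this, you will see very stable accuracy across all 30 mini-batches (30 because the 60,000 training examples are divided into batches of 2000). I see a very consistent accuracy of about 83% throughout the second epoch.

But the post-fit prediction right after doing this is an abysmal 10% or so. Can anyone explain this?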

model = Sequential()
model.add(Dense(50, activation='relu', input_shape = input_shape))
model.add(BatchNormalization())
model.add(Dense(20, activation='relu'))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dense(no_classes, activation='softmax'))


# Compile the model
model.compile(loss=tensorflow.keras.losses.sparse_categorical_crossentropy,
              optimizer=tensorflow.keras.optimizers.Adam(),
              metrics=['accuracy'])


# Fit data to model
history = model.fit(input_train, target_train,
            batch_size=2000,
            epochs=2,
            verbose=1)

prediction = model.predict(input_train)

print("Prediction accuracy = ", np.mean( np.argmax(prediction, axis=1) == target_train))

# batch_size was never defined as a variable; pass the value used in fit()
model.evaluate(input_train, target_train, verbose=2, batch_size=2000)

30/30 [==============================] - 46s 2s/step - loss: 0.5567 - accuracy: 0.8345
Prediction accuracy =  0.10098333333333333

Answer 1

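One reason this can happen is that the last reported accuracy takes the whole epoch into account, during which the parameters are not constant and are still being optimized.

When the model is evaluated, the parameters stop changing and stay in their final (hopefully most optimized) state. During the last epoch, by contrast, the parameters passed through various (hopefully less optimized) states, especially at the start of the epoch.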
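To make this concrete, here is a minimal pure-Python sketch, not the Keras model from the question. It assumes a made-up 1-D data set, a two-parameter logistic classifier, and deliberately bad starting weights; all names and numbers are illustrative. It keeps a per-batch accuracy computed with the weights as they were before each update (which is roughly what the progress bar averages over an epoch) and compares the mean of those values with the accuracy of the final weights on the full data.

```python
import math
import random

random.seed(0)

# Toy 1-D data set (an assumption for illustration): label is 1 when x > 0.
n_samples, batch_size = 6000, 200
xs = [random.uniform(-1.0, 1.0) for _ in range(n_samples)]
ys = [1 if x > 0 else 0 for x in xs]


def predict_proba(w, b, x):
    """Logistic model output for a single input."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def accuracy(w, b, inputs, labels):
    """Fraction of samples classified correctly at threshold 0.5."""
    hits = sum((predict_proba(w, b, x) > 0.5) == (y == 1)
               for x, y in zip(inputs, labels))
    return hits / len(inputs)


w, b, lr = -1.0, 0.0, 0.5  # deliberately bad starting weights
batch_accs = []
for start in range(0, n_samples, batch_size):
    bx = xs[start:start + batch_size]
    by = ys[start:start + batch_size]
    # The metric for this batch uses the weights *before* this batch's update.
    batch_accs.append(accuracy(w, b, bx, by))
    # Gradient of the mean cross-entropy loss for this batch.
    grad_w = sum((predict_proba(w, b, x) - y) * x for x, y in zip(bx, by)) / len(bx)
    grad_b = sum(predict_proba(w, b, x) - y for x, y in zip(bx, by)) / len(bx)
    w -= lr * grad_w
    b -= lr * grad_b

epoch_acc = sum(batch_accs) / len(batch_accs)  # running mean over the epoch
final_acc = accuracy(w, b, xs, ys)             # final weights, full data set
print(f"mean accuracy over the epoch: {epoch_acc:.4f}")
print(f"accuracy with final weights:  {final_acc:.4f}")
```

The epoch mean includes the early batches that were scored with poor weights, so it comes out lower than the accuracy of the final weights, which is the same direction as the 0.4097 versus 0.6463 gap in the question.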

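Deleted, because I now see that you did not use batch normalization in this case.

I assume this is due to BatchNormalization.

See, for example, here.

During training, each batch is normalized with its own mean and variance, while moving averages of those statistics are updated.

During inference, the stored moving averages are used as the normalization parameters.

This may be what causes the difference.

Please try without it and see whether such a large difference still remains.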
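As a rough illustration of how far the inference-time statistics can lag behind the training-time ones, here is a minimal pure-Python sketch of the moving-average update. It assumes the Keras defaults for BatchNormalization (momentum 0.99, moving mean initialized to 0); the helper name and the batch mean of 5.0 are made up for illustration, and the 60 updates correspond to 2 epochs of 30 batches as in the edited question.

```python
def moving_stat_after(steps, batch_stat, initial=0.0, momentum=0.99):
    """Exponential moving average, updated once per training batch."""
    moving = initial
    for _ in range(steps):
        moving = momentum * moving + (1.0 - momentum) * batch_stat
    return moving


batch_mean = 5.0   # hypothetical per-batch mean of some activation
updates = 2 * 30   # 2 epochs x 30 batches per epoch
moving_mean = moving_stat_after(updates, batch_mean)
fraction = moving_mean / batch_mean
print(f"moving mean after {updates} updates: {moving_mean:.3f} "
      f"({fraction:.0%} of the batch mean)")
```

Under these assumptions the moving mean has covered less than half the distance to the batch mean after 60 updates, so inference would normalize with statistics quite different from those used in training. Whether this fully accounts for the 10% result is not established in this thread; training for more epochs or lowering the layer's momentum would be ways to check.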

Answer 2

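Just adding to @Gulzar's answer: this effect can be very noticeable because the OP uses only one epoch (many parameters change at the very start of training), the batch size in the evaluate method (which defaults to 32) differs from the one in the fit method, and the batch size is much smaller than the whole data set (meaning there are many updates per epoch).

Simply adding more epochs to the same experiment attenuates this effect.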

# Fit data to model
history = model.fit(input_train, target_train,
            batch_size=2000,
            epochs=40,
            verbose=1)

Results

Epoch 40/40
30/30 [==============================] - 0s 11ms/step - loss: 0.5663 - accuracy: 0.8339
Prediction accuracy =  0.8348
1875/1875 - 2s - loss: 0.5643 - accuracy: 0.8348 - 2s/epoch - 1ms/step
[0.5643048882484436, 0.8348000049591064]
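The 30/30 and 1875/1875 counters in the logs are step counts that follow from the batch sizes. A small sketch of the arithmetic, assuming the 60,000 training examples mentioned in the question and the default batch size of 32 when none is passed to evaluate:

```python
import math

n_train = 60000          # training examples, as stated in the question
fit_batch_size = 2000    # passed to fit()
eval_batch_size = 32     # default used when evaluate() gets no batch_size

fit_steps = math.ceil(n_train / fit_batch_size)    # weight updates per epoch
eval_steps = math.ceil(n_train / eval_batch_size)  # forward passes only

print(f"fit: {fit_steps} steps per epoch, evaluate: {eval_steps} steps")
```

Only the fit steps update weights, so one epoch here gives the optimizer just 30 updates, while 40 epochs give it 1200.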


python

machine-learning

deep-learning

keras

tensorflow