Why is the result of the code offered by Deep Learning with TensorFlow different from the snapshot in its book

Question

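In the first chapter of Deep Learning with TensorFlow, the book gives an example of how to build a simple neural network that recognizes handwritten digits. According to its description, the code bundle for the book can be found on GitHub.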

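From the context, I think the section "Running a simple TensorFlow 2.0 net and establishing a baseline" uses the same code as Deep-Learning-with-TensorFlow-2-and-Keras/mnist_V1.py. When I run this example code, it gives me the following output: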

The snapshot in the book is:

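The progress bar in my output shows 375/375, while the one in the book's screenshot shows 48000/48000. My output is also missing the line "Train on 48000 samples, validate on 12000 samples". Why does this happen? How can I get the same result as the snapshot in the book?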

Judging from my output, I think the dataset I loaded has the same size as the one described in the code:

# loading MNIST dataset
# verify
# the split between train and test is 60,000, and 10,000 respectively
# one-hot is automatically applied
mnist = keras.datasets.mnist
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()

My package versions:

$ python --version
Python 3.6.8
$ python3 -c 'import tensorflow as tf; print(tf.__version__)'
2.3.1
$ python3 -c 'import tensorflow as tf; print(tf.keras.__version__)'
2.4.0

I tried to find the answer in the source code. The fit method is defined in training.py. In this method, it instantiates a CallbackList object, which then creates a ProgbarLogger.

# training.py class Model method fit

      # Container that configures and calls `tf.keras.Callback`s.
      if not isinstance(callbacks, callbacks_module.CallbackList):
        callbacks = callbacks_module.CallbackList(
            callbacks,
            add_history=True,
            add_progbar=verbose != 0,
            model=self,
            verbose=verbose,
            epochs=epochs,
            steps=data_handler.inferred_steps)

# callbacks.py class ProgbarLogger

  def on_epoch_begin(self, epoch, logs=None):
    self._reset_progbar()
    if self.verbose and self.epochs > 1:
      print('Epoch %d/%d' % (epoch + 1, self.epochs))

  def on_train_batch_end(self, batch, logs=None):
    self._batch_update_progbar(batch, logs)

  def _batch_update_progbar(self, batch, logs=None):
    # ...

    if self.verbose == 1:
      # Only block async when verbose = 1.
      logs = tf_utils.to_numpy_or_python_type(logs)
      self.progbar.update(self.seen, list(logs.items()), finalize=False)

ProgbarLogger then calls the update method of Progbar to update the progress bar.

# generic_utils.py class Progbar method update

    if self.verbose == 1:
      # ...

      if self.target is not None:
        numdigits = int(np.log10(self.target)) + 1
        bar = ('%' + str(numdigits) + 'd/%d [') % (current, self.target)

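375 is the value of self.target. I then found that the value of self.target is passed in through the steps parameter of the CallbackList object. In the first code snippet, you can see steps=data_handler.inferred_steps. The property inferred_steps is defined in data_adapter.py.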

  @property
  def inferred_steps(self):
    """The inferred steps per epoch of the created `Dataset`.
    This will be `None` in the case where:
    (1) A `Dataset` of unknown cardinality was passed to the `DataHandler`, and
    (2) `steps_per_epoch` was not provided, and
    (3) The first epoch of iteration has not yet completed.
    Returns:
      The inferred steps per epoch of the created `Dataset`.
    """
    return self._inferred_steps

I cannot figure out how self._inferred_steps is computed.

I think the missing "Train on ..." line is related to training_arrays_v1.py, but I don't know what V1 means.

def _print_train_info(num_samples_or_steps, val_samples_or_steps, is_dataset):
  increment = 'steps' if is_dataset else 'samples'
  msg = 'Train on {0} {increment}'.format(
      num_samples_or_steps, increment=increment)
  if val_samples_or_steps:
    msg += ', validate on {0} {increment}'.format(
        val_samples_or_steps, increment=increment)
  print(msg)

Answer 1

Good question.

Let's break it down into smaller parts.

You trained on 48,000 samples and validated on 12,000 samples. However, your output shows 375 instead of 48,000.

If you look at the batch size, its value is 128.

A quick division gives 48,000 // 128 = 375.

Your code is correct, so there is nothing to fix.
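The division can be checked with a few lines of Python. This is only a sketch: `infer_steps` is a hypothetical helper written for this answer, not a Keras function, and it assumes that a final partial batch still counts as one step (which is why it rounds up).

```python
import math


def infer_steps(num_samples: int, batch_size: int) -> int:
    """Hypothetical helper: batches needed to see every sample once.

    Assumes a final, smaller batch still counts as a full step.
    """
    return math.ceil(num_samples / batch_size)


# Numbers taken from the question: 48,000 training and 12,000 validation samples.
train_steps = infer_steps(48_000, 128)
val_steps = infer_steps(12_000, 128)
print(train_steps, val_steps)
```

The training set divides evenly into batches of 128, while the validation set does not, so the rounding only matters for the validation pass.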

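The difference comes from the fact that older versions of Keras and TensorFlow showed the total number of samples (48,000) as the progress bar target, regardless of batch_size. In this example, the progress bar would update as 0, 128, 256, ... until 48,000.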

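In newer versions, the progress bar counts steps instead. The steps_per_epoch and validation_steps values are derived from the number of samples (here 48,000) divided by batch_size (here 128), hence 375.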
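The two counting schemes can be mimicked with a small pure-Python sketch. The generator `progress_labels` is hypothetical and does not call Keras; it only reproduces the "current/target" prefix, padded the same way as the `'%Nd/%d'` format string quoted in the question.

```python
def progress_labels(num_samples, batch_size, count_samples):
    """Yield the 'current/target' label a progress bar would show after each batch.

    count_samples=True  mimics the older display (target = number of samples).
    count_samples=False mimics the newer display (target = number of steps).
    """
    steps = -(-num_samples // batch_size)  # ceiling division
    target = num_samples if count_samples else steps
    width = len(str(target))  # pad the counter to the width of the target
    for step in range(1, steps + 1):
        if count_samples:
            current = min(step * batch_size, num_samples)
        else:
            current = step
        yield f"{current:>{width}d}/{target}"


old_style = list(progress_labels(48_000, 128, count_samples=True))
new_style = list(progress_labels(48_000, 128, count_samples=False))
print(old_style[0], "...", old_style[-1])
print(new_style[0], "...", new_style[-1])
```

Both lists have the same length, because the bar is updated once per batch in either scheme; only the unit being counted changes.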

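Both displays are correct; the only difference is the unit the progress bar counts in. Personally I prefer the latter: with a batch_size of 128, it makes more sense to me to see 1, 2, 3 ... 375.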

Update with further explanation:

The parameters of model.fit() are described in detail here:

https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit

steps_per_epoch Integer or None. Total number of steps (batches of samples) before declaring one epoch finished and starting the next epoch.

validation_steps Only relevant if validation_data is provided and is a tf.data dataset. Total number of steps (batches of samples) to draw before stopping when performing validation at the end of every epoch.
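To make "steps (batches of samples)" concrete, here is a minimal sketch of how an in-memory array can be cut into batches, with one step per batch. The function `iter_batches` and the 1,000-sample figure are made up for illustration, and this is not how Keras implements it internally.

```python
def iter_batches(num_samples, batch_size):
    """Yield (start, stop) index pairs, one pair per step."""
    for start in range(0, num_samples, batch_size):
        yield start, min(start + batch_size, num_samples)


# Hypothetical dataset of 1,000 samples with the batch size from the question.
batches = list(iter_batches(1_000, 128))
sizes = [stop - start for start, stop in batches]
print(len(batches))  # number of steps in one epoch
print(sizes[-1])     # size of the final, partial batch
```

With the 48,000 training samples from the question there is no partial batch, so every one of the 375 steps processes exactly 128 samples.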
