Why does the output of the code provided with Deep Learning with TensorFlow differ from the screenshot in the book?

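In the first chapter of Deep Learning with TensorFlow, the book gives an example of how to build a simple neural network that recognizes handwritten digits. According to its description, the code bundle for the book can be found on GitHub.

From the context, I believe the section "Running a simple TensorFlow 2.0 net and establishing a baseline" uses the same code as Deep-Learning-with-TensorFlow-2-and-Keras/mnist_V1.py. When I run this example code, it gives me the following output:

The snapshot in the book is:

My screenshot shows 375/375, while the book's screenshot shows 48000/48000. My output is also missing the line Train on 48000 samples, validate on 12000 samples. Why does this happen? How can I get the same result as the snapshot in the book?

Judging by my output, the dataset I loaded has the same size as the one described in the code: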

# loading MNIST dataset
# verify
# the split between train and test is 60,000, and 10,000 respectively
# one-hot is automatically applied
mnist = keras.datasets.mnist
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()

My package versions:

$ python --version
Python 3.6.8
$ python3 -c 'import tensorflow as tf; print(tf.__version__)'
2.3.1
$ python3 -c 'import tensorflow as tf; print(tf.keras.__version__)'
2.4.0

I tried to find the answer in the source code. The fit method is defined in training.py. It instantiates a CallbackList object, which in turn creates a ProgbarLogger.

# training.py class Model method fit

      # Container that configures and calls `tf.keras.Callback`s.
      if not isinstance(callbacks, callbacks_module.CallbackList):
        callbacks = callbacks_module.CallbackList(
            callbacks,
            add_history=True,
            add_progbar=verbose != 0,
            model=self,
            verbose=verbose,
            epochs=epochs,
            steps=data_handler.inferred_steps)

# callbacks.py class ProgbarLogger

  def on_epoch_begin(self, epoch, logs=None):
    self._reset_progbar()
    if self.verbose and self.epochs > 1:
      print('Epoch %d/%d' % (epoch + 1, self.epochs))

  def on_train_batch_end(self, batch, logs=None):
    self._batch_update_progbar(batch, logs)

  def _batch_update_progbar(self, batch, logs=None):
    # ...

    if self.verbose == 1:
      # Only block async when verbose = 1.
      logs = tf_utils.to_numpy_or_python_type(logs)
      self.progbar.update(self.seen, list(logs.items()), finalize=False)

ProgbarLogger then calls the update method of Progbar to refresh the progress bar.

# generic_utils.py class Progbar method update

    if self.verbose == 1:
      # ...

      if self.target is not None:
        numdigits = int(np.log10(self.target)) + 1
        bar = ('%' + str(numdigits) + 'd/%d [') % (current, self.target)

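375 is the value of self.target. I then found that the value of self.target is passed in through the steps parameter of the CallbackList object. In the first code snippet you can see steps=data_handler.inferred_steps. The property inferred_steps is defined in data_adapter.py.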

  @property
  def inferred_steps(self):
    """The inferred steps per epoch of the created `Dataset`.
    This will be `None` in the case where:
    (1) A `Dataset` of unknown cardinality was passed to the `DataHandler`, and
    (2) `steps_per_epoch` was not provided, and
    (3) The first epoch of iteration has not yet completed.
    Returns:
      The inferred steps per epoch of the created `Dataset`.
    """
    return self._inferred_steps

I can't figure out how self._inferred_steps is computed.

I think the missing line is related to training_arrays_v1.py, but I don't know what V1 means.

def _print_train_info(num_samples_or_steps, val_samples_or_steps, is_dataset):
  increment = 'steps' if is_dataset else 'samples'
  msg = 'Train on {0} {increment}'.format(
      num_samples_or_steps, increment=increment)
  if val_samples_or_steps:
    msg += ', validate on {0} {increment}'.format(
        val_samples_or_steps, increment=increment)
  print(msg)

Good question.

Let's break it down into smaller parts.

You trained on 48,000 samples and validated on 12,000 samples. However, your output shows 375 instead of 48,000.
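To see where 48,000 and 12,000 come from, here is a minimal sketch of a fractional hold-out split. It assumes the script passes validation_split=0.2 to fit (that value is not shown in the question, so treat it as an assumption), and split_sizes is a hypothetical helper, not a Keras function.

```python
import math


def split_sizes(num_samples, validation_split):
    """Return (train, validation) sizes for a fractional hold-out split.

    Hypothetical helper for illustration only; 0.2 below is an assumed value.
    """
    # The first floor(n * (1 - split)) samples are used for training,
    # and the remaining samples at the end are held out for validation.
    split_at = int(math.floor(num_samples * (1.0 - validation_split)))
    return split_at, num_samples - split_at


train_size, val_size = split_sizes(60000, 0.2)
print(train_size, val_size)
```

With the 60,000 MNIST training images this gives the two numbers that appear in the book's Train on ... samples, validate on ... samples line.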

If you look at the batch size, its value is 128.

A quick division: 48,000 // 128 = 375
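The same arithmetic as a small sketch. The helper name steps_per_epoch is only illustrative here (it mirrors the fit parameter name but is not Keras code), and it assumes that a final partial batch still counts as one step, which is why it rounds up.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once (illustrative helper)."""
    # A final, smaller batch still counts as a step, hence the ceiling.
    return math.ceil(num_samples / batch_size)


train_steps = steps_per_epoch(48000, 128)
val_steps = steps_per_epoch(12000, 128)
print(train_steps, val_steps)
```

For the training set the division is exact, so the bar ends at 375. For the 12,000 validation samples it is not exact (93.75), so under the rounding-up assumption validation would take 94 steps.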

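Your code is correct, so nothing is wrong there.

The difference comes from the fact that older versions of Keras and TensorFlow made the progress bar count samples out of the full total (48,000), regardless of batch_size. In this example the bar advanced as 0, 128, 256, ... up to 48,000.

In newer versions the bar counts steps instead. When they are left unset, steps_per_epoch and validation_steps are taken as the number of samples (say 48,000) divided by batch_size (say 128), hence 375.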
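To make the two displays concrete, here is a rough sketch that generates the "current/target" labels both ways. progress_labels is a hypothetical helper written for this answer; it is not how Keras implements its progress bar, it only reproduces the counting.

```python
def progress_labels(num_samples, batch_size, count_samples):
    """Yield a 'current/target' label after each batch (illustration only)."""
    total_steps = -(-num_samples // batch_size)  # ceiling division
    for step in range(1, total_steps + 1):
        if count_samples:
            # Older display: count the samples seen so far.
            seen = min(step * batch_size, num_samples)
            yield "{}/{}".format(seen, num_samples)
        else:
            # Newer display: count the batches (steps) completed so far.
            yield "{}/{}".format(step, total_steps)


old_style = list(progress_labels(48000, 128, count_samples=True))
new_style = list(progress_labels(48000, 128, count_samples=False))
print(old_style[0], "...", old_style[-1])
print(new_style[0], "...", new_style[-1])
```

Both lists have the same length, one label per batch. Only the unit being counted changes, which is why the book ends on 48000/48000 and your run ends on 375/375.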

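Both displays are correct; the progress bar simply counts a different unit. I personally prefer the newer one, because with a batch_size of 128 it makes more sense to me to see 1, 2, 3 ... 375.

Update with further explanation:

Here is a detailed description of the model.fit() parameters.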

https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit

steps_per_epoch Integer or None. Total number of steps (batches of samples) before declaring one epoch finished and starting the next epoch.

validation_steps Only relevant if validation_data is provided and is a tf.data dataset. Total number of steps (batches of samples) to draw before stopping when performing validation at the end of every epoch.