Batch size when running tf.Strategy vs. tf.data batch()

Question

I want to display the batch size while running under a tf.distribute strategy. I do this by creating a custom Keras layer:

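import tensorflow as tf

class DebugLayer(tf.keras.layers.Layer):
    """Identity layer that prints the runtime shape of its input."""

    def __init__(self):
        super().__init__()

    def build(self, input_shape):
        # No weights to create; the layer only passes data through.
        pass

    def call(self, inputs):
        # tf.shape() yields the dynamic shape, so the first dimension is
        # the batch size actually seen at run time.
        print_op = tf.print("******Shape is:", tf.shape(inputs), name='shapey')
        # Tie the print to the output so it runs in graph mode as well.
        with tf.control_dependencies([print_op]):
            return tf.identity(inputs)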

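Q1: Number of examples per worker per batch

If I run with one worker, the layer prints a batch size of 128, which is what I set in my tf.data pipeline with .batch(128).

If I run with two workers, each worker prints 128. How many examples does each worker actually run? How many examples are run at the same time?

Q2: Correct steps_per_epoch

In my Model.fit() call I specify steps_per_epoch, and my data pipeline has a .repeat(). If my training set contains 1024 examples, I have 2 workers, and .batch is set to 128, what should steps_per_epoch be for one epoch?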

Answer 1

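When you work with tf.data, you normally apply a .batch() call to the dataset. Suppose its value is 128. That is the total number of examples run per training step (the global batch size), regardless of the number of workers. For example:

With 1 worker, each training step runs 128 examples.
With 2 workers, each worker runs 64 examples per training step.
With 3 workers, each worker runs roughly 42 examples per training step.

For the 3-worker case I am not sure of the exact numbers, because 128/3 is not an integer.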
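The arithmetic behind these per-worker numbers can be sketched in plain Python. This is only an illustration of the division: the helper name split_global_batch is hypothetical, and the sketch does not show how TensorFlow itself hands out the leftover examples when the batch does not divide evenly.

```python
def split_global_batch(global_batch_size, num_workers):
    """Return (examples per worker, leftover examples) for one training step.

    Hypothetical helper for illustration only; it is not a TensorFlow API.
    """
    return divmod(global_batch_size, num_workers)


for workers in (1, 2, 3):
    per_worker, leftover = split_global_batch(128, workers)
    print(f"{workers} worker(s): {per_worker} per worker, {leftover} left over")
```

With 3 workers the division leaves 2 examples over, which is why the per-worker count can only be stated approximately.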

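To set steps_per_epoch, divide the total number of examples by the batch size you passed to .batch(). For the example in my question that gives 8, i.e. 1024/128.

This is somewhat inconvenient, because you need to know the number of training examples, and if that number changes you have to adjust steps_per_epoch. Also, if the number of examples is not an exact multiple of the batch size, you have to decide whether to round steps_per_epoch, take the floor, or take the ceiling.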
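One way to keep this calculation in a single place is a small helper, sketched below. The function name and the round_up flag are assumptions made for illustration; whether to round up or down depends on whether you want the final partial batch counted as a step.

```python
import math


def compute_steps_per_epoch(num_examples, global_batch_size, round_up=True):
    """Number of training steps needed to see the data once.

    Hypothetical helper: round_up=True counts a final partial batch as a
    step (ceiling); round_up=False ignores it (floor).
    """
    if round_up:
        return math.ceil(num_examples / global_batch_size)
    return num_examples // global_batch_size


print(compute_steps_per_epoch(1024, 128))                  # 1024 / 128 divides evenly
print(compute_steps_per_epoch(1000, 128))                  # ceiling of 7.8125
print(compute_steps_per_epoch(1000, 128, round_up=False))  # floor of 7.8125
```

The result can then be passed as steps_per_epoch to Model.fit(), so only the example count needs updating when the training set changes.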
