Will dataset.repeat() cause an infinite loop?

Question

I'm learning TensorFlow from the official documentation, but this line confused me:

dataset = dataset.shuffle(1000).repeat()

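I tried running the whole project, and it does work. But I don't understand why it doesn't get stuck in an infinite loop caused by dataset.repeat(). No count is given, so it should repeat indefinitely.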

Can anyone help me understand this?

Here is the link to the page containing this line: https://www.tensorflow.org/tutorials/estimator/premade

It is in the section "Define the feature columns". The whole code block is copied below:

def input_fn(features, labels, training=True, batch_size=256):
    """An input function for training or evaluating"""
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

    # Shuffle and repeat if you are in training mode.
    if training:
        dataset = dataset.shuffle(1000).repeat()

    return dataset.batch(batch_size)

Answer 1

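It does. With no count argument, repeat() repeats the dataset indefinitely, so iterating over the dataset on its own would never end. If you are wondering how training still finishes, the tutorial passes a steps argument to the training call: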

classifier.train(input_fn=lambda: input_fn(train, train_y, training=True), steps=5000)

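So even though the dataset is infinite, the Estimator knows how many steps to run: it trains on 5000 batches and then stops, without ever needing to exhaust the input.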
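To make the mechanism concrete, here is a minimal plain-Python sketch. It is an analogy, not the tf.data or Estimator API, and make_batches and train are hypothetical names. make_batches mimics the tutorial's input_fn: in training mode it yields shuffled batches forever, otherwise it makes a single pass. train only pulls as many batches as it is told to, much as classifier.train(..., steps=5000) does.

```python
import itertools
import random


def make_batches(examples, batch_size, training, seed=0):
    """Hypothetical stand-in for input_fn, written with plain Python generators.

    training=True  -> reshuffle each pass and repeat forever (like shuffle().repeat()).
    training=False -> one pass over the data; the last batch may be smaller.
    """
    rng = random.Random(seed)

    def one_pass():
        order = list(examples)
        if training:
            # Full reshuffle per pass; tf.data's shuffle(1000) uses a 1000-element buffer.
            rng.shuffle(order)
        return order

    if training:
        # An endless chain of passes: this stream never runs out.
        stream = itertools.chain.from_iterable(one_pass() for _ in itertools.count())
    else:
        stream = iter(one_pass())

    while True:
        batch = list(itertools.islice(stream, batch_size))
        if not batch:
            return  # only reachable when training=False
        yield batch


def train(batches, steps):
    """Pull at most `steps` batches, like classifier.train(..., steps=steps).

    Stops early if a finite input runs out first.
    """
    updates = 0
    for _batch in itertools.islice(batches, steps):
        updates += 1  # a real trainer would apply one weight update here
    return updates


data = list(range(10))
print(train(make_batches(data, 4, training=True), steps=5))  # 5
print(len(list(make_batches(data, 4, training=False))))      # 3
```

The infinite generator is harmless because the consumer decides when to stop. Calling list() on the training generator, by contrast, would never return, which is the "infinite loop" the question worries about.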

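As for why it is done this way, I think it is partly a matter of personal preference, but it can be very useful in some cases. Let's look at the function that builds the tf.data.Dataset used for training and testing: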

def input_fn(features, labels, training=True, batch_size=256):
    """An input function for training or evaluating"""
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

    # Shuffle and repeat if you are in training mode.
    if training:
        dataset = dataset.shuffle(1000).repeat()

    return dataset.batch(batch_size)

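Now suppose you want to study how the batch size affects the way the model learns. To compare models with different batch sizes fairly, you should give each model the same number of iterations (weight updates). The difference is that a model with a larger batch size gets a better approximation of the gradient over the whole dataset at each update. With the setup from the tutorial this is very easy: you only change the batch_size argument of input_fn, and since the number of steps stays the same, your pipeline is ready. Doing it the other way, with a finite dataset and a fixed number of epochs, can be painful, because the number of weight updates per epoch depends on the batch size and you would have to recompute the epoch count for every model.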
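A small worked example of this trade-off, assuming a hypothetical dataset of 1000 examples and a hypothetical budget of 10 epochs for the alternative (only steps=5000 comes from the tutorial). With a fixed number of steps, every batch size gets exactly 5000 updates and only the number of passes over the data changes. Fixing the number of epochs instead makes the update count shrink as the batch size grows.

```python
import math


def epochs_covered(num_examples, batch_size, steps):
    """Passes over the data that `steps` batches amount to (fixed-steps setup)."""
    return steps * batch_size / num_examples


def updates_for_epochs(num_examples, batch_size, epochs):
    """Weight updates from repeat(epochs).batch(batch_size) on a finite dataset:
    batches may span epoch boundaries and a short final batch is still kept."""
    return math.ceil(num_examples * epochs / batch_size)


NUM_EXAMPLES = 1000  # hypothetical dataset size
STEPS = 5000         # steps value from the tutorial's classifier.train call
EPOCHS = 10          # hypothetical epoch budget for the fixed-epochs alternative

for batch_size in (32, 128, 256):
    # Fixed steps: always STEPS updates. Fixed epochs: updates depend on batch size.
    print(batch_size,
          epochs_covered(NUM_EXAMPLES, batch_size, STEPS),
          updates_for_epochs(NUM_EXAMPLES, batch_size, EPOCHS))
# 32 160.0 313
# 128 640.0 79
# 256 1280.0 40
```

Under the fixed-epochs setup, the batch-size-256 model gets roughly an eighth as many updates as the batch-size-32 model, so the comparison mixes two effects. The fixed-steps setup keeps the update count constant.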
