TensorFlow: How to prefetch data on the GPU from a CPU tf.data.Dataset (from_generator)

Question

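I am struggling with the following problem. I am creating a tf.data.Dataset using the from_generator method. I perform these operations on the CPU because I don't want to overload my GPU memory.

The dataset consists of tuples, each containing a fixed-length 1-D tf.bool mask (tf.Tensor) and a variable-size 2-D tf.float32 matrix (tf.Tensor). The loss function is decorated with the following decorator, so I don't think the variable size is the problem.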

@tf.function(experimental_relax_shapes=True)

Ideally, the dataset stays on the CPU and is then prefetched to the GPU.

        def gen():
            for i, j in zip(mask_list, wmat_list):
                yield i, j

        dataset = tf.data.Dataset.from_generator(gen, output_types=(tf.bool, tf.float32))
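As a self-contained sketch of the setup described above, the fixed mask length and the variable matrix size can also be declared through `output_shapes`. The list contents and `MASK_LEN` below are made-up placeholders, not the real data.

```python
import numpy as np
import tensorflow as tf

MASK_LEN = 8  # hypothetical fixed mask length

# Hypothetical stand-ins for mask_list and wmat_list.
rng = np.random.default_rng(0)
sizes = [(2, 4), (3, 3), (5, 2)]
mask_list = [rng.random(MASK_LEN) > 0.5 for _ in sizes]
wmat_list = [rng.random(s).astype(np.float32) for s in sizes]


def gen():
    for mask, wmat in zip(mask_list, wmat_list):
        yield mask, wmat


dataset = tf.data.Dataset.from_generator(
    gen,
    output_types=(tf.bool, tf.float32),
    # Fixed-length mask; both matrix dimensions are left unknown.
    output_shapes=(tf.TensorShape([MASK_LEN]), tf.TensorShape([None, None])),
)

for mask, wmat in dataset:
    print(mask.shape, wmat.shape, wmat.device)
```

Declaring the shapes does not change where the elements live; it only gives downstream ops static shape information where it is available.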

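The main training loop currently relies on tf.identity to move the data to the GPU, which is inefficient, as the TensorBoard screenshot below shows. Roughly 70% of the time is spent loading the data and moving it to the GPU.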

                for b, (mask, wmat) in enumerate(dataset):
                    with tf.GradientTape() as tape:

                        mask = tf.identity(mask)
                        wmat = tf.identity(wmat)

                        mean_error, loss = self.model.loss(mask, wmat)
                        epoch_loss += loss.numpy()
                        epoch_mean_error += mean_error.numpy()

I have tried the prefetch_to_device functionality. However, it did not move the data onto the GPU, as verified by printing e.g. mask.device in the training loop.

        gpu_transform = tf.data.experimental.prefetch_to_device('/gpu')
        dataset.apply(gpu_transform)
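One detail worth checking in the snippet above: `Dataset.apply` returns a new dataset rather than modifying the existing one, so the result has to be assigned for the transformation to take effect. The following is a minimal sketch of the device check, assuming a toy `from_tensor_slices` dataset (a hypothetical stand-in, not the generator-based one) and falling back to the CPU when no GPU is visible.

```python
import numpy as np
import tensorflow as tf

# Use the first GPU if one is visible; otherwise fall back to the CPU so the
# sketch still runs on a CPU-only machine.
device = "/gpu:0" if tf.config.list_physical_devices("GPU") else "/cpu:0"

# Toy dataset (hypothetical stand-in for the real one).
data = np.arange(12, dtype=np.float32).reshape(4, 3)
dataset = tf.data.Dataset.from_tensor_slices(data)

gpu_transform = tf.data.experimental.prefetch_to_device(device)
dataset = dataset.apply(gpu_transform)  # keep the returned dataset

# Each element reports the device it currently lives on.
devices = [element.device for element in dataset]
for d in devices:
    print(d)
```

Note that prefetch_to_device is expected to be the last transformation in the pipeline.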

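To me, this looks similar to this bug: https://github.com/tensorflow/tensorflow/issues/30929. However, that issue is marked as resolved and is more than a year old.

I am running TF 2.3 using the official Docker image.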

Answer 1

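I found a solution to my own question.

The problem was that the tuples in the dataset did not contain tf.Tensors but NumPy arrays. The pipeline was therefore probably limited by the functionality of py_func().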

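The screenshot below shows that the pipeline is no longer blocked on the CPU. However, there is still a considerable MemCpy, and prefetch_to_device() still does nothing. This is probably due to a known issue that should be fixed in TF 2.4: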

https://github.com/tensorflow/tensorflow/issues/35563

~~The (unconfirmed) suggested workaround did not work for me either.~~ (see edit)

with tf.device("/gpu:0"):
    ds = ds.prefetch(1)
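For context, here is a minimal end-to-end sketch of how this workaround might be wired into a generator-based pipeline. The generator contents are hypothetical, and the device falls back to the CPU when no GPU is visible, so the sketch only illustrates the structure and says nothing about how much prefetching actually happens.

```python
import numpy as np
import tensorflow as tf

device = "/gpu:0" if tf.config.list_physical_devices("GPU") else "/cpu:0"


def gen():
    # Hypothetical data: fixed-length mask, variable-size matrix.
    for n in (2, 3, 5):
        yield np.ones(4, dtype=bool), np.zeros((n, n), dtype=np.float32)


ds = tf.data.Dataset.from_generator(gen, output_types=(tf.bool, tf.float32))

# Create the prefetch stage under the target device scope, as in the
# workaround above, and keep the returned dataset.
with tf.device(device):
    ds = ds.prefetch(1)

for mask, wmat in ds:
    # Printing .device shows where each element ended up.
    print(wmat.shape, wmat.device)
```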

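Edit:

I have investigated this further and filed a bug report. It now looks as though the suggested workaround does make a difference, but I am not sure whether it fully prefetches in time. https://github.com/tensorflow/tensorflow/issues/43905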


tensorflow

tensorflow2.0