Does `tf.nn.ctc_beam_search_decoder()` not support the GPU in TensorFlow 2?

Question

I am trying to use tf.nn.ctc_beam_search_decoder() on the GPU.
The problem is that it does not use the GPU.

I was able to confirm that other TensorFlow ops (for example Reshape and SigmoidGrad) run on the GPU.
However, some ops, including ctc_beam_search_decoder(), run only on the CPU, and ctc_beam_search_decoder() is slow.

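So I have two questions.
First, does ctc_beam_search_decoder() in TensorFlow 2 not support the GPU?
Second, if it does, could you explain how to make it run there, or which function (or method) to use?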

Here is a simple example.

Code:

import tensorflow as tf
from tensorflow.python.client import device_lib

# Log the device that each op is executed on
tf.debugging.set_log_device_placement(True)
print(device_lib.list_local_devices())

inputs = tf.convert_to_tensor([
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [0.2, 0.0, 0.3, 0.1, 0.1],
    [0.2, 0.21, 0.3, 0.4, 0.1],
    [0.2, 0.0, 0.6, 0.1, 0.5],
    [0.2, 1.2, 0.3, 2.1, 0.1]])

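# The decoder expects time-major logits: [max_time, batch_size, num_classes]
inputs = tf.expand_dims(inputs, axis=1)  # [5, 5] -> [5, 1, 5]
# One sequence length per batch entry
inputs_len = tf.convert_to_tensor([5])

decoded, _ = tf.nn.ctc_beam_search_decoder(inputs, inputs_len)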

Result (standard output):

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 714951449022474384
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 11733532016050292601
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 394441871956590417
physical_device_desc: "device: XLA_GPU device"
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 11150726272
locality {
  bus_id: 1
  links {
  }
}
incarnation: 5917663253173554940
physical_device_desc: "device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7"
]
Executing op ExpandDims in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op CTCBeamSearchDecoder in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op StridedSlice in device /job:localhost/replica:0/task:0/device:GPU:0

Ignore the input and output data and focus on the devices being used.
Here, ExpandDims and StridedSlice are executed on the GPU, but CTCBeamSearchDecoder is executed on the CPU.
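With more than a handful of ops, it can help to group the placement log by device type instead of reading it line by line. A minimal sketch, assuming the log text has been captured into a string and follows the `Executing op <Op> in device <device>` format shown above; `ops_by_device_type` is a hypothetical helper, not a TensorFlow API.

```python
import re
from collections import defaultdict

# Matches log lines such as:
# Executing op CTCBeamSearchDecoder in device /job:localhost/replica:0/task:0/device:CPU:0
LOG_LINE = re.compile(r"Executing op (\S+) in device (\S+)")


def ops_by_device_type(log_text):
    """Hypothetical helper: group op names by device type (CPU, GPU, ...)."""
    placement = defaultdict(list)
    for match in LOG_LINE.finditer(log_text):
        op_name, device = match.groups()
        # ".../device:GPU:0" -> "GPU:0" -> "GPU"
        device_type = device.rsplit("/device:", 1)[-1].split(":")[0]
        placement[device_type].append(op_name)
    return dict(placement)
```

Fed the three log lines above, it groups ExpandDims and StridedSlice under `GPU` and CTCBeamSearchDecoder under `CPU`.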

Answer 1

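The beam search decoder is implemented only as a C++ CPU kernel, so it runs on the CPU rather than the GPU (see the code at [1]; it is essentially the same as in TF1).

Beam search is an iterative algorithm: each time step extends the beams kept at the previous step, so the time steps cannot be processed in parallel. For that reason I don't think running it on a GPU would bring a large performance improvement. The simplest way to reduce the runtime is to tune the beam width (the `beam_width` argument, which defaults to 100): a smaller beam is faster, a larger one is more accurate.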

[1] https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/util/ctc/ctc_beam_search.h#L159
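To show why the time loop stays sequential and where `beam_width` enters, here is a minimal pure-Python sketch of standard CTC prefix beam search for a single sequence. It is not a transcription of TensorFlow's C++ kernel in [1] and may differ from it in details; the function name, the list-of-lists `(max_time, num_classes)` input, and the blank being the last class (which I believe matches TensorFlow's convention) are assumptions for illustration.

```python
import math
from collections import defaultdict

NEG_INF = -math.inf


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_softmax(row):
    """Convert one time step of logits into log-probabilities."""
    lse = logsumexp(*row)
    return [x - lse for x in row]


def ctc_prefix_beam_search(logits, beam_width=10, blank=None):
    """Hypothetical helper: CTC prefix beam search for ONE sequence.

    logits: list of time steps, each a list of num_classes scores.
    blank:  blank class index; assumed to be the last class by default.
    Returns (best label sequence, its log-probability).
    """
    num_classes = len(logits[0])
    if blank is None:
        blank = num_classes - 1
    # prefix -> (log P(prefix, ends in blank), log P(prefix, ends in non-blank))
    beams = {(): (0.0, NEG_INF)}
    for row in logits:  # sequential: step t needs the beams kept at step t-1
        log_probs = log_softmax(row)
        next_beams = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            last = prefix[-1] if prefix else None
            for c, lp in enumerate(log_probs):
                if c == blank:
                    # Emitting blank keeps the prefix unchanged.
                    b, nb = next_beams[prefix]
                    next_beams[prefix] = (logsumexp(b, p_b + lp, p_nb + lp), nb)
                    continue
                extended = prefix + (c,)
                b, nb = next_beams[extended]
                if c == last:
                    # A repeated label only extends the prefix after a blank ...
                    next_beams[extended] = (b, logsumexp(nb, p_b + lp))
                    # ... otherwise it collapses into the same prefix.
                    sb, snb = next_beams[prefix]
                    next_beams[prefix] = (sb, logsumexp(snb, p_nb + lp))
                else:
                    next_beams[extended] = (b, logsumexp(nb, p_b + lp, p_nb + lp))
        # Prune: keep only the beam_width most probable prefixes.
        ranked = sorted(next_beams.items(),
                        key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_width])
    best, (p_b, p_nb) = max(beams.items(), key=lambda kv: logsumexp(*kv[1]))
    return list(best), logsumexp(p_b, p_nb)
```

Each iteration of the outer loop consumes the pruned beams from the previous step, and the work per step grows with `beam_width * num_classes`. A smaller `beam_width` therefore cuts the work directly, at the risk of pruning the prefix that would have won.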

beam-search

tensorflow

ctc

tensorflow2.0