How do I add a CTC beam search decoder to a CRNN model (PyTorch)?

Question

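I am following the CRNN implementation at https://github.com/meijieru/crnn.pytorch, but it does not seem to use beam search to decode words. Can someone tell me how to add beam search decoding to the same model? For comparison, TensorFlow has a built-in tf.nn.ctc_beam_search_decoder.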

Answer 1

I know this isn't a good idea, but I did it by using TensorFlow from within PyTorch.

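import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.ConfigProto)

if beam:
    # TF expects inputs shaped (max_time, batch_size, num_classes); here T=25, batch=1.
    # Caveat: this TF op treats the last class index as the CTC blank.
    decodes, _ = tf.nn.ctc_beam_search_decoder(
        inputs=preds_.cpu().detach().numpy(),
        sequence_length=25 * np.ones(1, dtype=np.int32),
        merge_repeated=False)
    # Run the decoder on the CPU only
    with tf.Session(config=tf.ConfigProto(device_count={'GPU': 0})) as sess:
        t_ = sess.run(decodes)[0].values  # evaluate once and reuse the result
    char_list = []
    for i in range(len(t_)):
        # Skip label 0 and consecutive repeats, as in the original answer
        if t_[i] != 0 and not (i > 0 and t_[i - 1] == t_[i]):
            char_list.append(alphabet[t_[i] - 1])
    sim_pred = ''.join(char_list)
else:
    raw_pred = converter.decode(preds.data, preds_size.data, raw=True)
    sim_pred = converter.decode(preds.data, preds_size.data, raw=False)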

Answer 2

Why not just add your own beam search decoder to the model? It shouldn't be too hard.

Search the CRNN code for the line where decoding currently happens:

sim_preds = converter.decode(preds.data, preds_size.data, raw=False)
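For comparison, best-path (greedy) decoding, which a beam search decoder would replace, can be sketched as below. This is only an illustration and not the crnn.pytorch code: the function name best_path_decode is hypothetical, and it assumes the blank has index 0 and characters are indexed from 1, matching the index handling in the code of Answer 1.

```python
BLANK = 0  # assumption: blank label is index 0, characters start at index 1


def best_path_decode(mat, alphabet):
    """Greedy CTC decoding of a T x (C+1) score matrix (list of rows or array)."""
    # 1) pick the most likely label at every time step
    best = [max(range(len(row)), key=lambda c: row[c]) for row in mat]
    # 2) collapse consecutive repeats, 3) drop blanks
    chars = []
    prev = None
    for idx in best:
        if idx != BLANK and idx != prev:
            chars.append(alphabet[idx - 1])
        prev = idx
    return ''.join(chars)
```

Because only the single best label per time step is kept, this can miss a text whose probability is spread over several alignments, which is the case beam search is meant to handle.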

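Well, it looks like preds.data holds the output tensor of the neural network. Instead of calling converter.decode(...), pass this tensor to a beam search decoder. You can use my CTC beam search implementation.

Call BeamSearch.ctcBeamSearch(...), passing a single batch element with softmax already applied (mat), a string containing all characters (in the order in which the neural network outputs them), and None for the language model (you can add one later if you like). The matrix mat must have shape Tx(C+1), where T is the number of time steps and C+1 is the number of characters plus the CTC blank. The blank is assumed to be the last entry, so take care of that.

Here is a simple example: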
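The two requirements above (softmax applied, blank in the last column) can be met with a small conversion step. The following is a minimal sketch using NumPy; the helper name to_beam_search_input and the default blank_index=0 are assumptions (index 0 is what the code in Answer 1 skips as blank), so check them against your own model.

```python
import numpy as np


def to_beam_search_input(scores, blank_index=0):
    """Turn raw scores of shape (T, C+1) into probabilities with the blank last."""
    scores = np.asarray(scores, dtype=np.float64)
    # Numerically stable softmax over the class axis
    shifted = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    # Move the blank column to the end, keeping the character order unchanged
    blank = probs[:, blank_index:blank_index + 1]
    chars = np.delete(probs, blank_index, axis=1)
    return np.concatenate([chars, blank], axis=1)
```

If the network output preds has shape (T, batch, C+1), a single batch element could be taken with something like preds[:, 0, :].detach().cpu().numpy() before calling this helper. That shape is also an assumption here.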

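import numpy as np  # BeamSearch is the module from the implementation mentioned above

mat = np.array([[0.4, 0, 0.6], [0.4, 0, 0.6]])  # Tx(C+1) with T=2, C=2 (last column is the blank)
classes = 'ab'  # all chars in the order they appear in mat (without blank)
res = BeamSearch.ctcBeamSearch(mat, classes, None)  # decode it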
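To show what such a decoder computes, here is a minimal sketch of CTC beam search without a language model. It is not the implementation referred to above; the function name ctc_beam_search and the default beam_width are assumptions made for illustration. It uses the same input convention: mat has shape Tx(C+1) with the blank as the last entry.

```python
from collections import defaultdict


def ctc_beam_search(mat, classes, beam_width=25):
    """Decode a T x (C+1) probability matrix (blank last) into a string."""
    blank = len(classes)
    # Each beam maps a label prefix to [P(paths ending in blank), P(paths ending in a char)]
    beams = {(): [1.0, 0.0]}
    for t in range(len(mat)):
        row = mat[t]
        # Keep only the most probable prefixes from the previous step
        best = sorted(beams.items(), key=lambda kv: sum(kv[1]), reverse=True)[:beam_width]
        new_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in best:
            # Emit blank: the prefix stays the same
            new_beams[prefix][0] += (p_b + p_nb) * row[blank]
            # Repeat the last character: collapses, so the prefix stays the same
            if prefix:
                new_beams[prefix][1] += p_nb * row[prefix[-1]]
            # Extend the prefix by one character
            for c in range(blank):
                if prefix and prefix[-1] == c:
                    # A doubled character needs a blank in between
                    new_beams[prefix + (c,)][1] += p_b * row[c]
                else:
                    new_beams[prefix + (c,)][1] += (p_b + p_nb) * row[c]
        beams = dict(new_beams)
    best_prefix = max(beams.items(), key=lambda kv: sum(kv[1]))[0]
    return ''.join(classes[c] for c in best_prefix)
```

On the matrix from the simple example, this sketch returns 'a': the three alignments that collapse to "a" sum to 0.64, while the all-blank path has probability 0.36, so picking the best label per time step would return an empty string instead.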

Here is another example, a more realistic use case that decodes the output of a real text recognition system.

