Passing CNN outputs to LSTM in TensorFlow?
Suppose the output of the CNN has the shape [batch_size, height, width, number_of_channels] (assuming the channels_last format). I have this method for converting CNN dimensions to RNN dimensions:
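import tensorflow as tf

def collapse_to_rnn_dims(inputs):
    # inputs: [batch_size, height, width, num_channels]
    batch_size, height, width, num_channels = inputs.get_shape().as_list()
    if batch_size is None:
        batch_size = -1  # let tf.reshape infer a dynamic batch size
    # Note: this reshapes directly, without transposing height and width first.
    return tf.reshape(inputs, [batch_size, width, height * num_channels])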
It does work. However, I just want to ask whether this is really the correct way to reshape CNN outputs so that they can be passed to an LSTM layer.
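I found an answer here. Although that answer assumes that number_of_time_steps (the width) is dynamic rather than batch_size, it does exactly what I need for handwritten text recognition.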
shape = cnn_net.get_shape().as_list()  # [batch, height, width, features]
transposed = tf.transpose(cnn_net, perm=[0, 2, 1, 3],
                          name='transposed')  # [batch, width, height, features]
conv_reshaped = tf.reshape(transposed, [shape[0], -1, shape[1] * shape[3]],
                           name='reshaped')  # [batch, width, height * features]
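To see why the transpose matters, here is a small sketch that compares a plain reshape with transpose-then-reshape. It uses NumPy rather than TensorFlow so it runs without a graph or session; as far as I know, tf.reshape and tf.transpose follow the same row-major semantics, but treat that as an assumption. The tiny shape and the variable names are made up for illustration.

```python
import numpy as np

# Hypothetical tiny feature map: batch=1, height=2, width=3, channels=2.
batch, height, width, channels = 1, 2, 3, 2
cnn_out = np.arange(batch * height * width * channels).reshape(
    batch, height, width, channels)

# Reshape only: the resulting shape is right, but the row-major
# reinterpretation mixes values from different image columns.
reshape_only = cnn_out.reshape(batch, width, height * channels)

# Transpose to [batch, width, height, channels] first, then merge
# the last two axes so each time step holds exactly one image column.
transposed = cnn_out.transpose(0, 2, 1, 3)
transpose_then_reshape = transposed.reshape(batch, width, height * channels)

# What time step 0 should contain: every feature of image column 0.
column_0 = cnn_out[0, :, 0, :].reshape(-1)

print("expected column 0:     ", column_0.tolist())
print("reshape only, step 0:  ", reshape_only[0, 0].tolist())
print("transpose first, step 0:", transpose_then_reshape[0, 0].tolist())
```

Both results have shape (1, 3, 4), so the plain reshape raises no error, but only the transposed version gives a first time step that matches column 0 of the feature map.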