Keras

Question

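I have read a series of images into a numpy array of shape (7338, 225, 1024, 3), where 7338 is the number of samples, 225 is the number of time steps, 1024 (32x32) is the number of flattened image pixels, and 3 is the number of channels (RGB).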

I have a Sequential model with an LSTM layer:

from keras.models import Sequential
from keras.layers import LSTM
model = Sequential()
model.add(LSTM(128, input_shape=(225, 1024, 3)))

But this results in the error:

Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=4

The documentation states that the input to an LSTM layer should be a 3D tensor with shape (batch_size, timesteps, input_dim), but in my case input_dim is 2D.

What is the recommended way to feed 3-channel images into an LSTM layer in Keras?

Answer 1

If you want the images to form a sequence (like the frames of a movie), you need to treat the pixels and channels together as features:

input_shape = (225, 3072)  # 3D input (batch, 225, 3072); the batch size 7338 is not specified here
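A minimal sketch of this approach, assuming TensorFlow 2.x with its bundled Keras: the data is reshaped with NumPy so that each time step carries all 1024 × 3 = 3072 values, and the result is fed to an LSTM. The random array `x` (2 samples instead of 7338) and the 128 LSTM units are illustrative stand-ins, not values taken from the question.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Small random stand-in for the real (7338, 225, 1024, 3) array
x = np.random.rand(2, 225, 1024, 3).astype("float32")

# Merge the pixel and channel axes into one feature axis: 1024 * 3 = 3072
x_flat = x.reshape(x.shape[0], 225, 1024 * 3)

model = keras.Sequential([
    keras.Input(shape=(225, 3072)),  # batch size left unspecified
    layers.LSTM(128),                # illustrative number of units
])

y = model.predict(x_flat, verbose=0)
print(x_flat.shape, y.shape)  # (2, 225, 3072) (2, 128)
```

As long as the array is C-contiguous, `reshape` returns a view, so applying the same call to the full dataset should not copy the data.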

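If you want more processing before the 3072 features reach the LSTM, you can combine or interleave 2D convolutions and LSTMs to get a more elaborate model (not necessarily a better one, though; every application behaves differently).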

You can also try the newer ConvLSTM2D layer, which takes a 5D input:

input_shape = (225, 32, 32, 3)  # 5D input (batch, 225, 32, 32, 3); the batch size 7338 is not specified here
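A sketch of the ConvLSTM2D variant, again assuming TensorFlow 2.x. It also assumes the 1024 pixels were flattened row by row from each 32x32 image, so a plain `reshape` restores the spatial layout. The 16 filters and the 3x3 kernel are arbitrary choices for illustration.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Small random stand-in for the real (7338, 225, 1024, 3) array
x = np.random.rand(2, 225, 1024, 3).astype("float32")

# Assumption: pixels were flattened row by row, so this recovers 32x32 frames
x_img = x.reshape(x.shape[0], 225, 32, 32, 3)

model = keras.Sequential([
    keras.Input(shape=(225, 32, 32, 3)),  # (time, rows, cols, channels)
    layers.ConvLSTM2D(filters=16, kernel_size=(3, 3), padding="same"),
])

y = model.predict(x_img, verbose=0)
print(y.shape)  # (2, 32, 32, 16)
```

With the default `return_sequences=False`, the layer returns only the feature map of the last time step; `padding="same"` keeps it at 32x32.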

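I would probably build a convolutional network with several TimeDistributed(Conv2D(...)) and TimeDistributed(MaxPooling2D(...)) layers, then add TimeDistributed(Flatten()), and finally an LSTM(). This would likely improve how well the model understands your images, as well as the LSTM's performance.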
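A sketch of that TimeDistributed CNN + LSTM stack, assuming TensorFlow 2.x and the same row-by-row pixel layout as above. The filter counts, pooling sizes and the 128 LSTM units are hypothetical choices for illustration. Each `TimeDistributed` wrapper applies the same 2D layer, with shared weights, to every one of the 225 frames.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Small random stand-in, already reshaped to 32x32 RGB frames
x_img = np.random.rand(2, 225, 32, 32, 3).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(225, 32, 32, 3)),
    # Per-frame feature extractor (hypothetical sizes)
    layers.TimeDistributed(layers.Conv2D(32, (3, 3), activation="relu", padding="same")),
    layers.TimeDistributed(layers.MaxPooling2D((2, 2))),  # 32x32 -> 16x16
    layers.TimeDistributed(layers.Conv2D(64, (3, 3), activation="relu", padding="same")),
    layers.TimeDistributed(layers.MaxPooling2D((2, 2))),  # 16x16 -> 8x8
    layers.TimeDistributed(layers.Flatten()),             # 8 * 8 * 64 = 4096 features per frame
    # Sequence model over the 225 per-frame feature vectors
    layers.LSTM(128),
])

y = model.predict(x_img, verbose=0)
print(y.shape)  # (2, 128)
```

Here the LSTM receives 4096 learned features per time step instead of the 3072 raw pixel values.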

Answer 2

The Keras guide now has a section on building RNNs with nested structures, which allows arbitrary input types at each time step: https://www.tensorflow.org/guide/keras/rnn#rnns_with_listdict_inputs_or_nested_inputs

Keras - Input a 3 channel image into LSTM
