CNN-LSTM with TimeDistributed Layers behaving weirdly when trying to use tf.keras.utils.plot_model

I have a CNN-LSTM, shown below:

SEQUENCE_LENGTH = 32
BATCH_SIZE = 32
EPOCHS = 30
n_filters = 64
n_kernel = 1
n_subsequences = 4
n_steps = 8

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv1D, Dense, Flatten, LSTM,
                                     MaxPooling1D, TimeDistributed)

def DNN_Model(X_train):
    model = Sequential()
    model.add(TimeDistributed(
        Conv1D(filters=n_filters, kernel_size=n_kernel, activation='relu', input_shape=(n_subsequences, n_steps, X_train.shape[3]))))
    model.add(TimeDistributed(Conv1D(filters=n_filters, kernel_size=n_kernel, activation='relu')))
    model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(100, activation='relu'))
    model.add(Dense(100, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='mse', optimizer='adam')
    return model

I use this CNN-LSTM for a multivariate time-series forecasting problem. The CNN-LSTM input data comes in the 4D format [samples, subsequences, timesteps, features]. For some reason I need the TimeDistributed layers, or I get errors like ValueError: Input 0 of layer conv1d is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: [None, 4, 8, 35]. I think this has to do with the fact that Conv1D isn't officially meant for time series, so to preserve the time-series data shape we need a wrapper layer like TimeDistributed. I really don't mind using the TimeDistributed layers - they are wrappers, and if they make my model work I am happy. However, when I try to visualize my model with
    file = 'CNN_LSTM_Visualization.png'
    tf.keras.utils.plot_model(model, to_file=file, show_layer_names=False, show_shapes=False)

the generated visualization only shows Sequential():

I suspect this has something to do with the TimeDistributed layers and the model not yet being built. I also cannot call model.summary() - it throws ValueError: This model has not yet been built. Build the model first by calling build() or calling fit() with some data, or specify an input_shape argument in the first layer(s) for automatic build. This is strange, because I did specify an input_shape, albeit in the Conv1D layer and not in the TimeDistributed wrapper.

I would like a working model and a working tf.keras.utils.plot_model function. Any explanation of why I need TimeDistributed, and why it makes the plot_model function behave weirdly, would be greatly appreciated.

Add your Input layer at the beginning. Try this:

from tensorflow.keras.layers import InputLayer

def DNN_Model(X_train):
    model = Sequential()
    # Note: here X_train is the feature count (see DNN_Model(3) below),
    # not the training array itself
    model.add(InputLayer(input_shape=(n_subsequences, n_steps, X_train)))
    model.add(TimeDistributed(
        Conv1D(filters=n_filters, kernel_size=n_kernel,
               activation='relu')))
    model.add(TimeDistributed(Conv1D(filters=n_filters,
              kernel_size=n_kernel, activation='relu')))
    model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
    ....

Now you can plot the model and get the summary correctly:

DNN_Model(3).summary() # OK 
tf.keras.utils.plot_model(DNN_Model(3)) # OK
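
As a side note, the 4D input format [samples, subsequences, timesteps, features] from the question can be produced with a plain reshape, since SEQUENCE_LENGTH = n_subsequences * n_steps. A minimal sketch with toy numbers (the sample count is an assumption; 35 features is taken from the error message in the question):

```python
import numpy as np

# Toy data matching the question's constants: 4 * 8 = 32 = SEQUENCE_LENGTH,
# 35 features as in "Full shape received: [None, 4, 8, 35]"
n_samples, n_subsequences, n_steps, n_features = 100, 4, 8, 35

X_flat = np.random.rand(n_samples, n_subsequences * n_steps, n_features)
X_train = X_flat.reshape(n_samples, n_subsequences, n_steps, n_features)

print(X_train.shape)     # (100, 4, 8, 35)
print(X_train.shape[3])  # 35 -- the feature count used for input_shape
```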

An alternative to using an Input layer is simply to pass the input_shape to the TimeDistributed wrapper, rather than to the Conv1D layer:

def DNN_Model(X_train):
    model = Sequential()
    model.add(TimeDistributed(
        Conv1D(filters=n_filters, kernel_size=n_kernel, activation='relu'),
        input_shape=(n_subsequences, n_steps, X_train.shape[3])))
    model.add(TimeDistributed(Conv1D(filters=n_filters, kernel_size=n_kernel, activation='relu')))
    model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(100, activation='relu'))
    model.add(Dense(100, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='mse', optimizer='adam')
    return model
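
As for the "why do I need TimeDistributed" part of the question: Conv1D and MaxPooling1D expect 3D input (batch, steps, features), while the data here is 4D. TimeDistributed applies the same inner layer to every slice along the subsequence axis, so the wrapped layer never sees the extra dimension. A conceptual numpy sketch of that idea (an illustration only, not the actual Keras internals):

```python
import numpy as np

def time_distributed(layer_fn, x):
    # x: (batch, subsequences, steps, features) -- apply layer_fn to each
    # (batch, steps, features) slice and re-stack along the subsequence axis
    return np.stack([layer_fn(x[:, i]) for i in range(x.shape[1])], axis=1)

def max_pool_1d(x, pool_size=2):
    # Stand-in for MaxPooling1D(pool_size=2): halve the steps axis by taking
    # the max over non-overlapping pairs
    b, s, f = x.shape
    return x.reshape(b, s // pool_size, pool_size, f).max(axis=2)

x = np.random.rand(5, 4, 8, 35)        # (batch, subsequences, steps, features)
y = time_distributed(max_pool_1d, x)
print(y.shape)                         # (5, 4, 4, 35)
```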