Very different results from same Keras model, built with Sequential or functional style

I'm trying to implement a Keras regression model that learns to set some parameters: it takes a few parameters as input and produces a set of uncorrelated outputs that are consistent with the inputs (i.e. similar inputs give similar outputs in the training set, and there is a partly linear relationship between some inputs and some outputs). Inputs and outputs are normalized, since the parameters have different units.
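
The normalization is roughly like this (a minimal sketch; StandardScaler and the random stand-in data are assumptions, the real preprocessing may differ):

import numpy as np
from sklearn.preprocessing import StandardScaler

# Stand-in data with the shapes used below: X (2011, 3), y (2011, 5).
X = np.random.rand(2011, 3)
y = np.random.rand(2011, 5)

# Scale inputs and outputs independently, since the parameters have
# different units; keep y_scaler around to invert predictions later.
x_scaler = StandardScaler()
y_scaler = StandardScaler()
X = x_scaler.fit_transform(X)
y = y_scaler.fit_transform(y)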

During training the mse is about 0.48, and the predictions are quite reasonable.

Here is the model:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(78, activation='relu', input_shape=(3,)))  # input_shape must be a tuple
model.add(Dense(54, activation='relu'))
model.add(Dense(54, activation='relu'))
model.add(Dense(5))

The summary:

X:  (2011, 3) y:  (2011, 5)
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 78)                312       
_________________________________________________________________
dense_1 (Dense)              (None, 54)                4266      
_________________________________________________________________
dense_2 (Dense)              (None, 54)                2970      
_________________________________________________________________
dense_3 (Dense)              (None, 5)                 275       
=================================================================
Total params: 7,823
Trainable params: 7,823
Non-trainable params: 0

Then I build exactly the same model in functional style:

from tensorflow import keras

inputs = keras.layers.Input(shape=(3,))  # i.e. (X.shape[1],)
out = keras.layers.Dense(78, activation='relu')(inputs)
out = keras.layers.Dense(54, activation='relu')(out)
out = keras.layers.Dense(54, activation='relu')(out)
out = keras.layers.Dense(5, activation='relu')(out)
func_model = keras.Model(inputs=inputs, outputs=out, name="func_model")


X:  (2011, 3) y:  (2011, 5)
Model: "func_model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 3)]               0         
_________________________________________________________________
dense (Dense)                (None, 78)                312       
_________________________________________________________________
dense_1 (Dense)              (None, 54)                4266      
_________________________________________________________________
dense_2 (Dense)              (None, 54)                2970      
_________________________________________________________________
dense_3 (Dense)              (None, 5)                 275       
=================================================================
Total params: 7,823
Trainable params: 7,823
Non-trainable params: 0

The summaries are exactly the same, except that the functional model adds an explicit input layer. But the documentation says:

When a popular kwarg input_shape is passed, then keras will create an input layer 
to insert before the current layer. This can be treated equivalent to explicitly
defining an InputLayer.

https://keras.io/api/layers/core_layers/dense/

which is exactly what I did in the first model, so the two models should be identical. But they are not: the mse during training is clearly higher, around 0.7, and unlike with the other model the predictions are "flattened": the set of outputs barely responds to the input parameters.

Any ideas?

The difference is your output layer's activation. In the functional model you use relu:

out = keras.layers.Dense(5, activation='relu')(out)

In the Sequential model you use linear (the default activation):

model.add(Dense(5))

Which output activation is right depends on the data you are modeling, but this difference is what is giving you the confusing results.
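
A quick way to catch this kind of mismatch is to print each layer's configured activation for both models and compare them side by side (a sketch; model and func_model are assumed to be your Sequential and functional models):

# Print every layer's configured activation; InputLayer has no
# 'activation' key, so .get() returns None for it.
for m in (model, func_model):
    print(m.name)
    for layer in m.layers:
        print(' ', layer.name, layer.get_config().get('activation'))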

Edit: looking at your question again, it seems your functional model should change its last line to

out = keras.layers.Dense(5, activation='linear')(out)

or simply

out = keras.layers.Dense(5)(out)
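
Putting it all together, a minimal sketch of the corrected functional model; the compile settings are placeholders, since the question doesn't show that step:

from tensorflow import keras

inputs = keras.layers.Input(shape=(3,))
out = keras.layers.Dense(78, activation='relu')(inputs)
out = keras.layers.Dense(54, activation='relu')(out)
out = keras.layers.Dense(54, activation='relu')(out)
out = keras.layers.Dense(5)(out)  # linear output, matching the Sequential model
func_model = keras.Model(inputs=inputs, outputs=out, name="func_model")
func_model.compile(optimizer='adam', loss='mse')
func_model.summary()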