Very different results from the same Keras model, built with Sequential vs. functional style
I am trying to implement a Keras regression model that learns to set some parameters: the input is a handful of parameters, and the output is a set of unrelated values that are consistent with the input (e.g., similar inputs give similar outputs across the training set, and there is a partly linear relationship between some inputs and some outputs).
Inputs and outputs are normalized, since the parameters have different units.
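The normalization step is not shown in the question; a minimal sketch, assuming sklearn's StandardScaler and the X/y arrays whose shapes appear in the summaries below:

from sklearn.preprocessing import StandardScaler

x_scaler = StandardScaler()
y_scaler = StandardScaler()
X_norm = x_scaler.fit_transform(X)  # X: (2011, 3), features in different units
y_norm = y_scaler.fit_transform(y)  # y: (2011, 5), targets in different units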
During training the mse is about 0.48, and the predictions are quite reasonable.
Here is the model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(78, activation='relu', input_shape=(3,)))  # input_shape must be a tuple, not an int
model.add(Dense(54, activation='relu'))
model.add(Dense(54, activation='relu'))
model.add(Dense(5))  # default activation is linear
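The compile/fit call is not shown in the question either; a plausible setup consistent with the reported mse loss (optimizer, epochs, and batch size are assumptions) would be:

model.compile(optimizer='adam', loss='mse')
model.fit(X_norm, y_norm, epochs=100, batch_size=32)  # X_norm/y_norm from the sketch above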
Summary:
X: (2011, 3) y: (2011, 5)
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 78) 312
_________________________________________________________________
dense_1 (Dense) (None, 54) 4266
_________________________________________________________________
dense_2 (Dense) (None, 54) 2970
_________________________________________________________________
dense_3 (Dense) (None, 5) 275
=================================================================
Total params: 7,823
Trainable params: 7,823
Non-trainable params: 0
Then I build the exact same model in the functional style:
from tensorflow import keras

inputs = keras.layers.Input(shape=(3,))  # i.e. (X.shape[1],); shape must be a tuple
out = keras.layers.Dense(78, activation='relu')(inputs)
out = keras.layers.Dense(54, activation='relu')(out)
out = keras.layers.Dense(54, activation='relu')(out)
out = keras.layers.Dense(5, activation='relu')(out)
model = keras.Model(inputs=inputs, outputs=out, name='func_model')  # implied by the summary below
X: (2011, 3) y: (2011, 5)
Model: "func_model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 3)] 0
_________________________________________________________________
dense (Dense) (None, 78) 312
_________________________________________________________________
dense_1 (Dense) (None, 54) 4266
_________________________________________________________________
dense_2 (Dense) (None, 54) 2970
_________________________________________________________________
dense_3 (Dense) (None, 5) 275
=================================================================
Total params: 7,823
Trainable params: 7,823
Non-trainable params: 0
The summaries are exactly the same, except that the functional model explicitly lists the input layer. But the documentation says:
When a popular kwarg input_shape is passed, then keras will create an input layer
to insert before the current layer. This can be treated equivalent to explicitly
defining an InputLayer.
https://keras.io/api/layers/core_layers/dense/
That is what I did in the first model, so the two models should be identical. But they are not: during training the mse is noticeably higher, around 0.7, and unlike the other model the predictions are "flattened": the outputs respond only minimally to the input parameters.
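One quick way to check whether two models really match is to compare each layer's config, which includes the activation (a sketch; seq_model and func_model are hypothetical names for the two models built above):

for seq_layer, fn_layer in zip(seq_model.layers, func_model.layers[1:]):  # skip the InputLayer
    print(seq_layer.get_config()['activation'], fn_layer.get_config()['activation'])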
Any ideas?
The difference is in your output layer activation. In the functional model you use relu:
out = keras.layers.Dense(5, activation='relu')(out)
In the Sequential model you use linear (the default activation):
model.add(Dense(5))
The right output activation depends on the data you are modeling, but this mismatch is what is giving you the confusing results. In particular, since your targets are standardized, roughly half of them are negative, and a relu output can never go below zero, which is why the functional model's predictions look "flattened".
Edit: after looking over your question again, it seems your functional model should change its last line to
out = keras.layers.Dense(5, activation='linear')(out)
or simply
out = keras.layers.Dense(5)(out)
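Putting it all together, a minimal sketch of the corrected functional model (the compile/fit settings and the X_norm/y_norm names are assumptions carried over from the sketches above, not from the question):

from tensorflow import keras

inputs = keras.layers.Input(shape=(3,))
out = keras.layers.Dense(78, activation='relu')(inputs)
out = keras.layers.Dense(54, activation='relu')(out)
out = keras.layers.Dense(54, activation='relu')(out)
out = keras.layers.Dense(5)(out)  # linear output: regression targets can be negative
func_model = keras.Model(inputs=inputs, outputs=out, name='func_model')
func_model.compile(optimizer='adam', loss='mse')  # assumed settings
func_model.fit(X_norm, y_norm, epochs=100, batch_size=32)  # assumed settings

With the linear output, the functional model can reproduce the Sequential model's behavior on standardized targets.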