Output feature map dimension of VGG16 model

Question

I saw the feature extraction example in the Keras docs and used the following code to extract features from an input image:

from keras.applications.vgg16 import VGG16, preprocess_input
from keras.preprocessing import image
import numpy as np

input_shape = (224, 224, 3)
model = VGG16(weights='imagenet', input_shape=input_shape,
              pooling='max', include_top=False)
# img_path is the path to the input image (defined elsewhere)
img = image.load_img(img_path, target_size=input_shape[:2])
img = image.img_to_array(img)
img = np.expand_dims(img, axis=0)  # add the batch dimension
img = preprocess_input(img)
feature = model.predict(img)

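When I print the shape of the feature variable, I get (1, 512). Why is it this dimension? print model.summary() shows that the output of the last conv block, after max pooling, has shape (7, 7, 512), which is the dimension I expected feature to have.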

Answer 1

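Thanks to Yong Yuan for helping me with this question. He had some trouble posting an answer on SO, so I am putting his answer here in case anyone else has the same question.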

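Basically, it is because a global max pooling layer is specified in this model (as we can see in the line model = VGG16(....., pooling = 'max', ....)). For each of the 512 channels, it selects the largest value from the 7*7 grid of cells. The Keras documentation also says: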

 pooling: Optional pooling mode for feature extraction when include_top is False.

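And in the output of model.summary(), we can see that there is actually a global_max_pooling2d_1 layer after the max pooling of the fifth convolutional block, so the final dimension becomes 512.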
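To make the shape change concrete, here is a minimal NumPy sketch of what global max pooling does to a tensor of that shape. It does not load VGG16. The feature_map array is a random stand-in for the output of the fifth block's pooling layer, and the variable names are illustrative assumptions, not part of the original code.

```python
import numpy as np

# Stand-in for the output of VGG16's last max pooling layer:
# a batch of 1 image, a 7x7 spatial grid, 512 channels.
# Random values are used purely for illustration; these are
# not real VGG16 activations.
rng = np.random.default_rng(0)
feature_map = rng.random((1, 7, 7, 512))

# Global max pooling: for each channel, keep the largest of its
# 7 * 7 = 49 spatial values, which collapses the two spatial axes.
pooled = feature_map.max(axis=(1, 2))

print(feature_map.shape)  # (1, 7, 7, 512)
print(pooled.shape)       # (1, 512)
```

If the (7, 7, 512) map is what you want, leaving out the pooling argument (its default is None) while keeping include_top = False should make model.predict return the 4D tensor of shape (1, 7, 7, 512) instead.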

feature-extraction

computer-vision

conv-neural-network