Is it possible to use arbitrary image sizes in caffe?
I know that caffe has the so-called spatial pyramid layer, which enables a net to work with arbitrarily sized images. The problem I am running into is that the net seems to refuse arbitrary image sizes within a single batch. Am I missing something, or is this a real limitation?
My train_val.prototxt:
name: "digits"
layer {
  name: "input"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "/Users/rvaldez/Documents/Datasets/Digits/SeperatedProviderV3_1020_batchnormalizedV2AndSPP/1/caffe/train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "input"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "/Users/rvaldez/Documents/Datasets/Digits/SeperatedProviderV3_1020_batchnormalizedV2AndSPP/1/caffe/test_lmdb"
    batch_size: 10
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  batch_norm_param {
    use_global_stats: false
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TRAIN
  }
}
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  batch_norm_param {
    use_global_stats: true
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TEST
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "bn1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "spatial_pyramid_pooling"
  type: "SPP"
  bottom: "conv2"
  top: "pool2"
  spp_param {
    pyramid_height: 2
  }
}
layer {
  name: "bn2"
  type: "BatchNorm"
  bottom: "pool2"
  top: "bn2"
  batch_norm_param {
    use_global_stats: false
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TRAIN
  }
}
layer {
  name: "bn2"
  type: "BatchNorm"
  bottom: "pool2"
  top: "bn2"
  batch_norm_param {
    use_global_stats: true
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TEST
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "bn2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
I have asked another question regarding a follow-up issue.
You are mixing several different concepts here.

Can a net accept arbitrary input shapes?
Well, not every net can work with arbitrary input shapes. In many cases a net is restricted to the input shape for which it was trained.
In most cases, when fully-connected layers ("InnerProduct") are used, these layers expect an exact input dimension, so changing the input shape "breaks" them and restricts the net to a specific, predefined input shape.
"Fully convolutional nets", on the other hand, are more flexible with regard to the input shape and can usually process inputs of any shape.
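For illustration, here is a minimal sketch of a fully convolutional classification head; the layers conv_score and global_pool are hypothetical and not part of the net above. A 1x1 convolution scores every spatial location, and global average pooling then collapses the map to a fixed 1x1 output regardless of the input shape:

layer {
  name: "conv_score"        # hypothetical layer: per-location class scores
  type: "Convolution"
  bottom: "conv2"
  top: "conv_score"
  convolution_param {
    num_output: 10          # one output channel per class
    kernel_size: 1          # a 1x1 kernel works at any spatial size
  }
}
layer {
  name: "global_pool"       # collapse an arbitrary HxW map to 1x1
  type: "Pooling"
  bottom: "conv_score"
  top: "global_pool"
  pooling_param {
    pool: AVE
    global_pooling: true    # output size is fixed regardless of input
  }
}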
Can you change the input shape during batch training?
Even if your net's architecture allows arbitrary input shapes, you cannot use whatever shape you like during batch training, because the input shapes of all samples within a single batch must be identical: how would you concatenate a 27x27 image with another one of shape 17x17?
The error you are getting seems to come from the "Data" layer, which is struggling to concatenate samples of different shapes into a single batch.
You can work around this issue by setting batch_size: 1 to process one sample at a time, and setting iter_size: 32 in your solver.prototxt to average the gradients over 32 samples, giving you the SGD effect of batch_size: 32.
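A minimal sketch of the two settings involved; the path and the remaining solver options are placeholders:

# in train_val.prototxt: feed one sample per iteration,
# so samples of different shapes never share a batch
data_param {
  source: "/path/to/train_lmdb"  # placeholder path
  batch_size: 1
  backend: LMDB
}

# in solver.prototxt: accumulate gradients over 32 iterations
# before each weight update, emulating batch_size: 32
net: "train_val.prototxt"
iter_size: 32
base_lr: 0.01                    # placeholder value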