Model not training and negative loss when whitening input data
I am doing segmentation and my dataset is somewhat small (1840 images), so I would like to use data augmentation. I am using the generator provided in the Keras documentation, which yields a tuple containing a batch of images and the corresponding masks augmented in the same way.
data_gen_args = dict(featurewise_center=True,
                     featurewise_std_normalization=True,
                     rotation_range=30,
                     width_shift_range=0.2,
                     height_shift_range=0.2,
                     zoom_range=0.2,
                     fill_mode='nearest',
                     horizontal_flip=True)

image_datagen = ImageDataGenerator(**data_gen_args)
mask_datagen = ImageDataGenerator(**data_gen_args)

# Provide the same seed and keyword arguments to the fit and flow methods
seed = 1
image_datagen.fit(X_train, augment=True, seed=seed, rounds=2)
mask_datagen.fit(Y_train, augment=True, seed=seed, rounds=2)

image_generator = image_datagen.flow(X_train,
                                     batch_size=BATCH_SIZE,
                                     seed=seed)
mask_generator = mask_datagen.flow(Y_train,
                                   batch_size=BATCH_SIZE,
                                   seed=seed)

# combine generators into one which yields image and masks
train_generator = zip(image_generator, mask_generator)
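The zip works because both Keras generators are seeded identically and therefore advance in lockstep; zip simply pairs their batches one-to-one. A minimal pure-Python sketch of that pairing behaviour (placeholder batch labels standing in for the actual image/mask arrays, no Keras involved):

```python
def batches(items):
    # stand-in for a Keras generator that yields one batch per iteration
    for item in items:
        yield item

# placeholder labels standing in for image and mask batch arrays
images = batches(['img_batch_0', 'img_batch_1'])
masks = batches(['mask_batch_0', 'mask_batch_1'])

# zip pairs the i-th image batch with the i-th mask batch
pairs = list(zip(images, masks))
```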
Then I train my model with this generator:
model.fit_generator(
    generator=train_generator,
    steps_per_epoch=m.ceil(len(X_train)/BATCH_SIZE),
    validation_data=(X_val, Y_val),
    epochs=EPOCHS,
    callbacks=callbacks,
    workers=4,
    use_multiprocessing=True,
    verbose=2)
But by using this I get a negative loss and the model does not train:
Epoch 2/5000
- 4s - loss: -2.5572e+00 - iou: 0.0138 - acc: 0.0000e+00 - val_loss: 11.8256 - val_iou: 0.0000e+00 - val_acc: 0.1551
I would also like to add that the model trains fine if I do not use featurewise_center and featurewise_std_normalization.
However, I am using a model with batch normalization, which performs better when the inputs are normalized, so that is why I would really like to use the featurewise parameters.
I hope I have explained my problem well and that some of you can help me, because I really do not understand what is going on.
Edit:
My model is a U-Net with custom Conv2D and Conv2DTranspose blocks:
def Conv2D_BN(x, filters, kernel_size, strides=(1,1), padding='same', activation='relu', kernel_initializer='glorot_normal', kernel_regularizer=None):
    # pass kernel_initializer through to the layer (it was accepted but never used)
    x = Conv2D(filters, kernel_size=kernel_size, strides=strides, padding=padding, kernel_initializer=kernel_initializer, kernel_regularizer=kernel_regularizer)(x)
    x = BatchNormalization()(x)
    x = Activation(activation)(x)
    return x

def Conv2DTranspose_BN(x, filters, kernel_size, strides=(1,1), padding='same', activation='relu', kernel_initializer='glorot_normal', kernel_regularizer=None):
    # pass kernel_initializer through to the layer (it was accepted but never used)
    x = Conv2DTranspose(filters, kernel_size=kernel_size, strides=strides, padding=padding, kernel_initializer=kernel_initializer, kernel_regularizer=kernel_regularizer)(x)
    x = BatchNormalization()(x)
    x = Activation(activation)(x)
    return x
def build_unet_bn(input_layer = Input((128,128,3)), start_depth=16, activation='relu', initializer='glorot_normal'):
    # 128 -> 64
    conv1 = Conv2D_BN(input_layer, start_depth * 1, (3, 3), activation=activation, kernel_initializer=initializer)
    conv1 = Conv2D_BN(conv1, start_depth * 1, (3, 3), activation=activation, kernel_initializer=initializer)
    pool1 = MaxPooling2D((2, 2))(conv1)

    # 64 -> 32
    conv2 = Conv2D_BN(pool1, start_depth * 2, (3, 3), activation=activation, kernel_initializer=initializer)
    conv2 = Conv2D_BN(conv2, start_depth * 2, (3, 3), activation=activation, kernel_initializer=initializer)
    pool2 = MaxPooling2D((2, 2))(conv2)

    # 32 -> 16
    conv3 = Conv2D_BN(pool2, start_depth * 4, (3, 3), activation=activation, kernel_initializer=initializer)
    conv3 = Conv2D_BN(conv3, start_depth * 4, (3, 3), activation=activation, kernel_initializer=initializer)
    pool3 = MaxPooling2D((2, 2))(conv3)

    # 16 -> 8
    conv4 = Conv2D_BN(pool3, start_depth * 8, (3, 3), activation=activation, kernel_initializer=initializer)
    conv4 = Conv2D_BN(conv4, start_depth * 8, (3, 3), activation=activation, kernel_initializer=initializer)
    pool4 = MaxPooling2D((2, 2))(conv4)

    # Middle
    convm = Conv2D_BN(pool4, start_depth * 16, (3, 3), activation=activation, kernel_initializer=initializer)
    convm = Conv2D_BN(convm, start_depth * 16, (3, 3), activation=activation, kernel_initializer=initializer)

    # 8 -> 16
    deconv4 = Conv2DTranspose_BN(convm, start_depth * 8, (3, 3), strides=(2, 2), activation=activation, kernel_initializer=initializer)
    uconv4 = concatenate([deconv4, conv4])
    uconv4 = Conv2D_BN(uconv4, start_depth * 8, (3, 3), activation=activation, kernel_initializer=initializer)
    uconv4 = Conv2D_BN(uconv4, start_depth * 8, (3, 3), activation=activation, kernel_initializer=initializer)

    # 16 -> 32
    deconv3 = Conv2DTranspose_BN(uconv4, start_depth * 4, (3, 3), strides=(2, 2), activation=activation, kernel_initializer=initializer)
    uconv3 = concatenate([deconv3, conv3])
    uconv3 = Conv2D_BN(uconv3, start_depth * 4, (3, 3), activation=activation, kernel_initializer=initializer)
    uconv3 = Conv2D_BN(uconv3, start_depth * 4, (3, 3), activation=activation, kernel_initializer=initializer)

    # 32 -> 64
    deconv2 = Conv2DTranspose_BN(uconv3, start_depth * 2, (3, 3), strides=(2, 2), activation=activation, kernel_initializer=initializer)
    uconv2 = concatenate([deconv2, conv2])
    uconv2 = Conv2D_BN(uconv2, start_depth * 2, (3, 3), activation=activation, kernel_initializer=initializer)
    uconv2 = Conv2D_BN(uconv2, start_depth * 2, (3, 3), activation=activation, kernel_initializer=initializer)

    # 64 -> 128
    deconv1 = Conv2DTranspose_BN(uconv2, start_depth * 1, (3, 3), strides=(2, 2), activation=activation, kernel_initializer=initializer)
    uconv1 = concatenate([deconv1, conv1])
    uconv1 = Conv2D_BN(uconv1, start_depth * 1, (3, 3), activation=activation, kernel_initializer=initializer)
    uconv1 = Conv2D_BN(uconv1, start_depth * 1, (3, 3), activation=activation, kernel_initializer=initializer)

    output_layer = Conv2D(1, (1,1), padding="same", activation="sigmoid")(uconv1)
    return output_layer
I create and compile my model with:
input_layer=Input((size,size,3))
output_layer = build_unet_bn(input_layer, 16)
model = Model(inputs=input_layer, outputs=output_layer)
model.compile(optimizer=Adam(lr=1e-3), loss='binary_crossentropy', metrics=metrics)
To understand why your model is not learning, you should consider two things.
First, since the activation of your last layer is a sigmoid, your model always outputs values in the range (0, 1). But because of featurewise_center and featurewise_std_normalization, the target values will be in the range [-1, 1]. This means the domain of your target variable is different from the domain of your network's output.
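To see how centering and std-normalization push binary mask values outside [0, 1], here is the same whitening computed by hand on a toy set of mask pixel values (plain Python, no Keras):

```python
# toy mask pixel values: a balanced mix of background (0) and foreground (1)
vals = [0.0, 0.0, 1.0, 1.0]

mean = sum(vals) / len(vals)                                   # 0.5
std = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5  # 0.5

# featurewise_center + featurewise_std_normalization, done by hand
whitened = [(v - mean) / std for v in vals]                    # [-1.0, -1.0, 1.0, 1.0]
```

The whitened "mask" now contains -1.0 values, which a sigmoid output can never reach.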
Second, binary cross-entropy loss is based on the assumption that "the target variable is in range [0, 1] and the network output is in range (0, 1)". The equation for binary cross-entropy is

BCE(y, ŷ) = −[y·ln(ŷ) + (1 − y)·ln(1 − ŷ)]
You get negative values because your target variable (y) is in the range [-1, 1]. For example, if the target (y) value is -0.5 and the network outputs 0.01, the loss value will be ≈ -2.2875.
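That example value can be checked directly by plugging y = -0.5 and ŷ = 0.01 into the binary cross-entropy formula (plain Python sketch):

```python
import math

def bce(y, y_hat):
    # binary cross-entropy for a single target/prediction pair
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# a whitened target outside [0, 1] drives the loss negative
loss = bce(-0.5, 0.01)   # ≈ -2.2875
```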
Solutions
Solution 1
Remove featurewise_center and featurewise_std_normalization from the data augmentation.
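Concretely, that just means dropping the two whitening flags from the original argument dictionary and keeping the geometric augmentations (a sketch of the trimmed arguments):

```python
# same augmentation arguments as before, minus the two whitening flags
data_gen_args = dict(rotation_range=30,
                     width_shift_range=0.2,
                     height_shift_range=0.2,
                     zoom_range=0.2,
                     fill_mode='nearest',
                     horizontal_flip=True)
```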
Solution 2
Change the activation of the last layer and the loss function to something better suited to your problem. For example, the tanh function outputs values in the range [-1, 1]. With a slight modification to binary cross-entropy, tanh can be used to train your model.
Conclusion
I think Solution 1 is better because it is very simple and straightforward. But if you really want to use "feature wise center" and "feature wise std normalization", I think you should use Solution 2.
Since the tanh function is a rescaled version of the sigmoid function, a slight modification of binary cross-entropy for tanh activation (found from this answer) is

BCE_tanh(y, ŷ) = −(1/2)·[(1 − y)·ln(1 − ŷ) + (1 + y)·ln(1 + ŷ)]
This can be implemented in Keras as follows:
def bce_modified(y_true, y_pred):
    # note the leading minus sign and the mean over the batch; without the
    # minus the loss comes out negated and gradient descent goes the wrong way
    return -(1.0/2.0) * K.mean((1 - y_true) * K.log(1 - y_pred) + (1 + y_true) * K.log(1 + y_pred))
def build_unet_bn(input_layer = Input((128,128,3)), start_depth=16, activation='relu', initializer='glorot_normal'):
    # part of the method without the last layer
    output_layer = Conv2D(1, (1,1), padding="same", activation="tanh")(uconv1)
    return output_layer
model.compile(optimizer=Adam(lr=1e-3), loss=bce_modified, metrics=metrics)
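As a quick sanity check on the modified loss (not part of the original answer): since tanh is just a rescaled sigmoid, mapping a tanh-domain pair (y, ŷ) in [-1, 1] to ((1 + y)/2, (1 + ŷ)/2) in [0, 1] should recover the standard binary cross-entropy up to an additive constant of ln 2, so both losses have the same minima. A numeric check in plain Python:

```python
import math

def bce(y, y_hat):
    # standard binary cross-entropy: target in [0, 1], output in (0, 1)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def bce_tanh(y, y_hat):
    # modified binary cross-entropy: target in [-1, 1], output in (-1, 1)
    return -0.5 * ((1 - y) * math.log(1 - y_hat) + (1 + y) * math.log(1 + y_hat))

y, y_hat = -0.5, 0.6  # an arbitrary tanh-domain target/prediction pair
diff = bce((1 + y) / 2, (1 + y_hat) / 2) - bce_tanh(y, y_hat)
# diff should equal ln 2 regardless of the pair chosen
```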