Keras Autoencoder Input Image Size
Consider this autoencoder:
import numpy as np
from keras.layers import Input, Dense, Conv2D, MaxPooling2D, UpSampling2D, Flatten, Reshape
from keras.models import Model

class ConvAutoencoder:
    def __init__(self, image_size, latent_dim):
        inp = Input(shape=(image_size[0], image_size[1], 1))
        x = Conv2D(16, (3, 3), activation='relu', padding='same')(inp)
        x = MaxPooling2D((2, 2), padding='same')(x)
        x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
        x = MaxPooling2D((2, 2), padding='same')(x)
        x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
        encoded = MaxPooling2D((2, 2), padding='same')(x)
        # at this point the representation is (4, 4, 8) i.e. 128-dimensional
        d = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
        d = UpSampling2D((2, 2))(d)
        d = Conv2D(8, (3, 3), activation='relu', padding='same')(d)
        d = UpSampling2D((2, 2))(d)
        d = Conv2D(16, (3, 3), activation='relu')(d)
        d = UpSampling2D((2, 2))(d)
        decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(d)
        self.model = Model(inp, decoded)
        self.encoder = Model(inp, encoded)
        self.model.compile(loss='mse', optimizer='Adam')
        print(self.model.summary())
I instantiate it with
ConvAutoencoder(image_size=(32,32), latent_dim=10)
which prints
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 32, 32, 1) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 32, 32, 16) 160
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 16, 16, 16) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 16, 16, 8) 1160
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 8, 8, 8) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 8, 8, 8) 584
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 4, 4, 8) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 4, 4, 8) 584
_________________________________________________________________
up_sampling2d_1 (UpSampling2 (None, 8, 8, 8) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 8, 8, 8) 584
_________________________________________________________________
up_sampling2d_2 (UpSampling2 (None, 16, 16, 8) 0
_________________________________________________________________
conv2d_6 (Conv2D) (None, 14, 14, 16) 1168
_________________________________________________________________
up_sampling2d_3 (UpSampling2 (None, 28, 28, 16) 0
_________________________________________________________________
conv2d_7 (Conv2D) (None, 28, 28, 1) 145
=================================================================
Total params: 4,385
Trainable params: 4,385
Non-trainable params: 0
_________________________________________________________________
None
As you can see, the input image size is (32,32), but the output image size is (28,28).
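* Question 1: How can I change the architecture of the autoencoder so that the output image size becomes (32,32)?
* Question 2: As you can see, the class takes a parameter named latent_dim. Currently this parameter is unused. Is there a simple way of "forcing" the latent dimension of the autoencoder down to a given number? E.g. by adding a fully connected layer in the middle, or something along those lines?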
Question 1
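Well, you forgot a padding='same' on the Conv2D just before the last upsampling. Without it, Keras falls back to the default padding='valid', so that 3x3 convolution shrinks 16x16 to 14x14 (see conv2d_6 in the summary), and the final upsampling doubles that to 28x28.
It should look like this: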
# at this point the representation is (4, 4, 8) i.e. 128-dimensional
d = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
d = UpSampling2D((2, 2))(d)
d = Conv2D(8, (3, 3), activation='relu', padding='same')(d)
d = UpSampling2D((2, 2))(d)
d = Conv2D(16, (3, 3), activation='relu', padding='same')(d)
d = UpSampling2D((2, 2))(d)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(d)
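To see where the four missing pixels go, the spatial size can be traced by hand. The following is a minimal sketch in plain Python, not Keras code; the helper names conv_out and trace_decoder are made up for illustration, and the formulas are the usual output-size rules for 'same' and 'valid' padding.

```python
import math


def conv_out(size, kernel=3, padding='same', stride=1):
    """Output length of a convolution along one spatial axis."""
    if padding == 'same':
        # 'same' pads the input so only the stride changes the size
        return math.ceil(size / stride)
    # 'valid': no padding, the kernel must fit entirely inside the input
    return (size - kernel) // stride + 1


def trace_decoder(size, third_conv_padding):
    """Follow one spatial dimension through the decoder from the question."""
    size = conv_out(size)                              # Conv2D(8), padding='same'
    size *= 2                                          # UpSampling2D((2, 2))
    size = conv_out(size)                              # Conv2D(8), padding='same'
    size *= 2                                          # UpSampling2D((2, 2))
    size = conv_out(size, padding=third_conv_padding)  # Conv2D(16)
    size *= 2                                          # UpSampling2D((2, 2))
    return conv_out(size)                              # Conv2D(1), padding='same'


print(trace_decoder(4, 'valid'))  # 28: 16 -> 14 at the Conv2D(16), then doubled
print(trace_decoder(4, 'same'))   # 32
```

The 'valid' convolution only removes 2 pixels, but because it sits before the last UpSampling2D the loss is doubled to 4.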
Question 2
Do you mean the number of kernels (filters)? Then:
x = Conv2D(latent_dim*4, (3, 3), activation='relu', padding='same')(inp)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(latent_dim*2, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(latent_dim, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
# at this point the representation is (4, 4, latent_dim) for a 32x32 input
d = Conv2D(latent_dim, (3, 3), activation='relu', padding='same')(encoded)
d = UpSampling2D((2, 2))(d)
d = Conv2D(latent_dim*2, (3, 3), activation='relu', padding='same')(d)
d = UpSampling2D((2, 2))(d)
d = Conv2D(latent_dim*4, (3, 3), activation='relu', padding='same')(d)
d = UpSampling2D((2, 2))(d)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(d)
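Note that using latent_dim as the filter count sets the number of channels of the encoded tensor, not its total size. A small sketch of the arithmetic in plain Python (encoded_size is a hypothetical helper, and it assumes three 2x2 poolings with padding='same' as in the question):

```python
def encoded_size(image_size, filters, n_pools=3):
    """Shape and total number of values of the encoded tensor."""
    h, w = image_size
    for _ in range(n_pools):
        # 2x2 pooling with padding='same' halves each axis, rounding up
        h, w = -(-h // 2), -(-w // 2)
    return h, w, filters, h * w * filters


print(encoded_size((32, 32), 8))   # (4, 4, 8, 128): the original encoder
print(encoded_size((32, 32), 10))  # (4, 4, 10, 160): filters = latent_dim = 10
```

So with latent_dim=10 and a 32x32 input, the code is 160 values rather than 10; the spatial 4x4 factor stays unless the pooling depth or the input size changes.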
But if you mean that you want the middle layer itself to have a specific number of kernels, you can replace the MaxPooling2D with a strided Conv2D like this:
encoded = Conv2D(latent_dim, (3, 3), activation='relu', padding='same', strides=2)(x)
In fact, you can remove all of the MaxPooling2D layers and add strides=2 to the encoder Conv2D layers instead.
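The question also asks about a fully connected layer in the middle. The answer above does not cover that route, so the following is only a sketch of one way it might look, using the Flatten, Dense and Reshape layers the question already imports. The function name build_dense_bottleneck_autoencoder is hypothetical, the ReLU on the bottleneck is an arbitrary choice, and the code assumes both image dimensions are divisible by 8.

```python
from keras.layers import Input, Dense, Conv2D, MaxPooling2D, UpSampling2D, Flatten, Reshape
from keras.models import Model


def build_dense_bottleneck_autoencoder(image_size, latent_dim):
    """Conv autoencoder whose encoded vector has exactly latent_dim values.

    Assumes image_size[0] and image_size[1] are divisible by 8
    (three 2x2 poolings).
    """
    h, w = image_size[0] // 8, image_size[1] // 8  # spatial size after 3 poolings

    inp = Input(shape=(image_size[0], image_size[1], 1))
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(inp)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
    x = MaxPooling2D((2, 2), padding='same')(x)  # (h, w, 8)

    # Bottleneck: flatten the feature map and project it to latent_dim values.
    x = Flatten()(x)
    encoded = Dense(latent_dim, activation='relu')(x)

    # Expand back to the feature-map size and restore the spatial layout.
    d = Dense(h * w * 8, activation='relu')(encoded)
    d = Reshape((h, w, 8))(d)
    d = Conv2D(8, (3, 3), activation='relu', padding='same')(d)
    d = UpSampling2D((2, 2))(d)
    d = Conv2D(8, (3, 3), activation='relu', padding='same')(d)
    d = UpSampling2D((2, 2))(d)
    d = Conv2D(16, (3, 3), activation='relu', padding='same')(d)
    d = UpSampling2D((2, 2))(d)
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(d)

    model = Model(inp, decoded)
    encoder = Model(inp, encoded)
    model.compile(loss='mse', optimizer='adam')
    return model, encoder


model, encoder = build_dense_bottleneck_autoencoder((32, 32), latent_dim=10)
print(encoder.output_shape)  # (None, 10)
print(model.output_shape)    # (None, 32, 32, 1)
```

Here latent_dim directly fixes the size of the code, at the cost of two Dense layers whose parameter count grows with the image size.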