Image semantic segmentation of repeating patterns without CNNs

Question

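Suppose I have one or more tiles, each consisting of a single pattern (e.g. materials such as wood, concrete, gravel...), that I would like to train my classifier on. I would then use the trained classifier to determine which class each pixel in another image belongs to.

Below are examples of two tiles I would like to train the classifier on:

Say I want to segment the image below to identify the pixels belonging to the door and those belonging to the wall. It is just an example; I know this image isn't made of exactly the same patterns as the tiles above:

For this specific problem, is it necessary to use convolutional neural networks? Or is there a way to achieve my goal with a shallow neural network or any other classifier, combined with texture features for example?

I have already implemented a classifier with Scikit-learn which works on tile pixels individually (see the code below, where training_data is a vector of singletons), but I want instead to train the classifier on texture patterns.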

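from sklearn.linear_model import SGDClassifier

# train classifier: one sample per pixel, one feature (the gray level) per sample
classifier = SGDClassifier()
classifier.fit(training_data, training_target)

# classify given image: each pixel becomes a single-feature sample
test_data = image_gray.reshape((-1, 1))
predictions = classifier.predict(test_data)
image_classified = predictions.reshape(image_gray.shape)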
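
For context, `training_data` and `training_target` could be assembled from the tiles roughly as in the sketch below. The helper `pixels_as_samples` and the two constant tiles are hypothetical stand-ins, not taken from the question; the only assumption carried over is that each sample is a single gray level.

```python
import numpy as np

def pixels_as_samples(tiles):
    """Turn a list of grayscale tiles (one tile per class) into per-pixel samples."""
    # Each pixel becomes one row holding a single feature: its gray level.
    data = np.concatenate([tile.reshape(-1, 1) for tile in tiles])
    # The class label of a pixel is the index of the tile it came from.
    target = np.concatenate(
        [np.full(tile.size, label) for label, tile in enumerate(tiles)]
    )
    return data, target

# Two hypothetical 4x4 tiles standing in for the real training tiles
wood = np.full((4, 4), 0.2)
concrete = np.full((4, 4), 0.8)
training_data, training_target = pixels_as_samples([wood, concrete])
```

Because every sample is a single gray level, a classifier trained this way cannot separate two textures of similar brightness, which is why features computed over a neighbourhood are needed.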

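I was reading this review of recent deep learning methods for image segmentation, and the results seem accurate, but since I have never used any CNN before I feel intimidated by it.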

Answer 1

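You can use U-Net or SegNet for image segmentation. In fact, you add residual layers to your CNN to get this result:

About U-Net:

Arxiv: U-Net: Convolutional Networks for Biomedical Image Segmentation

SegNet:

Arxiv: SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

Here are simple code examples, written for keras==1.1.0: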

U-Net:

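# Imports for keras==1.1.0 (module paths differ in later releases)
import numpy as np
from keras import backend as K
from keras.models import Sequential
from keras.layers import (Activation, Convolution2D, Lambda, MaxPooling2D,
                          Merge, Reshape, UpSampling2D)
from keras.layers.normalization import BatchNormalization
from keras.optimizers import SGD

# x_train (input images) and x_train2 (target mask) are assumed to be loaded already

shape=60
batch_size = 30
nb_classes = 10
img_rows, img_cols = shape, shape
nb_filters = 32
pool_size = (2, 2)
kernel_size = (3, 3)
input_shape=(shape,shape,1)

reg=0.001
learning_rate = 0.013
decay_rate = 5e-5
momentum = 0.9

sgd = SGD(lr=learning_rate,momentum=momentum, decay=decay_rate, nesterov=True)
# assumed value: side length of the network output for a 60x60 input,
# 60 -> 58 (3x3 conv) -> 29 (pool) -> 58 (upsample) -> 56 (conv) -> 54 (conv)
shape2 = 54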

recog0 = Sequential()
recog0.add(Convolution2D(20, 3,3,
                        border_mode='valid',
                        input_shape=input_shape))
recog0.add(BatchNormalization(mode=2))

recog=recog0
recog.add(Activation('relu'))
recog.add(MaxPooling2D(pool_size=(2,2)))
recog.add(UpSampling2D(size=(2, 2)))
recog.add(Convolution2D(20, 3, 3,init='glorot_uniform'))
recog.add(BatchNormalization(mode=2))
recog.add(Activation('relu'))

for i in range(0,2):
    print(i,recog0.layers[i].name)

recog_res=recog0
part=1
recog0.layers[part].name
get_0_layer_output = K.function([recog0.layers[0].input, K.learning_phase()],[recog0.layers[part].output])

get_0_layer_output([x_train, 0])[0][0]

pred=[np.argmax(get_0_layer_output([x_train, 0])[0][i]) for i in range(0,len(x_train))]

loss=x_train-pred
loss=loss.astype('float32')

recog_res.add(Lambda(lambda x: x,input_shape=(56,56,20),output_shape=(56,56,20)))
recog2=Sequential()
recog2.add(Merge([recog,recog_res],mode='ave'))
recog2.add(Activation('relu'))
recog2.add(Convolution2D(20, 3, 3,init='glorot_uniform'))
recog2.add(BatchNormalization(mode=2))
recog2.add(Activation('relu'))
recog2.add(Convolution2D(1, 1, 1,init='glorot_uniform'))
recog2.add(Reshape((shape2,shape2,1)))
recog2.add(Activation('relu'))

recog2.compile(loss='mean_squared_error', optimizer=sgd,metrics = ['mae'])
recog2.summary()

x_train3=x_train2.reshape((1,shape2,shape2,1))

recog2.fit(x_train,x_train3,
                nb_epoch=25,
                batch_size=30,verbose=1)

SegNet:

shape=60
batch_size = 30
nb_classes = 10
img_rows, img_cols = shape, shape
nb_filters = 32
pool_size = (2, 2)
kernel_size = (3, 3)
input_shape=(shape,shape,1)

reg=0.001
learning_rate = 0.012
decay_rate = 5e-5
momentum = 0.9

sgd = SGD(lr=learning_rate,momentum=momentum, decay=decay_rate, nesterov=True)

recog0 = Sequential()
recog0.add(Convolution2D(20, 4,4,
                        border_mode='valid',
                        input_shape=input_shape))
recog0.add(BatchNormalization(mode=2))
recog0.add(MaxPooling2D(pool_size=(2,2)))

recog=recog0
recog.add(Activation('relu'))
recog.add(MaxPooling2D(pool_size=(2,2)))
recog.add(UpSampling2D(size=(2, 2)))
recog.add(Convolution2D(20, 1, 1,init='glorot_uniform'))
recog.add(BatchNormalization(mode=2))
recog.add(Activation('relu'))

for i in range(0,8):
    print(i,recog0.layers[i].name)

recog_res=recog0
part=8
recog0.layers[part].name
get_0_layer_output = K.function([recog0.layers[0].input, K.learning_phase()],[recog0.layers[part].output])
get_0_layer_output([x_train, 0])[0][0]
pred=[np.argmax(get_0_layer_output([x_train, 0])[0][i]) for i in range(0,len(x_train))]

loss=x_train-pred
loss=loss.astype('float32')

recog_res.add(Lambda(lambda x: x-np.mean(loss),input_shape=(28,28,20),output_shape=(28,28,20)))

recog2=Sequential()
recog2.add(Merge([recog,recog_res],mode='sum'))
recog2.add(UpSampling2D(size=(2, 2)))
recog2.add(Convolution2D(1, 3, 3,init='glorot_uniform'))
recog2.add(BatchNormalization(mode=2))
recog2.add(Reshape((shape2*shape2,)))
recog2.add(Reshape((shape2,shape2,1)))
recog2.add(Activation('relu'))
recog2.compile(loss='mean_squared_error', optimizer=sgd,metrics = ['mae'])
recog2.summary()

x_train3=x_train2.reshape((1,shape2,shape2,1))

recog2.fit(x_train,x_train3,
                nb_epoch=400,
                batch_size=30,verbose=1)

Then add a threshold for the colors of the segmentation.
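
The thresholding step might look like the sketch below. It assumes the network output is a 2-D array of per-pixel scores; the 0.5 cut-off, the function name `threshold_mask`, and the small `prediction` array are illustrative choices, not part of the models above.

```python
import numpy as np

def threshold_mask(prediction, cutoff=0.5):
    """Turn a map of per-pixel scores into a binary mask (1 = foreground)."""
    prediction = np.asarray(prediction, dtype=np.float32)
    return (prediction >= cutoff).astype(np.uint8)

# Hypothetical 3x3 score map standing in for the output of recog2.predict(...)
prediction = np.array([[0.1, 0.7, 0.9],
                       [0.4, 0.5, 0.2],
                       [0.8, 0.3, 0.6]])
mask = threshold_mask(prediction)
```

In practice the cut-off would be tuned on held-out images rather than fixed at 0.5.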

Answer 2

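Convolutional neural networks (CNNs) are high-performance tools for image recognition (including semantic segmentation) and have been shown to be very sensitive to texture. However, computer vision as a field existed long before the current wave of interest in deep learning, and there are a variety of other tools that remain relevant, often with lower requirements for computing resources and/or training data.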

For this specific problem, is it necessary to use convolutional neural networks?

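This very much depends on what your success metric is. There are other tools that do not involve CNNs; whether they will give you a satisfactory level of detection accuracy can only be determined by actually testing them.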

Or is there a way to achieve my goal with a shallow neural network or any other classifier, combined with texture features for example?

Shallow neural networks will have some detection capability, although (unlike CNNs) they will not exhibit translational invariance and so are sensitive to small displacements of the target. Such a network is likely to have more success if used to classify small patches of the image; classifying an image patch within a sliding window is not that unlike how a CNN works, of course. It is also possible to approximate a CNN using an equivalent multi-layer perceptron (MLP); if your definition of 'shallow' allows it, that would be another approach.
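
A minimal sketch of the sliding-window idea follows, under several assumptions: the two training tiles are synthetic striped images, each 8x8 patch is described by two hand-made texture features (mean absolute gradient along x and along y), and a logistic regression stands in for the shallow network. All names here (`PATCH`, `patch_features`, `all_patches`, `tile_a`, `tile_b`) are hypothetical; scikit-learn's `MLPClassifier` could be swapped in for the classifier.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from sklearn.linear_model import LogisticRegression

PATCH = 8  # patch side length in pixels (an assumed value)

def patch_features(patches):
    """Two texture features per patch: mean |gradient| along x and along y."""
    gx = np.abs(np.diff(patches, axis=-1)).mean(axis=(-2, -1))
    gy = np.abs(np.diff(patches, axis=-2)).mean(axis=(-2, -1))
    return np.stack([gx, gy], axis=-1)

def all_patches(image):
    """Every PATCH x PATCH window of a 2-D image: (rows, cols, PATCH, PATCH)."""
    return sliding_window_view(image, (PATCH, PATCH))

# Synthetic stand-ins for two training tiles: vertical and horizontal stripes.
rng = np.random.default_rng(0)
stripes = (np.arange(32) % 4 < 2).astype(float)
tile_a = np.tile(stripes, (32, 1)) + 0.05 * rng.standard_normal((32, 32))
tile_b = np.tile(stripes[:, None], (1, 32)) + 0.05 * rng.standard_normal((32, 32))

# One training sample per window position in each tile.
feats_a = patch_features(all_patches(tile_a)).reshape(-1, 2)
feats_b = patch_features(all_patches(tile_b)).reshape(-1, 2)
X = np.vstack([feats_a, feats_b])
y = np.concatenate([np.zeros(len(feats_a), dtype=int),
                    np.ones(len(feats_b), dtype=int)])
classifier = LogisticRegression().fit(X, y)

# Test image: left half is texture A, right half is texture B.
test_image = np.hstack([tile_a[:, :16], tile_b[:, :16]])
feats = patch_features(all_patches(test_image))   # shape (25, 25, 2)
label_map = classifier.predict(feats.reshape(-1, 2)).reshape(feats.shape[:2])
```

Each entry of `label_map` is the predicted class of the window whose top-left corner sits at that position, so the map is smaller than the image by `PATCH - 1` pixels in each direction; windows straddling the boundary between textures are the ones most likely to be mislabeled.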

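Two approaches that do not require neural networks:

Histogram of Oriented Gradients (HOG): the HOG descriptor extracts image features from histograms of local gradient orientations, with the gradients computed along the horizontal and vertical axes. This yields a feature vector that can then be classified, for example with a support vector machine (SVM) or a shallow neural network (MLP). This would be a viable way to classify image patches without a CNN. The scikit-image package has a HOG function, and there is a full worked example of classification of HOG features here. From the documentation: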

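from skimage.feature import hog
from skimage import data, exposure

image = data.astronaut()

# channel_axis=-1 marks the last axis as the colour channels
# (scikit-image >= 0.19; older releases used multichannel=True)
fd, hog_image = hog(image, orientations=8, pixels_per_cell=(16, 16),
                    cells_per_block=(1, 1), visualize=True, channel_axis=-1)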
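
To show how HOG vectors might feed a classifier, here is a small sketch that trains a linear SVM on HOG features of synthetic patches. The striped patches, the helper names (`striped_patch`, `hog_features`, `make_set`), and the patch and cell sizes are all assumptions made for illustration; real texture tiles would be cut into patches instead.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def striped_patch(vertical, size=32):
    """Synthetic stand-in for a texture patch: noisy stripes with a random phase."""
    phase = rng.integers(0, 4)
    profile = ((np.arange(size) + phase) % 4 < 2).astype(float)
    patch = np.tile(profile, (size, 1))   # stripes run top to bottom
    if not vertical:
        patch = patch.T                   # stripes run left to right
    return patch + 0.05 * rng.standard_normal((size, size))

def hog_features(patch):
    """HOG descriptor of one grayscale patch as a flat feature vector."""
    return hog(patch, orientations=8, pixels_per_cell=(8, 8),
               cells_per_block=(1, 1))

def make_set(n_per_class):
    """n_per_class vertical patches (label 0), then as many horizontal (label 1)."""
    patches = [striped_patch(vertical=v)
               for v in (True, False) for _ in range(n_per_class)]
    labels = np.repeat([0, 1], n_per_class)
    return np.array([hog_features(p) for p in patches]), labels

X_train, y_train = make_set(20)
X_test, y_test = make_set(10)

svm = SVC(kernel="linear").fit(X_train, y_train)
accuracy = svm.score(X_test, y_test)
```

The two synthetic textures differ only in gradient orientation, which is exactly what HOG encodes; textures that differ in scale or roughness rather than orientation may need other descriptors.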

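Felzenszwalb's efficient graph-based image segmentation: the skimage.segmentation module in scikit-image has a bunch of segmentation algorithms. Felzenszwalb's is one of them; broadly speaking, it clusters image regions based on edges. More info here. From the module documentation: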

from skimage.segmentation import felzenszwalb
from skimage.data import coffee
img = coffee()
segments = felzenszwalb(img, scale=3.0, sigma=0.95, min_size=5)
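
The segments carry no class labels by themselves. One possible way to use them, sketched below as an assumption rather than something the scikit-image documentation prescribes, is to give each segment the majority class of some per-pixel prediction. The helper `majority_vote`, the synthetic two-region image, and the noisy `pixel_labels` are all hypothetical.

```python
import numpy as np
from skimage.segmentation import felzenszwalb

def majority_vote(segments, pixel_labels, n_classes):
    """Give every pixel the most common class label within its segment."""
    n_segments = segments.max() + 1
    # counts[s, c] = number of pixels in segment s predicted as class c
    counts = np.zeros((n_segments, n_classes), dtype=np.int64)
    np.add.at(counts, (segments.ravel(), pixel_labels.ravel()), 1)
    return counts.argmax(axis=1)[segments]

# Synthetic grayscale image: dark left half, bright right half, slight noise.
rng = np.random.default_rng(0)
image = np.zeros((40, 40))
image[:, 20:] = 1.0
image += 0.02 * rng.standard_normal(image.shape)

segments = felzenszwalb(image, scale=3.0, sigma=0.95, min_size=5)

# Hypothetical per-pixel predictions with about 10% of the labels flipped.
pixel_labels = (image > 0.5).astype(int)
flip = rng.random(image.shape) < 0.1
pixel_labels[flip] = 1 - pixel_labels[flip]

smoothed = majority_vote(segments, pixel_labels, n_classes=2)
```

How much this helps depends on how well the segment boundaries follow the true material boundaries, so `scale` and `min_size` would need tuning on real images.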

Hope that helps.
