Difference between tf.nn.conv2d and tf.nn.depthwise_conv2d

Question

What is the difference between tf.nn.conv2d and tf.nn.depthwise_conv2d in TensorFlow?

Answer 1

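I'm not an expert on this, but as far as I understand, the difference is this:

Suppose you have a color input image with a height of 100 and a width of 100, so its dimensions are 100x100x3. For both examples we use the same filter with a width and height of 5. Let's say we want the next layer to have a depth of 8.

In tf.nn.conv2d you define the kernel shape as [filter_height, filter_width, in_channels, out_channels]. In our case this means the kernel has shape [5,5,3,out_channels], with out_channels = 8. A weight kernel of shape 5x5x3 is slid across the whole image, and this is done 8 times with 8 different kernels to produce 8 different feature maps.

In tf.nn.depthwise_conv2d you define the kernel shape as [filter_height, filter_width, in_channels, channel_multiplier]. Now the output is produced differently. Separate 5x5x1 filters are slid over each channel of the input image, one filter per channel, and each filter produces one feature map for its channel. So here a kernel of shape [5,5,3,1] produces an output with depth 3. The channel_multiplier tells you how many different filters to apply per channel. Therefore the originally desired output depth of 8 is not possible with 3 input channels; only multiples of 3 are possible.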
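The depth arithmetic described above can be checked with a small sketch. The helper names conv2d_out_channels and depthwise_out_channels are made up for illustration and are not part of TensorFlow; they only encode the shape rules stated in this answer.

```python
def conv2d_out_channels(filter_shape):
    """Output depth of a standard convolution: the last filter dimension."""
    _, _, _, out_channels = filter_shape
    return out_channels


def depthwise_out_channels(filter_shape):
    """Output depth of a depthwise convolution: in_channels * channel_multiplier."""
    _, _, in_channels, channel_multiplier = filter_shape
    return in_channels * channel_multiplier


# 5x5 filters on a 3-channel image, as in the example above.
conv_depth = conv2d_out_channels((5, 5, 3, 8))
depthwise_depths = [depthwise_out_channels((5, 5, 3, m)) for m in (1, 2, 3)]

print(conv_depth)        # 8
print(depthwise_depths)  # [3, 6, 9]
```

With 3 input channels the depthwise output depth steps through 3, 6, 9, so a depth of 8 never appears.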

Answer 2

tf.nn.depthwise_conv2d applies channel_multiplier different filters to each of the N input channels, so the output has N * channel_multiplier channels. Run the code and you will see.

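import tensorflow as tf
import numpy as np

# Written against the TensorFlow 1.x graph API (tf.random_normal, tf.Session).
# Input image: 10x10 with 3 channels.
# Filter: 10x10 for each input channel, so VALID padding leaves a single 1x1 output position.
N_in_channel = 3
N_out_channel_mul = 8
x = tf.random_normal([1, 10, 10, N_in_channel])
f = tf.random_normal([10, 10, N_in_channel, N_out_channel_mul])
y = tf.nn.depthwise_conv2d(x, f, strides=[1, 1, 1, 1], padding="VALID", data_format="NHWC")

sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Evaluate x, f and y in one run so that all three share the same random samples.
x_data, f_data, y_conv = sess.run([x, f, y])

# y_conv has shape [1, 1, 1, 24]; squeeze it to a flat vector of 24 channels.
y_s = np.squeeze(y_conv)
# Output channel i * N_out_channel_mul + j depends only on input channel i and its filter j.
for i in range(N_in_channel):
    for j in range(N_out_channel_mul):
        print("np: %f, tf:%f" % (np.sum(x_data[0, :, :, i] * f_data[:, :, i, j]), y_s[i * N_out_channel_mul + j]))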

Answer 3

Let's look at the example code in the TensorFlow API (r1.7).

For depthwise_conv2d,

output[b, i, j, k * channel_multiplier + q] =
    sum_{di, dj} input[b, strides[1] * i + rate[0] * di,
                          strides[2] * j + rate[1] * dj, k] *
                 filter[di, dj, k, q]

The filter is [filter_height, filter_width, in_channels, channel_multiplier].
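As a rough illustration of the depthwise formula above, here is a naive NumPy sketch. It assumes NHWC input, stride 1, rate 1 and VALID padding only; depthwise_conv2d_ref is a hypothetical helper name, not a TensorFlow function.

```python
import numpy as np


def depthwise_conv2d_ref(x, f):
    """Naive depthwise convolution: NHWC input, stride 1, rate 1, VALID padding."""
    batch, in_h, in_w, in_c = x.shape
    f_h, f_w, f_c, mult = f.shape
    assert f_c == in_c, "filter in_channels must match the input"
    out_h, out_w = in_h - f_h + 1, in_w - f_w + 1
    out = np.zeros((batch, out_h, out_w, in_c * mult))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i:i + f_h, j:j + f_w, :]  # (batch, f_h, f_w, in_c)
            # Sum over di, dj only; input channel k stays separate and fans out over q.
            res = np.einsum("bhwk,hwkq->bkq", patch, f)  # (batch, in_c, mult)
            # Row-major reshape places (k, q) at index k * mult + q.
            out[:, i, j, :] = res.reshape(batch, in_c * mult)
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=(1, 6, 6, 3))
f = rng.normal(size=(3, 3, 3, 2))
y = depthwise_conv2d_ref(x, f)
print(y.shape)  # (1, 4, 4, 6)
```

The sum never runs over the channel axis, which is why the output depth is in_channels * channel_multiplier.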

For conv2d,

output[b, i, j, k] =
    sum_{di, dj, q} input[b, strides[1] * i + di,
                             strides[2] * j + dj, q] *
                    filter[di, dj, q, k]

The filter is [filter_height, filter_width, in_channels, out_channels].
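The conv2d formula can be sketched in NumPy in the same naive style. This again assumes NHWC input, stride 1 and VALID padding; conv2d_ref is a hypothetical helper name, not a TensorFlow function.

```python
import numpy as np


def conv2d_ref(x, f):
    """Naive standard convolution: NHWC input, stride 1, VALID padding."""
    batch, in_h, in_w, in_c = x.shape
    f_h, f_w, f_c, out_c = f.shape
    assert f_c == in_c, "filter in_channels must match the input"
    out_h, out_w = in_h - f_h + 1, in_w - f_w + 1
    out = np.zeros((batch, out_h, out_w, out_c))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i:i + f_h, j:j + f_w, :]  # (batch, f_h, f_w, in_c)
            # Sum over di, dj AND the input channel q for every output channel k.
            out[:, i, j, :] = np.einsum("bhwq,hwqk->bk", patch, f)
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=(1, 6, 6, 3))
f = rng.normal(size=(3, 3, 3, 8))
y = conv2d_ref(x, f)
print(y.shape)  # (1, 4, 4, 8)
```

Here the input-channel axis q is summed away, so the output depth is simply out_channels.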

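Focusing on k and q, we can see the difference between the two formulas shown above.

The default format is NHWC, where b is the batch index and (i, j) is a coordinate in the feature map.

(Note that k and q refer to different things in the two functions.)

For depthwise_conv2d, k indexes the input channel and q, with 0 <= q < channel_multiplier, indexes the filters applied to that channel. Each input channel k is expanded into channel_multiplier output channels by its own [filter_height, filter_width, channel_multiplier] filters, giving in_channels * channel_multiplier output channels in total. It performs no cross-channel operation; in some literature it is called channel-wise spatial convolution. The process can be summarized as applying each filter's kernels to each channel separately and concatenating the outputs.
For conv2d, k indexes the output channel and q the input channel. It sums over all input channels, which means each output channel k is connected to all q input channels through a [filter_height, filter_width, in_channels] filter.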
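The channel bookkeeping just described can be laid out as a small sketch. The sizes (3 input channels, multiplier 2) and the dictionary names are arbitrary choices for illustration.

```python
in_channels = 3
channel_multiplier = 2

# Depthwise: output channel k * channel_multiplier + q reads input channel k only.
depthwise_sources = {
    k * channel_multiplier + q: [k]
    for k in range(in_channels)
    for q in range(channel_multiplier)
}

# Standard conv2d with the same output depth: every output channel reads all inputs.
out_channels = in_channels * channel_multiplier
conv_sources = {k: list(range(in_channels)) for k in range(out_channels)}

print(depthwise_sources)  # {0: [0], 1: [0], 2: [1], 3: [1], 4: [2], 5: [2]}
print(conv_sources[0])    # [0, 1, 2]
```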

For example,

input_size: (_, 14, 14, 32)
filter of conv2d: (3, 3, 32, 64)
params of conv2d filter: 3x3x32x64
filter of depthwise_conv2d: (3, 3, 32, 64)
params of depthwise_conv2d filter: 3x3x32x64

Assuming stride = 1 with padding (so the 14x14 spatial size is preserved), then

output of conv2d: (_, 14, 14, 64)
output of depthwise_conv2d: (_, 14, 14, 32*64)
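The numbers in this example can be reproduced with plain arithmetic. The sketch below assumes that "with padding" means SAME padding, where the spatial output size is ceil(size / stride); same_padding_output_hw is a hypothetical helper name.

```python
def same_padding_output_hw(height, width, stride):
    """Spatial output size under SAME padding: ceil(size / stride)."""
    return -(-height // stride), -(-width // stride)


filter_shape = (3, 3, 32, 64)
f_h, f_w, in_c, last_dim = filter_shape

# Both filters hold the same number of weights because the shapes are identical.
params = f_h * f_w * in_c * last_dim

out_h, out_w = same_padding_output_hw(14, 14, stride=1)
conv_output = (out_h, out_w, last_dim)              # last_dim acts as out_channels
depthwise_output = (out_h, out_w, in_c * last_dim)  # last_dim acts as channel_multiplier

print(params)            # 18432
print(conv_output)       # (14, 14, 64)
print(depthwise_output)  # (14, 14, 2048)
```

The same filter shape therefore yields 64 output channels for conv2d but 32 * 64 = 2048 for depthwise_conv2d.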

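More insights:

A standard convolution can be split into two steps: a depthwise convolution and a reduction (sum) over the input channels.
Depthwise convolution is equivalent to group convolution with the number of groups set to the number of input channels.
Usually depthwise_conv2d is followed by pointwise_conv2d (a 1x1 convolution used for reduction) to form separable_conv2d. See Xception and MobileNet for more details.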
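The first and third points can be checked numerically. This is a minimal NumPy sketch, assuming stride 1 and VALID padding; extract_patches is a hypothetical helper, and the random pointwise matrix merely stands in for a 1x1 convolution filter.

```python
import numpy as np


def extract_patches(x, f_h, f_w):
    """All VALID, stride-1 patches of an NHWC array: (b, out_h, out_w, f_h, f_w, c)."""
    b, h, w, c = x.shape
    out_h, out_w = h - f_h + 1, w - f_w + 1
    patches = np.empty((b, out_h, out_w, f_h, f_w, c))
    for i in range(out_h):
        for j in range(out_w):
            patches[:, i, j] = x[:, i:i + f_h, j:j + f_w, :]
    return patches


rng = np.random.default_rng(0)
x = rng.normal(size=(1, 6, 6, 3))
f = rng.normal(size=(3, 3, 3, 4))  # [f_h, f_w, in_channels, out_channels]
patches = extract_patches(x, 3, 3)

# Step 1 (depthwise): spatial sums only, keeping input channel c separate.
depthwise = np.einsum("bijhwc,hwco->bijco", patches, f)  # (1, 4, 4, 3, 4)
# Step 2 (reduction): sum over the input-channel axis.
two_step = depthwise.sum(axis=3)                         # (1, 4, 4, 4)

# Direct standard convolution for comparison.
direct = np.einsum("bijhwc,hwco->bijo", patches, f)
print(np.allclose(two_step, direct))  # True

# Separable pattern: replace the plain sum by a learned 1x1 (pointwise) mix.
dw_flat = depthwise.reshape(1, 4, 4, 3 * 4)
pointwise = rng.normal(size=(3 * 4, 8))  # a 1x1 filter viewed as a matrix
separable = dw_flat @ pointwise          # (1, 4, 4, 8)
print(separable.shape)  # (1, 4, 4, 8)
```

In this reading, a standard convolution is the special case where the channel mix after the depthwise step is a fixed sum rather than a learned 1x1 filter.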
