How to compute gradients to fool an image classifier?

Question

I'm trying to get the standard "fool an image classifier" example working in TensorFlow.

(That is, adjust the input image along the gradient so that it gets misclassified, e.g. https://codewords.recurse.com/issues/five/why-do-neural-networks-think-a-panda-is-a-vulture.)
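The idea above can be illustrated without TensorFlow. The following is a minimal NumPy sketch, assuming a hypothetical 3-class linear softmax model (the weights W and b and the helper names are made up for illustration, not taken from Inception): it repeatedly nudges the input along the gradient of the target class's log-probability until the predicted class changes.

```python
import numpy as np

# Hypothetical toy classifier: logits = W @ x + b, 3 classes, 4 "pixels".
# W and b are made-up values for illustration; they are not part of Inception.
W = np.array([[ 1.0, -0.5,  0.2,  0.0],
              [-0.3,  0.8, -0.1,  0.4],
              [ 0.1,  0.1,  0.9, -0.6]])
b = np.zeros(3)

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def predict(x):
    return softmax(W @ x + b)

def grad_log_prob(x, target):
    # d/dx log softmax(W x + b)[target] = W[target] - sum_k p_k * W[k]
    p = predict(x)
    return W[target] - p @ W

def fool(x, target, step=0.1, iters=100):
    # Gradient ascent on log p(target | x): move the input toward the target class.
    x = x.copy()
    for _ in range(iters):
        if predict(x).argmax() == target:
            break
        x += step * grad_log_prob(x, target)
    return x

x0 = np.array([1.0, 0.0, 0.0, 0.0])
x_adv = fool(x0, target=2)
print(predict(x0).argmax(), predict(x_adv).argmax())
```

With Inception, the hand-derived grad_log_prob would be replaced by the gradient TensorFlow computes; the loop structure stays the same.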

I have downloaded the inception-v3 model from https://www.tensorflow.org/versions/master/tutorials/image_recognition/index.html and used it to classify images.

But I'm having trouble computing the gradients needed to adjust the input image. I'd appreciate some help understanding how this works in TensorFlow.

Here is the basic idea I've been trying:

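import tensorflow as tf  # TensorFlow 1.x API

# Assumes the Inception-v3 graph is already imported into the default graph
# and image_data holds the raw bytes of a JPEG file.
with tf.Session() as sess:
  feed_dict = {'DecodeJpeg/contents:0': image_data}
  softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')
  # At this point, sess.run(softmax_tensor, feed_dict) works...
  input_tensor = sess.graph.get_tensor_by_name('DecodeJpeg/contents:0')
  grad = tf.gradients(softmax_tensor, input_tensor)[0]  # this comes back as None
  real_grad = grad.eval(feed_dict)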

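Although sess.run() works fine, tf.gradients() just returns [None]. This is obviously a very beginner question, but can anyone point out where I'm going wrong? Why doesn't the gradient do anything?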

Answer 1

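The reason tf.gradients() returns [None] is that input_tensor ('DecodeJpeg/contents:0') is the raw, encoded JPEG string, and it goes through non-differentiable transformations (JPEG decoding and casting) before it is fed into the Inception network. Instead, you should operate on the result of the JPEG decoding (EDIT: and cast), as follows: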

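# 'Cast:0' is the DecodeJpeg output cast to float; unlike the JPEG string, it is differentiable.
decoded_input_tensor = sess.graph.get_tensor_by_name('Cast:0')

# Note: tf.gradients() differentiates the *sum* of softmax_tensor, and the softmax
# outputs always sum to 1, so in practice take the gradient of a single class's score.
grad = tf.gradients(softmax_tensor, decoded_input_tensor)[0]

real_grad = grad.eval(feed_dict)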
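One further pitfall: tf.gradients(ys, xs) returns the derivative of the sum of ys with respect to xs. Because the softmax outputs always sum to 1, differentiating softmax_tensor as a whole yields a gradient that is essentially zero, so for an adversarial step you would typically differentiate a single class's (log-)probability, e.g. by slicing out the entry for the class you care about. The NumPy sketch below checks this numerically with central finite differences, assuming a hypothetical random linear layer as a stand-in for the network; the helper names are illustrative only.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def numeric_grad(f, x, eps=1e-6):
    # Central finite differences, one input coordinate at a time.
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

# Hypothetical stand-in for the network: a fixed random linear layer + softmax.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 8))
x = rng.normal(size=8)

# Analogue of tf.gradients(softmax_tensor, x): d(sum of all softmax outputs)/dx.
grad_of_sum = numeric_grad(lambda v: softmax(W @ v).sum(), x)
# Gradient of one class's log-probability instead (class 3 chosen arbitrarily).
grad_of_one = numeric_grad(lambda v: np.log(softmax(W @ v)[3]), x)

print(np.abs(grad_of_sum).max())  # essentially zero (floating-point noise)
print(np.abs(grad_of_one).max())  # non-zero
```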

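Once you have generated an image that fools the classifier, you can use the tf.image.encode_jpeg() op to convert it back into a JPEG image. Note that encode_jpeg() expects a uint8 image, so the float result must first be clipped to [0, 255] and cast back to uint8.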
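tf.image.encode_jpeg() expects a 3-D uint8 tensor, whereas the adjusted image is a float array. Here is a minimal NumPy sketch of that conversion (the helper name to_uint8_image is hypothetical; in TensorFlow the corresponding ops would be tf.round, tf.clip_by_value and tf.cast). It also shows two side effects worth knowing: values outside [0, 255] are clipped, and per-pixel changes smaller than 0.5 are rounded away.

```python
import numpy as np

def to_uint8_image(x):
    # Round to the nearest integer, clip to the valid pixel range, then cast,
    # so that e.g. 260.0 becomes 255 instead of wrapping around in uint8.
    return np.clip(np.rint(x), 0, 255).astype(np.uint8)

original = np.array([[[120, 200, 30]]], dtype=np.float32)       # 1x1 RGB "image"
perturbation = np.array([[[0.3, 60.0, -45.0]]], dtype=np.float32)

adv = to_uint8_image(original + perturbation)
print(adv)
```

Because JPEG compression is also lossy, it may be worth running the re-encoded image through the classifier again to check that it still fools it.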

