Interpreting Caffe models

Question

I am trying to interpret and understand models written in Caffe's .prototxt format.

Yesterday I came across a sample 'deploy.prototxt' by Shai, quoted below:

layer {
   name: "ip1_a"
   bottom: "data_a"
   top: "ip1_a"
   type: "InnerProduct"
   inner_product_param {
     num_output: 10
   }
   param {
     name: "ip1_w"  # NOTE THIS NAME!
     lr_mult: 1
   }
   param {
     name: "ip1_b"
     lr_mult: 2
   }
 }
 layer {
   name: "ip1_b"
   bottom: "data_b"
   top: "ip1_b"
   type: "InnerProduct"
   inner_product_param {
     num_output: 10
   }
   param {
     name: "ip1_w"  # NOTE THIS NAME: it's the same!
     lr_mult: 10 # different LR for this branch
   }
   param {
     name: "ip1_b"
     lr_mult: 20
   }
 }
 # one layer to combine them     
 layer {
   type: "Concat"
   bottom: "ip1_a"
   bottom: "ip1_b"
   top: "ip1_combine"
   name: "concat"
 }
 layer {
   name: "joint_ip"
   type: "InnerProduct"
   bottom: "ip1_combine"
   top: "joint_ip"
   inner_product_param {
     num_output: 30
   }
 }

I interpret this model definition as:

     data_a         data_b
        |             |
        |             |
     -------       -------   
    | ip1_a |     | ip1_b |
     -------       -------
        |             |
        |             |
      ip1_a         ip1_b
        |             |
        |             |
        V             V
        ~~~~~~~~~~~~~~~
               |
               |
               V
         ------------- 
        |    concat   |
         ------------- 
               |
               |
         ip1_combine
               |
               |
         ------------- 
        |   joint_ip  |
         ------------- 
               |
               |
            joint_ip

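The blob ip1_a is trained by the layer ip1_a, with its weights initialized from ip1_w (lr: 1) and its bias initialized from ip1_b (lr: 2). The blob ip1_a is in effect the newly learned weights that were initialized with ip1_w. The learned bias does not have a name.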

In some models, we can find layers that have:

lr_mult:1
lr_mult:2

where the first instance of lr_mult always corresponds to the weights and the next instance to the bias.

Is my understanding above correct?

Answer 1

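You are mixing two kinds of data: the input (training) data and the network parameters.
During training, the input data is fixed to a known training/validation set and only the network parameters change. In contrast, when the network is deployed, the data changes to new images while the network parameters stay fixed. For an in-depth description of how Caffe stores these two kinds of data, see the linked answer.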

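In the example you show, there are two input data paths, data_a and data_b, which may carry different images each time. The input blobs pass through the InnerProduct layers to become the blobs ip1_a and ip1_b, respectively. These are then concatenated into a single blob, ip1_combine, which in turn is fed into the final InnerProduct layer.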

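On the other hand, the model has a set of parameters: ip1_w and ip1_b (the weights and the bias) of the first inner product layers. In this particular example, the layers' parameters are named explicitly to indicate that they are shared between the layers ip1_a and ip1_b.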
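
To make the blob/parameter distinction concrete, here is a minimal plain-Python sketch of the forward pass (an illustration only, not Caffe code). The input width of 4, the random initialization, the helper names `inner_product`, `random_matrix` and `forward`, and the parameter names `joint_w` and `joint_b` are all assumptions; only the blob names, the shared parameter names and the num_output values come from the prototxt above. Parameters and blobs are kept in separate dictionaries because the prototxt uses the name ip1_b both for a blob and for the shared bias.

```python
import random


def inner_product(x, weights, bias):
    """y[i] = sum_j weights[i][j] * x[j] + bias[i], as an InnerProduct layer computes."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def random_matrix(rows, cols, rng):
    """Hypothetical initializer; Caffe's actual fillers are configured separately."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
INPUT_DIM = 4  # assumed input width; the prototxt does not state it

# Network parameters: changed by training, fixed at deployment.
params = {
    "ip1_w": random_matrix(10, INPUT_DIM, rng),  # shared by layers ip1_a and ip1_b
    "ip1_b": [0.0] * 10,                         # shared bias
    "joint_w": random_matrix(30, 20, rng),       # hypothetical name; unnamed in the prototxt
    "joint_b": [0.0] * 30,                       # hypothetical name; unnamed in the prototxt
}


def forward(data_a, data_b):
    """Blobs are recomputed from the inputs on every call; params are only read."""
    blobs = {"data_a": data_a, "data_b": data_b}
    blobs["ip1_a"] = inner_product(blobs["data_a"], params["ip1_w"], params["ip1_b"])
    blobs["ip1_b"] = inner_product(blobs["data_b"], params["ip1_w"], params["ip1_b"])
    blobs["ip1_combine"] = blobs["ip1_a"] + blobs["ip1_b"]  # Concat: 10 + 10 = 20 values
    blobs["joint_ip"] = inner_product(blobs["ip1_combine"], params["joint_w"], params["joint_b"])
    return blobs


blobs = forward([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5])
print({name: len(values) for name, values in blobs.items()})
```

Because both branches read the same `ip1_w` and `ip1_b`, feeding the same input to data_a and data_b yields identical ip1_a and ip1_b blobs in this sketch.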

As for the two lr_mult entries: yes, the first is the LR multiplier for the weights and the second is for the bias term.
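
As a rough sketch of what the multiplier does: the solver's base learning rate (`base_lr`) is scaled by each parameter's `lr_mult`. The plain-Python snippet below applies one vanilla SGD step under that rule. The function name `sgd_step`, the value of `base_lr`, and the parameter values and gradients are made up for illustration, and momentum and weight decay are left out.

```python
def sgd_step(value, grad, base_lr, lr_mult):
    """One plain SGD update with a per-parameter learning-rate multiplier."""
    return value - base_lr * lr_mult * grad


base_lr = 0.01  # assumed solver setting

# First param block -> weights (lr_mult: 1); second param block -> bias (lr_mult: 2).
weight = sgd_step(value=0.5, grad=1.0, base_lr=base_lr, lr_mult=1)
bias = sgd_step(value=0.5, grad=1.0, base_lr=base_lr, lr_mult=2)
print(weight, bias)
```

With equal gradients, the bias moves twice as far as the weight in a single step, which is all that `lr_mult: 2` expresses here.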

machine-learning

neural-network

deep-learning

caffe