Parameter sharing in network with nn.SpatialBatchNormalization

I have a network with three parallel branches, and I want to share all of their parameters so that they are identical at the end of training. Let some_model be a standard nn.Sequential module composed of cudnn.SpatialConvolution, nn.PReLU, and nn.SpatialBatchNormalization layers. There is also an nn.SpatialDropout, but its probability is set to 0, so it has no effect.
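For illustration, a minimal stand-in for some_model could look like this (the channel counts and kernel sizes are made up, not my actual model):

require 'nn'
require 'cudnn' -- nn.SpatialConvolution works as a CPU stand-in for cudnn.SpatialConvolution

some_model = nn.Sequential()
some_model:add(cudnn.SpatialConvolution(3, 16, 3, 3, 1, 1, 1, 1))
some_model:add(nn.SpatialBatchNormalization(16))
some_model:add(nn.PReLU())
some_model:add(nn.SpatialDropout(0)) -- p = 0, so this is a no-op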

ptb = nn.ParallelTable()
ptb:add(some_model)
-- clone(...) with tensor names makes the copies share those tensors with the original
ptb:add(some_model:clone('weight', 'bias', 'gradWeight', 'gradBias'))
ptb:add(some_model:clone('weight', 'bias', 'gradWeight', 'gradBias'))

triplet = nn.Sequential()
triplet:add(ptb)

I believe the loss function is not relevant, but just in case, I use nn.DistanceRatioCriterion. To check that all weights are correctly shared, I pass a table of three identical examples {A,A,A} to the network. Obviously, if the weights are correctly shared, then the output of all three branches should be the same. This holds at the moment of network initialization, but once the parameters have been updated (say, after one mini-batch iteration), the results of the three branches become different. Through layer-by-layer inspection, I have noticed that this discrepancy in the output comes from the nn.SpatialBatchNormalization layers in some_model. Therefore, it seems that the parameters of those layers are not properly shared. Following this, I tried calling clone with the additional parameters running_mean and running_std, but the output of the batchnorm layers still differs. Moreover, this seems to cancel the sharing of all the other network parameters as well. What is the right way of sharing parameters between nn.SpatialBatchNormalization modules?
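For example, the check can be run like this (the input shape is arbitrary, just for illustration):

-- A is a dummy input batch; adjust the shape to the real model
-- (with cudnn layers, move the model and input to the GPU with :cuda() first)
A = torch.randn(1, 3, 32, 32)
out = triplet:forward({A, A, A})
-- if the parameters are truly shared, both differences should be 0
print((out[1] - out[2]):abs():max())
print((out[1] - out[3]):abs():max())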

OK, I found the solution! From the discussion I had linked to in the question, it seems that the parameter running_std has been renamed to running_var. Calling clone with

ptb:add(some_model:clone('weight', 'bias', 'gradWeight', 'gradBias', 'running_mean', 'running_var'))

solves the issue.
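This makes sense: in nn, running_mean and running_var are buffers that batch normalization updates during the forward pass, not learnable parameters, so the usual 'weight', 'bias', 'gradWeight', 'gradBias' list does not cover them and they must be named explicitly. They can be inspected directly on a layer (bn here standing for any nn.SpatialBatchNormalization instance):

-- bn is an nn.SpatialBatchNormalization layer from the model
print(bn.running_mean) -- per-channel mean, updated during training-mode forward passes
print(bn.running_var)  -- per-channel variance, updated likewise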