Stopping gradients updating weights of a [sub]network in an architecture

My architecture is as follows (built using nngraph):

require 'nn'
require 'nngraph'


input = nn.Identity()()
net1 = nn.Sequential():add(nn.SpatialConvolution(1, 5, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(5, 20, 4, 4))
net2 = nn.Sequential():add(nn.SpatialFullConvolution(20, 5, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialFullConvolution(5, 1, 3, 3)):add(nn.Sigmoid())
net3 = nn.Sequential():add(nn.SpatialConvolution(1, 20, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(20, 40, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialConvolution(40, 2, 3, 3)):add(nn.Sigmoid())

output1 = net1(input)
output2 = net2(output1)
output3 = net3(output2)
gMod = nn.gModule({input}, {output1, output3})


target1 = torch.rand(20, 51, 51)
target2 = torch.rand(2, 49, 49)
target2[target2:gt(0.5)] = 1
target2[target2:lt(0.5)] = 0
-- Do a forward pass (keep the input tensor around so it can be reused in the backward pass)
inputTensor = torch.rand(1, 56, 56)
out1, out2 = unpack(gMod:forward(inputTensor))

cr1 = nn.MSECriterion()
cr1:forward(out1, target1)
gradient1 = cr1:backward(out1, target1)

cr2 = nn.BCECriterion()
cr2:forward(out2, target2)
gradient2 = cr2:backward(out2, target2)


-- Now update the weights for the networks
LR = 0.001
gMod:zeroGradParameters() -- clear any previously accumulated (or uninitialized) gradient buffers
gMod:backward(inputTensor, {gradient1, gradient2})
gMod:updateParameters(LR)

I would like to know:

1) How can I stop gradient2 from updating the weights of net1, so that it only contributes to updating the weights of net2 and net3?

2) How can I prevent gradient2 from updating the weights of net3, while still letting it update the weights of the other sub-networks?

Have you tried stopping back-propagation on net1?

net1.updateGradInput = function(self, inp, out) end
net1.accGradParameters = function(self, inp, out) end

Just put this code right after gradient1 = cr1:backward(out1, target1) and it should work.
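
To see whether back-propagation into net1 really stops, a quick sanity check (a minimal sketch, reusing inputTensor, gradient1 and gradient2 from the snippet in the question) is to zero the gradient buffers, run the backward pass, and look at net1's accumulated parameter gradients:

gMod:zeroGradParameters()
gMod:backward(inputTensor, {gradient1, gradient2})

-- If no gradient reached net1, its accumulated parameter gradients stay at zero
local _, net1GradParams = net1:parameters()
for _, g in ipairs(net1GradParams) do
   print(g:norm())
end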

I figured out the solution to my questions. Below I post the relevant code for each one:

Question 1:

This one is a bit tricky but entirely doable. To keep gradient2 from updating the weights of net1, the updateGradInput() function of the first layer of net2 (the layer right after net1) has to be modified so that it outputs a zeroed-out tensor. The layer's own accGradParameters() still runs, so net2 and net3 keep getting updated by gradient2; only the gradient flowing back into net1 is cut off. This is done in the following code:

input = nn.Identity()()
net1 = nn.Sequential():add(nn.SpatialConvolution(1, 5, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(5, 20, 4, 4))
net2 = nn.Sequential():add(nn.SpatialFullConvolution(20, 5, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialFullConvolution(5, 1, 3, 3)):add(nn.Sigmoid())
net3 = nn.Sequential():add(nn.SpatialConvolution(1, 20, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(20, 40, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialConvolution(40, 2, 3, 3)):add(nn.Sigmoid())

-- Modifying the updateGradInput function so that it will output a zeroed-out tensor at the first layer of net2
local tempLayer = net2:get(1)
function tempLayer:updateGradInput(input, gradOutput)
   self.gradInput:resizeAs(input):zero()
   return self.gradInput
end

output1 = net1(input)
output2 = net2(output1)
output3 = net3(output2)
gMod = nn.gModule({input}, {output1, output3})

-- Everything else is the same ...
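
To verify this, here is a minimal sanity-check sketch (it reuses target2 from the question and feeds a zero gradient for the output1 branch so that only gradient2 flows back); net1 should accumulate no gradient while net2 still does:

inputTensor = torch.rand(1, 56, 56)
out1, out2 = unpack(gMod:forward(inputTensor))

gMod:zeroGradParameters()
cr2 = nn.BCECriterion()
cr2:forward(out2, target2)
gradient2 = cr2:backward(out2, target2)

-- Back-propagate only gradient2 (zero gradient for the output1 branch)
gMod:backward(inputTensor, {out1:clone():zero(), gradient2})

local _, net1Grads = net1:parameters()
local _, net2Grads = net2:parameters()
print(net1Grads[1]:norm()) -- expected 0: gradient2 is blocked before it reaches net1
print(net2Grads[1]:norm()) -- non-zero: net2 still receives gradient2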

Question 2:

input = nn.Identity()()
net1 = nn.Sequential():add(nn.SpatialConvolution(1, 5, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(5, 20, 4, 4))
net2 = nn.Sequential():add(nn.SpatialFullConvolution(20, 5, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialFullConvolution(5, 1, 3, 3)):add(nn.Sigmoid())
net3 = nn.Sequential():add(nn.SpatialConvolution(1, 20, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(20, 40, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialConvolution(40, 2, 3, 3)):add(nn.Sigmoid())

-- Overriding updateParameters with a no-op prevents net3's weights from being
-- updated when gMod:updateParameters(LR) is called
net3.updateParameters = function() end

output1 = net1(input)
output2 = net2(output1)
output3 = net3(output2)
gMod = nn.gModule({input}, {output1, output3})

-- Everything else is the same ...
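
A quick way to confirm the freeze (a minimal sketch reusing target1, target2 and LR from the question) is to snapshot net3's parameters, run one forward/backward/update cycle, and check that they did not change:

-- Snapshot net3's weight tensors before the update (illustration only)
local net3Params = net3:parameters()
local before = {}
for i, p in ipairs(net3Params) do before[i] = p:clone() end

inputTensor = torch.rand(1, 56, 56)
out1, out2 = unpack(gMod:forward(inputTensor))
gMod:zeroGradParameters()

cr1 = nn.MSECriterion()
cr1:forward(out1, target1)
gradient1 = cr1:backward(out1, target1)

cr2 = nn.BCECriterion()
cr2:forward(out2, target2)
gradient2 = cr2:backward(out2, target2)

gMod:backward(inputTensor, {gradient1, gradient2})
gMod:updateParameters(LR)

-- Every difference should print 0, since updateParameters on net3 is now a no-op
for i, p in ipairs(net3Params) do
   print((p - before[i]):abs():max())
end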