Stopping gradients from updating the weights of a [sub]network in an architecture
My architecture is as follows (built with nngraph):
require 'nn'
require 'nngraph'

-- Build the graph: net1 -> net2 -> net3, exposing the outputs of net1 and net3
input = nn.Identity()()
net1 = nn.Sequential():add(nn.SpatialConvolution(1, 5, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(5, 20, 4, 4))
net2 = nn.Sequential():add(nn.SpatialFullConvolution(20, 5, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialFullConvolution(5, 1, 3, 3)):add(nn.Sigmoid())
net3 = nn.Sequential():add(nn.SpatialConvolution(1, 20, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(20, 40, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialConvolution(40, 2, 3, 3)):add(nn.Sigmoid())

output1 = net1(input)
output2 = net2(output1)
output3 = net3(output2)
gMod = nn.gModule({input}, {output1, output3})

-- Targets for the two exposed outputs
target1 = torch.rand(20, 51, 51)
target2 = torch.rand(2, 49, 49)
target2[target2:gt(0.5)] = 1
target2[target2:lt(0.5)] = 0

-- Do a forward pass (keep the input tensor around, since backward needs it again)
inputTensor = torch.rand(1, 56, 56)
out1, out2 = unpack(gMod:forward(inputTensor))

-- Criterion for the first output
cr1 = nn.MSECriterion()
cr1:forward(out1, target1)
gradient1 = cr1:backward(out1, target1)

-- Criterion for the second output
cr2 = nn.BCECriterion()
cr2:forward(out2, target2)
gradient2 = cr2:backward(out2, target2)

-- Now update the weights of the networks
LR = 0.001
gMod:backward(inputTensor, {gradient1, gradient2})
gMod:updateParameters(LR)
I would like to know:
1) How can I stop gradient2 from updating the weights of net1, so that it only contributes to updating the weights of net2 and net3?
2) How can I prevent gradient2 from updating the weights of net3, while still updating the weights of the other [sub]networks?
Have you tried stopping backpropagation at net1?
net1.updateGradInput = function(self, inp, out) end
net1.accGradParameters = function(self, inp, out) end
Just put this code right after gradient1 = cr1:backward(out1, target1) and it should work.
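For context, a minimal sketch of where this override would sit in the training step from the question (variable names such as inputTensor, gradient1, gradient2 and LR are taken from the code above). One caveat: no-op'ing these two methods on net1 freezes it completely, so net1 then receives weight updates from neither gradient1 nor gradient2.
-- Freeze net1 entirely: it neither back-propagates nor accumulates parameter gradients
net1.updateGradInput = function(self, inp, out) end
net1.accGradParameters = function(self, inp, out) end
-- Backward pass and parameter update exactly as in the question
gMod:backward(inputTensor, {gradient1, gradient2})
gMod:updateParameters(LR)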
I found a solution to my problem. Below I post the relevant code for each question:
Question 1:
This is a bit tricky but totally doable. Since the weights of net1 should not be updated by gradient2, we need to modify the updateGradInput() function of the layer that comes right after net1 (i.e. the first layer of net2) so that it outputs a zeroed-out tensor. This is done in the following code:
input = nn.Identity()()
net1 = nn.Sequential():add(nn.SpatialConvolution(1, 5, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(5, 20, 4, 4))
net2 = nn.Sequential():add(nn.SpatialFullConvolution(20, 5, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialFullConvolution(5, 1, 3, 3)):add(nn.Sigmoid())
net3 = nn.Sequential():add(nn.SpatialConvolution(1, 20, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(20, 40, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialConvolution(40, 2, 3, 3)):add(nn.Sigmoid())
-- Modify the updateGradInput function so that the first layer of net2 outputs a zeroed-out gradInput tensor
local tempLayer = net2:get(1)
function tempLayer:updateGradInput(input, gradOutput)
   self.gradInput:resizeAs(input):zero()
   return self.gradInput
end
output1 = net1(input)
output2 = net2(output1)
output3 = net3(output2)
gMod = nn.gModule({input}, {output1, output3})
-- Everything else is the same ...
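An optional sanity check, not part of the original answer (it assumes the forward and criterion code from the question has already been run, with the input tensor kept in inputTensor): if only gradient2 is sent backward, net1 should now accumulate no parameter gradients at all.
-- Back-propagate gradient2 alone; the zeroed placeholder for gradient1 suppresses the first output's contribution
gMod:zeroGradParameters()
gMod:backward(inputTensor, {torch.zeros(out1:size()), gradient2})
-- net1:parameters() returns the per-layer weight and gradient tensors without flattening them
local _, net1GradParams = net1:parameters()
for _, g in ipairs(net1GradParams) do
   print(g:norm())  -- every norm should print 0
end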
Question 2:
input = nn.Identity()()
net1 = nn.Sequential():add(nn.SpatialConvolution(1, 5, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(5, 20, 4, 4))
net2 = nn.Sequential():add(nn.SpatialFullConvolution(20, 5, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialFullConvolution(5, 1, 3, 3)):add(nn.Sigmoid())
net3 = nn.Sequential():add(nn.SpatialConvolution(1, 20, 3, 3)):add(nn.ReLU(true)):add(nn.SpatialConvolution(20, 40, 4, 4)):add(nn.ReLU(true)):add(nn.SpatialConvolution(40, 2, 3, 3)):add(nn.Sigmoid())
net3.updateParameters = function() end -- Overriding updateParameters with a no-op means net3's weights are left unchanged when gMod:updateParameters(LR) is later called, even though gradients still flow back through net3 to the other networks
output1 = net1(input)
output2 = net2(output1)
output3 = net3(output2)
gMod = nn.gModule({input}, {output1, output3})
-- Everything else is the same ...
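Similarly, a hedged check, again not part of the original answer (target1 and target2 are the tensors defined in the question): one full training step should leave net3's weights untouched while net1 is still updated.
-- Snapshot the first-layer weights of net3 and net1 before the update
local net3Before = net3:get(1).weight:clone()
local net1Before = net1:get(1).weight:clone()
-- One full forward/backward/update step, as in the question
local inputTensor = torch.rand(1, 56, 56)
local out1, out2 = unpack(gMod:forward(inputTensor))
local cr1, cr2 = nn.MSECriterion(), nn.BCECriterion()
cr1:forward(out1, target1); local gradient1 = cr1:backward(out1, target1)
cr2:forward(out2, target2); local gradient2 = cr2:backward(out2, target2)
gMod:zeroGradParameters()
gMod:backward(inputTensor, {gradient1, gradient2})
gMod:updateParameters(0.001)
-- net3's updateParameters is a no-op, so its weights must not have moved; net1's must have
print((net3:get(1).weight - net3Before):abs():max())  -- expected: 0
print((net1:get(1).weight - net1Before):abs():max())  -- expected: > 0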