How can I change the NN weights without affecting the gradients?

Suppose I have a simple neural network:

import torch
import torch.nn as nn
import torch.optim as optim
from torch.nn.utils import parameters_to_vector

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(1, 2)
        self.fc2 = nn.Linear(2, 3)
        self.fc3 = nn.Linear(3, 1)

    def forward(self, x):
        x = self.fc1(x)
        x = torch.relu(x)        
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Model()

opt = optim.Adam(net.parameters())

And some input features:

features = torch.rand((3,1))

I can train it normally:

for i in range(10):
    opt.zero_grad()
    out = net(features)
    loss = torch.mean(torch.square(torch.tensor(5) - torch.sum(out)))
    loss.backward()
    opt.step()

However, I'm interested in updating the weights of each layer after each example in the batch; that is, changing the actual weight values by a different amount for each layer. A minimal sketch of the kind of loop I have in mind follows.
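
Here adjust_weights is a hypothetical placeholder for the per-layer update I am asking about; the rest mirrors the training loop above:

def adjust_weights(model):
    # hypothetical placeholder: change each layer's weights by a different amount
    pass

for x in features:                 # one example at a time
    opt.zero_grad()
    adjust_weights(net)            # the per-layer update I am asking about
    out = net(x.unsqueeze(0))      # forward pass on a single example
    loss = torch.mean(torch.square(torch.tensor(5) - torch.sum(out)))
    loss.backward()
    opt.step()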

I can print the parameters of each layer:

for i in range(1):
    opt.zero_grad()
    out = net(features)
    print(parameters_to_vector(net.fc1.parameters()))
    print(parameters_to_vector(net.fc2.parameters()))
    print(parameters_to_vector(net.fc3.parameters()))
    loss = torch.mean(torch.square(torch.tensor(5) - torch.sum(out)))
    loss.backward()
    opt.step()

How can I change the weight values before backpropagation without affecting the gradients?

Say I want the layer weights to be updated according to the following functions:

def first_layer_update(weight):
    return weight + 1e-3*weight

def second_layer_update(weight):
    return 1e-2*weight

def third_layer_update(weight):
    return weight - 1e-1*weight

From the pytorch docs, you're basically on the right track. You can loop over all of a layer's parameters and add to them directly:

with torch.no_grad():
    for param in layer.parameters():  # e.g. layer = net.fc1
        param += 1e-3  # or whatever

- Using the torch.no_grad context manager.

This allows you to perform operations on tensors (in place or out of place) without Autograd tracking those changes. As @user3474165 explained:

def first_layer_update(weight):
    with torch.no_grad():
        return weight + 1e-3*weight

def second_layer_update(weight):
    with torch.no_grad():
        return 1e-2*weight

def third_layer_update(weight):
    with torch.no_grad():
        return weight - 1e-1*weight

Or differently, without altering the functions, by using the context manager where you call them:

with torch.no_grad():
    first_layer_update(net.fc1.weight)
    second_layer_update(net.fc2.weight)
    third_layer_update(net.fc3.weight)
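
Either way, a quick sanity check (a minimal sketch reusing the net and features defined above) confirms that a weight modified under torch.no_grad still takes part in training as usual:

w = net.fc1.weight
with torch.no_grad():
    w += 1e-3 * w                  # in-place update, not recorded by Autograd

out = net(features)
loss = torch.mean(torch.square(torch.tensor(5) - torch.sum(out)))
loss.backward()
print(w.requires_grad)             # True: fc1.weight is still a trainable parameter
print(w.grad is not None)          # True: backward populated its gradient as usual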

- Using the @torch.no_grad decorator.

A variant is to use the @torch.no_grad decorator:

@torch.no_grad()
def first_layer_update(weight):
    return weight + 1e-3*weight

@torch.no_grad()
def second_layer_update(weight):
    return 1e-2*weight

@torch.no_grad()
def third_layer_update(weight):
    return weight - 1e-1*weight

And call them with first_layer_update(net.fc1.weight), second_layer_update(net.fc2.weight), etc. A quick check that the returned tensors are detached from Autograd is sketched below.
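
Because the decorated functions run entirely under no_grad, the tensors they return carry no Autograd history. A minimal check, assuming the decorated versions above:

updated = first_layer_update(net.fc1.weight)
print(updated.requires_grad)   # False: computed under no_grad
print(updated.grad_fn)         # None: no backward graph was recorded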


- Mutating torch.Tensor.data.

An alternative to wrapping the operations in a torch.no_grad context is to mutate the weights through their data attribute, which Autograd does not track. That means calling your functions with:

>>> first_layer_update(net.fc1.weight.data)
>>> second_layer_update(net.fc2.weight.data)
>>> third_layer_update(net.fc3.weight.data)

This computes the updates from the weights (not the biases) of the three layers according to their respective policies. Note that the functions as written are out of place, so to actually apply an update you assign the result back, e.g. net.fc1.weight.data = first_layer_update(net.fc1.weight.data). In-place variants are sketched below.
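
If you would rather have the functions mutate the weights directly, here is a sketch under that assumption (the helpers are hypothetical; the trailing underscore follows PyTorch's naming convention for in-place operations):

def first_layer_update_(weight):
    weight.data.add_(1e-3 * weight.data)   # w <- w + 1e-3*w, in place

def second_layer_update_(weight):
    weight.data.mul_(1e-2)                 # w <- 1e-2*w, in place

def third_layer_update_(weight):
    weight.data.sub_(1e-1 * weight.data)   # w <- w - 1e-1*w, in place

first_layer_update_(net.fc1.weight)
second_layer_update_(net.fc2.weight)
third_layer_update_(net.fc3.weight)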


In short, if you want to change all the parameters of an nn.Module at once, you can do either:

>>> with torch.no_grad():
...     update_policy(parameters_to_vector(net.layer.parameters()))

>>> update_policy(parameters_to_vector(net.layer.parameters()).data)
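
One caveat: parameters_to_vector concatenates the parameters into a new tensor, so updating that vector does not by itself touch the module. A minimal round-trip sketch (using net.fc1 as a concrete stand-in for net.layer) that writes the updated values back with vector_to_parameters:

from torch.nn.utils import parameters_to_vector, vector_to_parameters

with torch.no_grad():
    vec = parameters_to_vector(net.fc1.parameters())   # flatten weight and bias
    vec = first_layer_update(vec)                      # apply the update policy
    vector_to_parameters(vec, net.fc1.parameters())    # copy values back into fc1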