How can I change the NN weights without affecting the gradients?
Say I have a simple neural network:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.nn.utils import parameters_to_vector

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(1, 2)
        self.fc2 = nn.Linear(2, 3)
        self.fc3 = nn.Linear(3, 1)

    def forward(self, x):
        x = self.fc1(x)
        x = torch.relu(x)
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Model()
opt = optim.Adam(net.parameters())
And some features:
features = torch.rand((3,1))
I can train it normally with:
for i in range(10):
    opt.zero_grad()
    out = net(features)
    loss = torch.mean(torch.square(torch.tensor(5) - torch.sum(out)))
    loss.backward()
    opt.step()
However, I am interested in updating the weights of each layer after every example in the batch, i.e. changing the actual weight values by a different amount for each layer.
I can print the parameters of each layer with:
for i in range(1):
    opt.zero_grad()
    out = net(features)
    print(parameters_to_vector(net.fc1.parameters()))
    print(parameters_to_vector(net.fc2.parameters()))
    print(parameters_to_vector(net.fc3.parameters()))
    loss = torch.mean(torch.square(torch.tensor(5) - torch.sum(out)))
    loss.backward()
    opt.step()
How can I change the values of the weights before the backpropagation without affecting the gradients?
Say that I want the layer weights to be updated according to the following functions:
def first_layer_update(weight):
    return weight + 1e-3*weight

def second_layer_update(weight):
    return 1e-2*weight

def third_layer_update(weight):
    return weight - 1e-1*weight
From the PyTorch docs, you are basically on the right track. You can loop over all the parameters of each layer and add to them directly:
with torch.no_grad():
    for param in layer.parameters():
        param += 1e-3  # or whatever
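For example, applied to the three layers from the question with a different amount per layer, it could look like the following sketch (note that parameters() yields the biases too; restrict it to .weight if only the weights should change):
with torch.no_grad():
    for p in net.fc1.parameters():
        p += 1e-3 * p   # mirrors first_layer_update
    for p in net.fc2.parameters():
        p *= 1e-2       # mirrors second_layer_update
    for p in net.fc3.parameters():
        p -= 1e-1 * p   # mirrors third_layer_update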
- Using the torch.no_grad context manager.
This allows you to perform (in-place or out-of-place) operations on tensors without Autograd tracking those changes. As @user3474165 explained:
def first_layer_update(weight):
    with torch.no_grad():
        return weight + 1e-3*weight

def second_layer_update(weight):
    with torch.no_grad():
        return 1e-2*weight

def third_layer_update(weight):
    with torch.no_grad():
        return weight - 1e-1*weight
Or, differently, without altering the functions, by using the context manager when calling them:
with torch.no_grad():
    first_layer_update(net.fc1.weight)
    second_layer_update(net.fc2.weight)
    third_layer_update(net.fc3.weight)
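Note that, as written, these functions return new tensors and do not modify their argument in place, so the calls above by themselves leave the weights untouched. If the intent is to actually change the model's weights, one option (a sketch under that assumption) is to copy the result back:
with torch.no_grad():
    net.fc1.weight.copy_(first_layer_update(net.fc1.weight))
    net.fc2.weight.copy_(second_layer_update(net.fc2.weight))
    net.fc3.weight.copy_(third_layer_update(net.fc3.weight))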
- Using the @torch.no_grad decorator.
A variant is to use the @torch.no_grad decorator:
@torch.no_grad()
def first_layer_update(weight):
    return weight + 1e-3*weight

@torch.no_grad()
def second_layer_update(weight):
    return 1e-2*weight

@torch.no_grad()
def third_layer_update(weight):
    return weight - 1e-1*weight
And call them with first_layer_update(net.fc1.weight), second_layer_update(net.fc2.weight), etc.
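As a quick sanity check (a sketch), the result of a decorated call carries no autograd history and the stored gradients stay untouched:
updated = first_layer_update(net.fc1.weight)
print(updated.requires_grad)  # False: computed inside no_grad
print(net.fc1.weight.grad)    # unchanged (still None if no backward pass has run)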
- Mutating torch.Tensor.data.
An alternative to wrapping the operations in a torch.no_grad context is to mutate the weights through their data attribute. This means calling your functions with:
>>> first_layer_update(net.fc1.weight.data)
>>> second_layer_update(net.fc2.weight.data)
>>> third_layer_update(net.fc3.weight.data)
This mutates the weights (but not the biases) of the three layers according to their respective update policies.
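If the update functions are kept out-of-place as in the question (returning new tensors rather than mutating their argument), a sketch of writing the result back explicitly is:
>>> net.fc1.weight.data = first_layer_update(net.fc1.weight.data)
>>> net.fc2.weight.data = second_layer_update(net.fc2.weight.data)
>>> net.fc3.weight.data = third_layer_update(net.fc3.weight.data)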
In a nutshell, if you want to mutate all the parameters of an nn.Module, you can either do:
>>> with torch.no_grad():
... update_policy(parameters_to_vector(net.layer.parameters()))
Or:
>>> update_policy(parameters_to_vector(net.layer.parameters()).data)
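Putting it together with the model from the question, a minimal sketch of a training loop that applies the per-layer updates in place before each forward pass (the amounts mirror the question's update functions):
for i in range(10):
    opt.zero_grad()

    # Apply the per-layer update policies in place; no_grad keeps Autograd
    # from recording these changes in the computation graph.
    with torch.no_grad():
        net.fc1.weight += 1e-3 * net.fc1.weight
        net.fc2.weight *= 1e-2
        net.fc3.weight -= 1e-1 * net.fc3.weight

    out = net(features)
    loss = torch.mean(torch.square(torch.tensor(5) - torch.sum(out)))
    loss.backward()
    opt.step()
The updates are done before the forward pass here: changing a weight in place between the forward and backward pass can make Autograd complain that a variable needed for gradient computation was modified by an in-place operation.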