PyTorch moving average computation creates inplace operation
I have a loss function that depends on an "exponential moving average" Z. A minimal example (pay particular attention to the getUpdatedZ function):
import torch
import torch.nn as nn


class FeedForward(nn.Module):
    def __init__(self):
        super(FeedForward, self).__init__()
        self.model = nn.Sequential(nn.Linear(1, 100),
                                   nn.ReLU(),
                                   nn.Linear(100, 1))

    def forward(self, x):
        return self.model(x)


model = FeedForward()
nEpochs = 100
optimizer = torch.optim.Adam(params=model.parameters(), lr=1e-3)


def getTrainingPoints():
    return torch.rand(1000, 1)


def lossFunction(X, Z):
    # Returning Z here is enough to expose the problem. The real loss is more complicated.
    return Z


def getUpdatedZ(X, Z):
    U = model(X)
    Znew = torch.mean(U)
    # Having Z in this computation creates an inplace operation (I'm not sure why).
    # Returning, for example, Znew, does not cause any issues (but the computation is incorrect)
    return 0.2 * Z + 0.8 * Znew


Z = torch.tensor([1.0])
X = getTrainingPoints()

for i in range(nEpochs):
    optimizer.zero_grad()
    Z = getUpdatedZ(X, Z)
    loss = lossFunction(X, Z)
    # loss function depends on gradient of the model in the real version of the code, hence retain_graph=True
    loss.backward(retain_graph=True)
    optimizer.step()
I get the following error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [100, 1]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
After some trials, I think the error arises because you are computing a recursive function (Z = getUpdatedZ(X, Z)), but at each iteration you are changing some of its parameters (the weights of the Linear modules) through optimizer.step().
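If you want to confirm which operation is involved, you can follow the hint at the end of the error message and enable anomaly detection. A minimal sketch (debugging only, since it slows execution noticeably):

    import torch

    # Enable autograd anomaly detection, as the error's hint suggests. When the
    # failing backward() runs, PyTorch also prints the forward-pass traceback of
    # the operation whose saved tensor was modified in place -- here it should
    # point at the Linear layers whose weights optimizer.step() updated.
    torch.autograd.set_detect_anomaly(True)

    # ...then run the same training loop as in the question.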
You could call backward() just once at the end of the for loop, or you may want to break the autodifferentiation graph, for example by calling Z.detach() after loss.backward(). This trick is sometimes used to avoid an overly complex and inefficient backward pass (see, for example, this).
However, both options change the structure of the function being optimized, so be sure that is what you want.
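For concreteness, here is a minimal sketch of the detach option applied to the loop from the question; the setup (model, optimizer, getTrainingPoints, getUpdatedZ, and the 0.2/0.8 mixing weights) is taken unchanged from the question. Note that Z.detach() is out of place, so its result has to be reassigned (or use Z.detach_()):

    # Sketch of the detach variant, using the same model, optimizer and data as
    # in the question. Reassigning Z to Z.detach() after the update means the
    # next iteration treats the EMA value as a constant, so its graph never
    # reaches back into a forward pass whose weights optimizer.step() has
    # already modified in place.
    Z = torch.tensor([1.0])
    X = getTrainingPoints()

    for i in range(nEpochs):
        optimizer.zero_grad()
        Z = getUpdatedZ(X, Z)
        loss = lossFunction(X, Z)
        loss.backward(retain_graph=True)
        optimizer.step()
        Z = Z.detach()  # break the graph; detach() is out-of-place, hence the reassignment

After the detach, the 0.2 * Z term is a constant in the next iteration, so gradients only flow through the current Znew; that is exactly the change in the optimized function mentioned above.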