PyTorch moving average computation creates inplace operation

I have a loss function that depends on an "exponential moving average" Z. A minimal example (note the getUpdatedZ function in particular):

import torch
import torch.nn as nn

class FeedForward(nn.Module):
    def __init__(self):
        super(FeedForward, self).__init__()
      
        self.model = nn.Sequential(nn.Linear(1, 100),
                                   nn.ReLU(),
                                   nn.Linear(100, 1))
    
    def forward(self, x):
        return self.model(x)

model = FeedForward()
nEpochs = 100

optimizer = torch.optim.Adam(params=model.parameters(), lr=1e-3)

def getTrainingPoints():
    return torch.rand(1000, 1)

def lossFunction(X, Z):
    # Returning Z here is enough to expose the problem. The real loss is more complicated.
    return Z

def getUpdatedZ(X, Z):
    U = model(X)
    Znew = torch.mean(U)
    # Having Z in this computation creates an inplace operation (I'm not sure why).
    # Returning, for example, Znew, does not cause any issues (but the computation is incorrect)
    return 0.2 * Z + 0.8 * Znew

Z = torch.tensor([1.0])
X = getTrainingPoints()
for i in range(nEpochs):
    optimizer.zero_grad()
    Z = getUpdatedZ(X, Z)
    loss = lossFunction(X, Z)
    # loss function depends on gradient of the model in the real version of the code, hence retain_graph=True
    loss.backward(retain_graph=True)
    optimizer.step()

I get the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [100, 1]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

After some experimenting, I think the error appears because you are computing a recursive function (Z = getUpdatedZ(X, Z)), but you are changing some of its parameters (the weights of the Linear modules) at each iteration through optimizer.step().
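
To confirm this, the hint in the error message can be followed; a minimal diagnostic sketch (it only wraps the question's loop, nothing else is modified):

import torch

# Diagnostic only: anomaly detection records the forward-pass stack trace of
# every operation, so the RuntimeError will also point at the operation whose
# saved tensors were later modified in place (here, the Linear weights that
# optimizer.step() updates in place).
torch.autograd.set_detect_anomaly(True)

# ... then run the same training loop as in the question ...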

You can either call backward() just once, at the end of the for loop, or you can break the autodifferentiation graph, for example by calling Z.detach() after loss.backward(); a minimal sketch of the latter follows. This trick is sometimes used to avoid overly complex and inefficient backpropagation (see, for example, this).
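
A minimal sketch of the detach approach, assuming the intent is for the moving average Z to track the model's output as a value without backpropagating through earlier iterations (names as in the question):

Z = torch.tensor([1.0])
X = getTrainingPoints()
for i in range(nEpochs):
    optimizer.zero_grad()
    Z = getUpdatedZ(X, Z)
    loss = lossFunction(X, Z)
    loss.backward(retain_graph=True)
    optimizer.step()
    # Cut the autograd graph here: the next iteration's EMA then starts from a
    # plain value, so backward() never needs the weights as they were before
    # this optimizer.step().
    Z = Z.detach()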

However, in both cases this changes the structure of the function being optimized, so be sure about what you are doing.