Why does a torch gradient increase linearly every time the function is backpropagated?

Question

I am trying to understand how PyTorch backpropagation works, using the following code.

import torch
import numpy
x = torch.tensor(numpy.e, requires_grad=True)
y = torch.log(x)
y.backward()
print(x.grad)

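The result is tensor(0.3679), as expected: that is 1 / x, the derivative of log(x) w.r.t. x, evaluated at x = numpy.e. However, if I run the last 3 lines again without re-assigning x, i.e. do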

y = torch.log(x)
y.backward()
print(x.grad)

then I get tensor(0.7358), which is twice the previous result. Why is this the case?

Answer 1

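Gradients are accumulated until cleared. Each call to backward() adds the newly computed gradient to whatever is already stored in x.grad, so the second call gives 1/e + 1/e = 2/e ≈ 0.7358. From the docs (emphasis mine):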

This function accumulates gradients in the leaves - you might need to zero them before calling it.

You can zero them via x.grad.zero_() or, when using a torch.optim.Optimizer, optim.zero_grad().
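As a minimal sketch of both the accumulation and the two ways of clearing it, the snippet below repeats the experiment from the question. The variable names (accumulated, after_reset, after_opt_reset), the choice of torch.optim.SGD, and the learning rate are illustrative assumptions, and math.e stands in for numpy.e only to keep the example self-contained.

```python
import math
import torch

x = torch.tensor(math.e, requires_grad=True)

# Two backward passes without clearing: the gradients add up in x.grad.
for _ in range(2):
    y = torch.log(x)
    y.backward()
accumulated = x.grad.item()  # approximately 2 / e

# Clear the stored gradient in place, then backpropagate once more.
x.grad.zero_()
y = torch.log(x)
y.backward()
after_reset = x.grad.item()  # approximately 1 / e

# The same reset, done through an optimizer that manages x.
# (SGD and lr=0.1 are arbitrary choices for this illustration.)
opt = torch.optim.SGD([x], lr=0.1)
opt.zero_grad()
y = torch.log(x)
y.backward()
after_opt_reset = x.grad.item()  # approximately 1 / e

print(accumulated, after_reset, after_opt_reset)
```

Depending on the PyTorch version, opt.zero_grad() may set x.grad to None instead of filling it with zeros; either way, the next backward() call starts from a clean gradient.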
