Error training RNN with pytorch : RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Hi everyone, I'm trying to build a model with the PyTorch RNN class and train it with mini-batches. My dataset is a simple time series (one input, one output). Here is what my model looks like:

import torch
import torch.nn as nn

class RNN_pytorch(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN_pytorch, self).__init__()

        self.hidden_size = hidden_size
        self.input_size = input_size

        self.rnn = nn.RNN(input_size, hidden_size, num_layers=1)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, x, hidden):

        batch_size = x.size(1)
#         print(batch_size)
        hidden = self.init_hidden(batch_size)

        out, hidden = self.rnn(x, hidden)
#         out = out.view(out.size(1), out.size(2))
        print("Input linear : ", out.size())
        out = self.linear(out)

        return out, hidden

    def init_hidden(self, batch_size):

        hidden = torch.zeros(1, batch_size, self.hidden_size)
#         print(hidden.size())

        return hidden
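
For reference, a quick shape check of this module (a sketch assuming the class above; the sizes are illustrative):

model = RNN_pytorch(1, 16, 1)
x = torch.randn(5, 3, 1)         # (seq_len=5, batch=3, input_size=1), the default batch_first=False layout
hidden = model.init_hidden(3)    # (num_layers=1, batch=3, hidden_size=16)
out, hidden = model(x, hidden)   # forward prints "Input linear :  torch.Size([5, 3, 16])"
print(out.size())                # torch.Size([5, 3, 1]) after the linear layer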

I then process my dataset and split it like this:

batch_numbers = 13
batch_size = int(len(train_signal[:-1])/batch_numbers)
print("Train sample total size =", len(train_signal[:-1]))
print("Number of batches = ", batch_numbers)
print("Size of batches = {} (train_size / batch_numbers)".format(batch_size))

train_signal_batched = train_signal[:-1].reshape(batch_numbers, batch_size, 1)
train_label_batched = train_signal[1:].reshape(batch_numbers, batch_size, 1)

print("X_train shape =", train_signal_batched.shape)
print("Y_train shape =", train_label_batched.shape)

This returns:

Train sample total size = 829439
Number of batches =  13
Size of batches = 63803 (train_size / batch_numbers)
X_train shape = (13, 63803, 1)
Y_train shape = (13, 63803, 1)
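
Note that nn.RNN with the default batch_first=False reads its input as (seq_len, batch, input_size), so a (1, 63803, 1) tensor is treated as one time step across 63803 independent sequences rather than one long series. A quick check of that interpretation:

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=1, hidden_size=16, num_layers=1)
x = torch.randn(1, 63803, 1)     # seq_len=1, batch=63803, input_size=1
h0 = torch.zeros(1, 63803, 16)   # (num_layers, batch, hidden_size)
out, hn = rnn(x, h0)
print(out.size())                # torch.Size([1, 63803, 16])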

So far so good, but then I try to train my model:

rnn_mod = RNN_pytorch(1, 16, 1)

criterion = nn.MSELoss()
optimizer = torch.optim.RMSprop(rnn_mod.parameters(), lr=0.01)

n_epochs = 3
hidden = rnn_mod.init_hidden(batch_size)
for epoch in range(1, n_epochs):
    for i, batch in enumerate(train_signal_batched):
        optimizer.zero_grad()
        x = torch.Tensor([batch]).float()
        print("Input : ",x.size())
        out, hidden = rnn_mod.forward(x, hidden)
        print("Output : ",out.size())
        label = torch.Tensor([train_label_batched[i]]).float()
        print("Label : ", label.size())
        loss = criterion(output, label)
        print("Loss : ", loss)
        loss.backward(retain_graph=True)
        optimizer.step()
        print("*", end="")

#     if epoch % 100 == 0:
    print("Step {} --- Loss {}".format(epoch, loss))

which results in the error:

Input :  torch.Size([1, 63803, 1])
Input linear :  torch.Size([1, 63803, 16])
Output :  torch.Size([1, 63803, 1])
Label :  torch.Size([1, 63803, 1])
Loss :  tensor(0.0051)

/home/kostia/.virtualenvs/machine-learning/lib/python3.6/site-packages/torch/nn/modules/loss.py:431: UserWarning: Using a target size (torch.Size([1, 63803, 1])) that is different to the input size (torch.Size([1, 1, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  return F.mse_loss(input, target, reduction=self.reduction)

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-217-d019358438ff> in <module>
     17         loss = criterion(output, label)
     18         print("Loss : ", loss)
---> 19         loss.backward(retain_graph=True)
     20         optimizer.step()
     21         print("*", end="")

~/.virtualenvs/machine-learning/lib/python3.6/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    116                 products. Defaults to ``False``.
    117         """
--> 118         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    119 
    120     def register_hook(self, hook):

~/.virtualenvs/machine-learning/lib/python3.6/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     91     Variable._execution_engine.run_backward(
     92         tensors, grad_tensors, retain_graph, create_graph,
---> 93         allow_unreachable=True)  # allow_unreachable flag
     94 
     95 

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Can anyone tell me what is wrong here? Honestly, I have no clue.

Thanks in advance

As the error says, the loss tensor does not require grad and has no grad_fn. The out returned by forward in RNN_pytorch should already have a grad_fn, so start checking there. See whether you somehow disabled gradients, for example by calling .eval() or wrapping code in torch.no_grad() somewhere. Note also that your loop computes out but passes output to the criterion; the warning about an input size of torch.Size([1, 1, 1]) suggests output is a stale tensor left over from an earlier cell, which would have no grad_fn.
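
A minimal sketch of what produces this error, with shapes chosen to match the warning above:

import torch
import torch.nn as nn

# A tensor created outside the autograd graph has requires_grad=False and
# no grad_fn, so a loss computed from it cannot be backpropagated.
criterion = nn.MSELoss()
stale = torch.zeros(1, 1, 1)      # plays the role of a stale "output" variable
label = torch.zeros(1, 63803, 1)
loss = criterion(stale, label)    # emits the same broadcasting warning
print(loss.grad_fn)               # None
loss.backward()                   # RuntimeError: element 0 of tensors does not require grad ...

If output is indeed a leftover from an earlier cell, the fix would be to feed the criterion the tensor your forward actually returned (a sketch of the changed lines, untested against your data):

out, hidden = rnn_mod(x, hidden)
loss = criterion(out, label)      # use out, not the stale output
loss.backward(retain_graph=True)
optimizer.step()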