Pytorch - RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed
I keep running into this error:
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
I have already searched the PyTorch forums, but I still can't figure out what I'm doing wrong in my custom loss function. My model is an nn.GRU, and here is my custom loss function:
def _loss(outputs, session, items):  # `items` is a dict containing the embeddings of all items
    def f(output, target):
        pos = torch.from_numpy(np.array([items[target["click"]]])).float()
        neg = torch.from_numpy(np.array([items[idx] for idx in target["suggest_list"] if idx != target["click"]])).float()
        if USE_CUDA:
            pos, neg = pos.cuda(), neg.cuda()
        pos, neg = Variable(pos), Variable(neg)
        pos = F.cosine_similarity(output, pos)
        if neg.size()[0] == 0:
            return torch.mean(F.logsigmoid(pos))
        neg = F.cosine_similarity(output.expand_as(neg), neg)
        return torch.mean(F.logsigmoid(pos - neg))
    loss = map(f, outputs, session)
    return -torch.mean(torch.cat(loss))
Training code:
# zero the parameter gradients
model.zero_grad()

# forward + backward + optimize
outputs, hidden = model(inputs, hidden)
loss = _loss(outputs, session, items)
acc_loss += loss.data[0]
loss.backward()

# Add parameters' gradients to their values, multiplied by learning rate
for p in model.parameters():
    p.data.add_(-learning_rate, p.grad.data)
The problem is in my training loop: it doesn't detach or repackage the hidden state between batches. Because of that, loss.backward() tries to backpropagate all the way back to the start of time, which works for the first batch but fails for the second, since the graph for the first batch has already been discarded.
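For illustration, here is a minimal sketch (a toy nn.GRU with made-up sizes, not the asker's actual model or data) that reproduces the same error by carrying the hidden state across two batches without detaching it:

import torch
import torch.nn as nn
from torch.autograd import Variable

gru = nn.GRU(input_size=4, hidden_size=4)
hidden = Variable(torch.zeros(1, 1, 4))

for step in range(2):
    inputs = Variable(torch.randn(3, 1, 4))    # (seq_len, batch, input_size)
    outputs, hidden = gru(inputs, hidden)      # hidden still references the previous batch's graph
    loss = outputs.sum()                       # dummy loss, just to call backward()
    loss.backward()                            # second iteration raises the RuntimeError: it reaches
                                               # back into the first batch's graph, whose buffers
                                               # were freed by the first backward()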
There are two possible solutions.
1) Detach/repackage the hidden state between batches. There are (at least) three ways to do this (I went with this solution; see the sketch after this list), for example:
hidden.detach_()
hidden = hidden.detach()
2) Replace loss.backward() with loss.backward(retain_graph=True), but be aware that each successive batch will take more time than the previous one, because it has to backpropagate all the way back to the start of the first batch.
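For reference, a sketch of how option 1 could be wired into the training loop shown above (same variable names as in the question; only the detach line is new):

# zero the parameter gradients
model.zero_grad()

# forward + backward + optimize
outputs, hidden = model(inputs, hidden)
hidden = hidden.detach()   # cut the autograd graph here so the next batch starts fresh
                           # (hidden.detach_() does the same thing in place)
loss = _loss(outputs, session, items)
acc_loss += loss.data[0]
loss.backward()            # now only backpropagates through the current batch

# Add parameters' gradients to their values, multiplied by learning rate
for p in model.parameters():
    p.data.add_(-learning_rate, p.grad.data)

With option 2, the only change would be calling loss.backward(retain_graph=True) instead, at the cost of keeping every earlier graph alive in memory.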