How to do gradient clipping in PyTorch?
What is the correct way to perform gradient clipping in PyTorch?
I have an exploding gradients problem.
Reading through the forum discussion gave this:
clipping_value = 1 # arbitrary value of your choosing
torch.nn.utils.clip_grad_norm(model.parameters(), clipping_value)
I am sure there is more depth to it than this code snippet.
clip_grad_norm (which is actually deprecated in favor of clip_grad_norm_, following the more consistent syntax of a trailing _ when in-place modification is performed) clips the norm of the overall gradient by concatenating all parameters passed to the function, as can be seen from the documentation:
The norm is computed over all gradients together, as if they were concatenated into a single vector. Gradients are modified in-place.
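For illustration, here is a minimal sketch (the model and threshold are placeholders, not from the question) showing that the function operates over all parameters at once and returns the total gradient norm it measured before clipping:
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # placeholder model for illustration
loss = model(torch.randn(4, 10)).sum()
loss.backward()

# clip_grad_norm_ returns the total norm of all gradients, computed as if they
# were concatenated into a single vector; the gradients are rescaled in-place
# whenever that norm exceeds max_norm.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
print(total_norm)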
From your example it looks like you want clip_grad_value_
instead, which has a similar syntax and also modifies the gradients in-place:
clip_grad_value_(model.parameters(), clip_value)
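As a rough self-contained sketch (model, optimizer, and threshold are placeholders), value clipping slots into a training step the same way norm clipping does, between backward() and step():
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
clip_value = 1.0  # illustrative threshold

optimizer.zero_grad()
loss = model(torch.randn(4, 10)).sum()
loss.backward()
# Clamp every gradient element to [-clip_value, clip_value] in-place,
# then take the optimizer step on the clipped gradients.
torch.nn.utils.clip_grad_value_(model.parameters(), clip_value)
optimizer.step()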
Another option is to register a backward hook. This takes the current gradient as an input and may return a tensor that will be used in place of the previous gradient, i.e. modifying it. This hook is called each time after a gradient has been computed, so there is no need for manual clipping once the hook has been registered:
for p in model.parameters():
p.register_hook(lambda grad: torch.clamp(grad, -clip_value, clip_value))
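A minimal self-contained sketch of the hook approach, again with a placeholder model and threshold; once the hooks are registered, every backward() pass clamps the gradients automatically:
import torch
import torch.nn as nn

clip_value = 1.0  # illustrative threshold
model = nn.Linear(10, 2)  # placeholder model

# Register once; each hook clamps its parameter's gradient on every backward pass.
for p in model.parameters():
    p.register_hook(lambda grad: torch.clamp(grad, -clip_value, clip_value))

loss = model(torch.randn(4, 10)).sum()
loss.backward()  # the gradients stored in p.grad are already clamped here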
A more complete example from here:
optimizer.zero_grad()
loss, hidden = model(data, hidden, targets)
loss.backward()
# Clipping is applied after loss.backward() and before optimizer.step()
torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip)
optimizer.step()
If you are using Automatic Mixed Precision (AMP), there is a bit more to do before clipping:
optimizer.zero_grad()
loss, hidden = model(data, hidden, targets)
scaler.scale(loss).backward()
# Unscales the gradients of optimizer's assigned params in-place
scaler.unscale_(optimizer)
# Since the gradients of optimizer's assigned params are unscaled, clips as usual:
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
# optimizer's gradients are already unscaled, so scaler.step does not unscale them,
# although it still skips optimizer.step() if the gradients contain infs or NaNs.
scaler.step(optimizer)
# Updates the scale for next iteration.
scaler.update()
Reference: https://pytorch.org/docs/stable/notes/amp_examples.html#gradient-clipping
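For context, here is a minimal sketch of how that snippet sits inside a full AMP training loop; the model, data, learning rate, and max_norm are placeholders, and the forward pass is simplified to a plain loss (requires a CUDA device):
import torch

# Placeholders for illustration only
model = torch.nn.Linear(10, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()
max_norm = 1.0

for _ in range(10):
    data = torch.randn(4, 10, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # forward pass runs in mixed precision
        loss = model(data).sum()
    scaler.scale(loss).backward()     # backward pass on the scaled loss
    scaler.unscale_(optimizer)        # unscale the gradients before clipping
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    scaler.step(optimizer)            # internally skips the step on infs/NaNs
    scaler.update()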
Well, I ran into the same error. I tried using clip norm, but it did not work.
I did not want to change the network or add regularizers, so I switched the optimizer to Adam, and that worked.
Then I took the model pre-trained with Adam, continued training, and fine-tuned with SGD + momentum. It is running now.
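A rough sketch of that workflow, with a placeholder model, hypothetical file name, and illustrative hyperparameters; the point is simply that the optimizer is swapped between the two phases while the weights carry over:
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # placeholder model

# Phase 1: pre-train with Adam (training loop omitted here), then save the weights.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
torch.save(model.state_dict(), "pretrained_adam.pt")  # hypothetical file name

# Phase 2: reload the Adam-pretrained weights and fine-tune with SGD + momentum.
model.load_state_dict(torch.load("pretrained_adam.pt"))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)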