How to define a loss function in PyTorch that depends on partial derivatives of the model w.r.t. its input?

Question

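After reading the paper Neural Ordinary Differential Equations and the blog post on how to solve ODEs with neural networks, which uses the JAX library, I tried to do the same thing with "plain" PyTorch but found one point rather "obscure": how to properly use the partial derivative of a function (in this case the model) w.r.t. one of its input parameters.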

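To summarize the problem at hand, as in the blog post: the goal is to solve the ODE y' = -2*x*y with the condition y(x=0) = 1 on the domain -2 <= x <= 2. Instead of using finite differences, the solution is replaced by a NN, y(x) = NN(x), with a single hidden layer of 10 nodes.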

I managed to (more or less) replicate the blog with the following code:

import torch
import torch.nn as nn
from torch import optim
import matplotlib.pyplot as plt
import numpy as np 

# Define the NN model to solve the problem
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.lin1 = nn.Linear(1,10)
        self.lin2 = nn.Linear(10,1)

    def forward(self, x):
        x = torch.sigmoid(self.lin1(x))
        x = torch.sigmoid(self.lin2(x))
        return x

model = Model()

# Define loss_function from the Ordinary differential equation to solve
def ODE(x,y):
    dydx, = torch.autograd.grad(y, x, 
    grad_outputs=y.data.new(y.shape).fill_(1),
    create_graph=True, retain_graph=True)

    eq = dydx + 2.* x * y # y' = - 2x*y
    ic = model(torch.tensor([0.])) - 1.    # y(x=0) = 1
    return torch.mean(eq**2) + ic**2

loss_func = ODE

# Define the optimization
# opt = optim.SGD(model.parameters(), lr=0.1, momentum=0.99,nesterov=True) # Equivalent to blog
opt = optim.Adam(model.parameters(),lr=0.1,amsgrad=True) # Got faster convergence with Adam using amsgrad

# Define reference grid 
x_data = torch.linspace(-2.0,2.0,401,requires_grad=True)
x_data = x_data.view(401,1) # reshaping the tensor

# Iterative learning
epochs = 1000
for epoch in range(epochs):
    opt.zero_grad()
    y_trial = model(x_data)
    loss = loss_func(x_data, y_trial)

    loss.backward()
    opt.step()

    if epoch % 100 == 0:
        print('epoch {}, loss {}'.format(epoch, loss.item()))

# Plot Results
plt.plot(x_data.data.numpy(), np.exp(-x_data.data.numpy()**2), label='exact')
plt.plot(x_data.data.numpy(), y_trial.data.numpy(), label='approx')  # y_trial holds the last model output
plt.legend()
plt.show()

With this I managed to get the results shown in the figure.

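The problem is that, in the definition of the ODE functional, instead of passing (x,y) I would rather pass something like (x,fun), where fun is my model, so that both the partial derivative and specific evaluations of the model can be obtained by calling it. So, something like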

def ODE(x,fun):
    dydx, = "grad of fun w.r.t x as a function"

    eq = dydx(x) + 2.* x * fun(x)        # y' = - 2x*y
    ic = fun( torch.tensor([0.]) ) - 1.  # y(x=0) = 1
    return torch.mean(eq**2) + ic**2

Any ideas? Thanks in advance.

EDIT:

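After some trials I found a way to pass the model as an input, but ran into another strange behavior. The new problem is to solve the ODE y'' = -2 with the BCs y(x=-2) = -1 and y(x=2) = 1, whose analytical solution is y(x) = -x^2+x/2+4.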

Let's modify the previous code a bit:

import torch
import torch.nn as nn
from torch import optim
import matplotlib.pyplot as plt
import numpy as np 

# Define the NN model to solve the equation
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.lin1 = nn.Linear(1,10)
        self.lin2 = nn.Linear(10,1)

    def forward(self, x):
        y = torch.sigmoid(self.lin1(x))
        z = torch.sigmoid(self.lin2(y))
        return z

model = Model()

# Define loss_function from the Ordinary differential equation to solve
def ODE(x,fun):
    y = fun(x)

    dydx = torch.autograd.grad(y, x, 
    grad_outputs=y.data.new(y.shape).fill_(1),
    create_graph=True, retain_graph=True)[0]

    d2ydx2 = torch.autograd.grad(dydx, x, 
    grad_outputs=dydx.data.new(dydx.shape).fill_(1),
    create_graph=True, retain_graph=True)[0]

    eq  = d2ydx2 + torch.tensor([ 2.])             # y'' = - 2
    bc1 =  fun(torch.tensor([-2.])) - torch.tensor([-1.]) # y(x=-2) = -1
    bc2 =  fun(torch.tensor([ 2.])) - torch.tensor([ 1.]) # y(x= 2) =  1
    return torch.mean(eq**2) + bc1**2 + bc2**2

loss_func = ODE

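So here I pass the model as an argument and manage to differentiate twice. So far so good. However, using the sigmoid function in this case is not only unnecessary, it also gives a result that is far from the analytical one.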
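As a sanity check on the loss itself, the analytical solution can be fed through the same residual. This is only a sketch: `residual_loss` and `exact` are names introduced here for illustration and are not part of the code above. It also prints the range of the exact solution on the grid, which may explain the poor fit: a final `torch.sigmoid` confines the network output to (0, 1), whereas the boundary condition at x = -2 alone requires the value -1.

```python
import torch

def residual_loss(x, fun):
    # Same loss as ODE(x, fun) above, written with torch.ones_like.
    y = fun(x)
    dydx = torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y),
                               create_graph=True)[0]
    d2ydx2 = torch.autograd.grad(dydx, x, grad_outputs=torch.ones_like(dydx),
                                 create_graph=True)[0]
    eq = d2ydx2 + 2.0                        # y'' = -2
    bc1 = fun(torch.tensor([-2.0])) + 1.0    # y(x=-2) = -1
    bc2 = fun(torch.tensor([2.0])) - 1.0     # y(x= 2) =  1
    return torch.mean(eq**2) + bc1**2 + bc2**2

def exact(x):
    # Analytical solution quoted in the question.
    return -x**2 + x / 2 + 4

x = torch.linspace(-2.0, 2.0, 401, requires_grad=True).view(401, 1)
loss_exact = residual_loss(x, exact)
y_exact = exact(x).detach()

# The loss should vanish for the exact solution; min/max show the target range.
print(loss_exact.item(), y_exact.min().item(), y_exact.max().item())
```

Because any callable can be passed as `fun`, the same check works for a trained model, which makes it easy to compare its loss against the exact solution's.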

If I change the NN to:

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.lin1 = nn.Linear(1,1)
        self.lin2 = nn.Linear(1,1)

    def forward(self, x):
        y = self.lin1(x)
        z = self.lin2(y)
        return z

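In this case I would expect the optimization to pass twice through the two linear functions and thereby recover a second-order function. Instead, I get the error: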

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

Adding the option to the definition of dydx does not solve the problem, and adding it to d2ydx2 yields a NoneType result.

Is there something wrong with the layers as they are?

Answer 1

Quick solution:

Add allow_unused=True to the .grad calls. So, change

dydx = torch.autograd.grad(
    y, x,
    grad_outputs=y.data.new(y.shape).fill_(1),
    create_graph=True, retain_graph=True)[0]

d2ydx2 = torch.autograd.grad(dydx, x, grad_outputs=dydx.data.new(
    dydx.shape).fill_(1), create_graph=True, retain_graph=True)[0]

to

dydx = torch.autograd.grad(
    y, x,
    grad_outputs=y.data.new(y.shape).fill_(1),
    create_graph=True, retain_graph=True, allow_unused=True)[0]

d2ydx2 = torch.autograd.grad(dydx, x, grad_outputs=dydx.data.new(
    dydx.shape).fill_(1), create_graph=True, retain_graph=True, allow_unused=True)[0]
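One caveat, sketched here for a stand-in of the question's two-linear-layer model (built with `nn.Sequential` purely for brevity): with `allow_unused=True`, `torch.autograd.grad` returns `None` for an input that does not appear in the graph, so `d2ydx2 + torch.tensor([2.])` would still fail. This is presumably the NoneType issue mentioned in the question. A possible workaround is to substitute zeros, which is the second derivative of an affine function anyway.

```python
import torch
import torch.nn as nn

# Two stacked linear layers, standing in for the question's second Model.
model = nn.Sequential(nn.Linear(1, 1), nn.Linear(1, 1))

x = torch.linspace(-2.0, 2.0, 5, requires_grad=True).view(5, 1)
y = model(x)

dydx = torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y),
                           create_graph=True, allow_unused=True)[0]

# dydx does not depend on x here, so this gradient comes back as None.
d2ydx2 = torch.autograd.grad(dydx, x, grad_outputs=torch.ones_like(dydx),
                             create_graph=True, allow_unused=True)[0]
was_unused = d2ydx2 is None

# Replace the missing gradient by zeros before using it in the residual.
if d2ydx2 is None:
    d2ydx2 = torch.zeros_like(x)

eq = d2ydx2 + 2.0   # residual of y'' = -2
print(was_unused, eq.flatten().tolist())
```

With zeros substituted, the residual is a constant 2 everywhere, so no choice of weights can drive the equation term of the loss to zero for this architecture.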

More explanation:

Look at what allow_unused does:

allow_unused (bool, optional): If ``False``, specifying inputs that were not
        used when computing outputs (and therefore their grad is always zero)
        is an error. Defaults to ``False``.

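So if you try to differentiate w.r.t. a variable that was not used to compute the value, you get an error. Also note that the error only occurs when you use linear layers.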

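This is because with linear layers you have y = W1*W2*x + b = Wx + b, so dy/dx is not a function of x; it is just W. When you then try to differentiate dy/dx w.r.t. x, it throws an error. The error goes away as soon as you use sigmoid, because dy/dx then becomes a function of x. To avoid the error, either make sure dy/dx is a function of x or use allow_unused=True.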
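Returning to the original wish of passing (x, fun) and getting the derivative "as a function", one possible sketch is a small wrapper around `torch.autograd.grad`. The name `derivative` is hypothetical (it is not a PyTorch API), and the sketch assumes each output row depends only on its own input row, as is the case for the models above. Inputs that turn out to be unused are mapped to zeros so that the linear case does not raise. The example also checks the claim that for two stacked linear layers dy/dx is just the constant W2·W1.

```python
import torch
import torch.nn as nn

def derivative(fun):
    """Return a callable that evaluates d fun(x) / dx (hypothetical helper)."""
    def dfun(x):
        y = fun(x)
        if not y.requires_grad:
            # y carries no graph at all, so it cannot depend on x.
            return torch.zeros_like(x)
        g = torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y),
                                create_graph=True, allow_unused=True)[0]
        # None means x was not used to compute y: the derivative is zero.
        return torch.zeros_like(x) if g is None else g
    return dfun

x = torch.linspace(-2.0, 2.0, 5, requires_grad=True).view(5, 1)

# Nonlinear check: (x^3)' = 3x^2 and (x^3)'' = 6x.
cube = lambda t: t**3
d_cube = derivative(cube)(x)
d2_cube = derivative(derivative(cube))(x)

# Linear check: dy/dx equals the constant W2 @ W1, so y'' is zero.
lin1, lin2 = nn.Linear(1, 1), nn.Linear(1, 1)
linear_model = nn.Sequential(lin1, lin2)
d_lin = derivative(linear_model)(x)
d2_lin = derivative(derivative(linear_model))(x)
w = (lin2.weight @ lin1.weight).item()
```

Inside a loss, this would allow something like `eq = derivative(derivative(fun))(x) + 2.0`, keeping the model itself as the only argument besides `x`.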

ode

neural-network

pytorch