How does a PyTorch module do back prop?

Question

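While following the instructions in "extending PyTorch - adding a module", I noticed that when extending Module we don't actually have to implement the backward function. The only thing we need to do is apply the Function instance in the forward function, and PyTorch automatically calls the backward of that Function instance when doing back prop. This seems like magic to me, since we never even registered the Function instance we used. I looked into the source code but didn't find anything related. Could anyone point me to the place where all of this actually happens?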

Answer 1

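Maybe I'm wrong, but I have a different view.

The backward function is defined in a Function, and that Function is called by the forward function.

For example: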

#!/usr/bin/env python
# encoding: utf-8

###############################################################
# Parametrized example
# --------------------
#
# This implements a layer with learnable weights.
#
# It implements the Cross-correlation with a learnable kernel.
#
# In deep learning literature, it’s confusingly referred to as
# Convolution.
#
# The backward computes the gradients wrt the input and gradients wrt the
# filter.
#
# **Implementation:**
#
# *Please note that the implementation serves as an illustration, and we
# did not verify its correctness*

import torch
from torch.autograd import Function
from torch.autograd import Variable

from scipy.signal import convolve2d, correlate2d
from torch.nn.modules.module import Module
from torch.nn.parameter import Parameter


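class ScipyConv2dFunction(Function):
    @staticmethod
    def forward(ctx, input, filter):
        # Detach first: numpy() refuses tensors that require grad.
        input, filter = input.detach(), filter.detach()
        result = correlate2d(input.numpy(), filter.numpy(), mode='valid')
        ctx.save_for_backward(input, filter)
        return torch.as_tensor(result, dtype=input.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        input, filter = ctx.saved_tensors
        grad_output = grad_output.detach().numpy()
        # Gradient wrt the input: full convolution of grad_output with the filter.
        grad_input = convolve2d(grad_output, filter.numpy(), mode='full')
        # Gradient wrt the filter: valid cross-correlation of input with grad_output.
        grad_filter = correlate2d(input.numpy(), grad_output, mode='valid')

        return torch.as_tensor(grad_input, dtype=input.dtype), \
            torch.as_tensor(grad_filter, dtype=filter.dtype)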


class ScipyConv2d(Module):

    def __init__(self, kh, kw):
        super(ScipyConv2d, self).__init__()
        self.filter = Parameter(torch.randn(kh, kw))

    def forward(self, input):
        return ScipyConv2dFunction.apply(input, self.filter)

###############################################################
# **Example usage:**

module = ScipyConv2d(3, 3)
print(list(module.parameters()))
input = Variable(torch.randn(10, 10), requires_grad=True)
output = module(input)
print(output)
output.backward(torch.randn(8, 8))
print(input.grad)

In this example, the backward function is defined by ScipyConv2dFunction.

And ScipyConv2dFunction is called by the module's forward function.

Am I correct?
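One way to check this is to look at what apply() leaves behind on the output tensor. Below is a minimal sketch, assuming a recent PyTorch where Variable and Tensor are merged. The Double Function and the calls list are hypothetical names made up for this illustration; they are not part of the tutorial.

```python
import torch
from torch.autograd import Function

calls = []  # records each run of our backward (illustration only)


class Double(Function):
    """Hypothetical Function computing y = 2 * x."""

    @staticmethod
    def forward(ctx, x):
        return x * 2

    @staticmethod
    def backward(ctx, grad_output):
        calls.append("backward")
        return grad_output * 2


x = torch.ones(3, requires_grad=True)
y = Double.apply(x)

# apply() attached a backward node to the output; nothing was registered by hand.
print(type(y.grad_fn).__name__)

y.sum().backward()
print(calls)   # ['backward']
print(x.grad)  # tensor([2., 2., 2.])
```

So the link between forward and backward seems to live on the output tensor itself (its grad_fn), which would explain why the module never has to register anything.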

Answer 2

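Not having to implement backward() is the reason PyTorch, or any other DL framework, is so valuable. In fact, implementing backward() should only be done in very specific cases where you need to mess with the network's gradient (or when you create a custom Function that can't be expressed using PyTorch's built-in functions).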

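PyTorch computes backward gradients using a computational graph that keeps track of the operations performed during the forward pass. Any operation done on a Variable is implicitly registered there. After that, it's a matter of traversing the graph backwards from the variable on which backward() was called, applying the chain rule to compute the gradients.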
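The recorded graph can be inspected from Python. Here is a small sketch, assuming a recent PyTorch where every non-leaf tensor exposes grad_fn and each node exposes next_functions. The walk helper is a hypothetical name written for this illustration.

```python
import torch

x = torch.randn(3, requires_grad=True)
w = torch.randn(3, requires_grad=True)
y = (x * w).sum()  # forward pass; the graph is recorded as a side effect


def walk(node):
    """Return the class names of all backward nodes reachable from `node`."""
    if node is None:  # inputs that don't require grad have no node
        return []
    names = [type(node).__name__]
    for child, _ in node.next_functions:
        names.extend(walk(child))
    return names


names = walk(y.grad_fn)
print(names)  # one node per recorded op, ending at the leaf tensors

y.backward()  # traverses those same nodes, applying the chain rule
print(torch.allclose(x.grad, w.detach()))  # True, since dy/dx = w
```

Only built-in operations were used here, so no backward() had to be written at all.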

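PyTorch's About page has a nice visualization of the graph and how it generally works. If you want more details, I'd also recommend looking up computational graphs and the autograd mechanism on Google.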

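EDIT: The source code where all of this happens is in the C part of PyTorch's codebase, where the actual graph is implemented. After some digging around, I found this: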

/// Evaluates the function on the given inputs and returns the result of the
/// function call.
variable_list operator()(const variable_list& inputs) {
    profiler::RecordFunction rec(this);
    if (jit::tracer::isTracingVar(inputs)) {
        return traced_apply(inputs);
    }
    return apply(inputs);
}

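So in each Function, PyTorch first checks whether its inputs need tracing, and if so calls traced_apply(). In its implementation you can see the node being created and appended to the graph: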

// Insert a CppOp in the trace.
auto& graph = state->graph;
std::vector<VariableFlags> var_flags;
for(auto & input: inputs) {
    var_flags.push_back(VariableFlags::of(input));
}
auto* this_node = graph->createCppOp(get_shared_ptr(), std::move(var_flags));
// ...
for (auto& input: inputs) {
    this_node->addInput(tracer::getValueTrace(state, input));
}
graph->appendNode(this_node);

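My best guess is that every Function object registers itself and its inputs (if needed) upon execution. Every non-functional call (e.g. variable.dot()) simply defers to the corresponding Function, so this still applies.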
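To make that guess concrete, here is a toy sketch in plain Python of how such self-registration could work. It is not PyTorch's actual implementation, and all names in it (Var, Fn, Mul, Add, creator) are hypothetical. The recursion is deliberately naive, with no topological ordering.

```python
class Var:
    """Toy stand-in for a Variable: a value plus a link to the Fn that made it."""

    def __init__(self, value, creator=None):
        self.value = value
        self.grad = 0.0
        self.creator = creator  # None for leaf variables

    def backward(self, grad=1.0):
        self.grad += grad
        if self.creator is not None:
            # Ask the creating Fn for the input gradients, then recurse.
            grads = self.creator.backward(grad)
            for inp, g in zip(self.creator.inputs, grads):
                inp.backward(g)


class Fn:
    """Toy Function: apply() runs forward and records itself on the output."""

    @classmethod
    def apply(cls, *inputs):
        fn = cls()
        fn.inputs = inputs  # the "registration": remember where we came from
        value = fn.forward(*(v.value for v in inputs))
        return Var(value, creator=fn)


class Mul(Fn):
    def forward(self, a, b):
        self.a, self.b = a, b  # save inputs for the backward pass
        return a * b

    def backward(self, grad):
        return grad * self.b, grad * self.a


class Add(Fn):
    def forward(self, a, b):
        return a + b

    def backward(self, grad):
        return grad, grad


x, w, b = Var(3.0), Var(4.0), Var(1.0)
y = Add.apply(Mul.apply(x, w), b)  # y = x * w + b
y.backward()
print(y.value, x.grad, w.grad, b.grad)  # 13.0 4.0 3.0 1.0
```

The point of the sketch is that no global registry is needed: each output carries a reference to the Function that produced it, and backward just follows those references.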

NOTE: I do not take part in PyTorch's development and am in no way an expert on its architecture. Any corrections or additions are welcome.
