Does the sigmoid function affect the slowdown for weights not connected to the output layer when using the cross-entropy function?

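I have been reading about error functions for neural networks on my own. http://neuralnetworksanddeeplearning.com/chap3.html explains that using the cross-entropy function avoids the learning slowdown (that is, the network learns faster when the predicted output is far from the target output). The author shows that for the weights connected to the output layer, the sigmoid prime term, which is what causes the slowdown, drops out of the gradient.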

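But what about the weights further back? When I work through the derivation (I get the same derivation when using the quadratic error function), I find that the sigmoid prime term does appear for those weights. Doesn't that contribute to the slowdown? (Maybe I derived it incorrectly?)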
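For reference, the derivation being described might look like the sketch below. It assumes the notation of the linked chapter (a^l = σ(z^l), z^l = w^l a^{l-1} + b^l, output layer L), which the post itself does not spell out.

```latex
% Cross-entropy cost C with a sigmoid output layer L.
\begin{align}
\delta^L_j &= a^L_j - y_j
  && \text{(output layer: no } \sigma' \text{ factor)} \\
\frac{\partial C}{\partial w^L_{jk}} &= a^{L-1}_k \,(a^L_j - y_j) \\
\delta^{L-1}_j &= \sigma'(z^{L-1}_j) \sum_i w^L_{ij}\,(a^L_i - y_i)
  && \text{(hidden layer: } \sigma' \text{ reappears)} \\
\frac{\partial C}{\partial w^{L-1}_{jk}} &= a^{L-2}_k \,\sigma'(z^{L-1}_j) \sum_i w^L_{ij}\,(a^L_i - y_i)
\end{align}
```

The σ′ that cancels is the one belonging to the output layer; the hidden-layer σ′(z^{L-1}) comes from the chain rule through the hidden activation and is unaffected by the choice of cost.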

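Yes, every sigmoid layer except the last one will still slow down learning. I think your derivation is correct. In fact, Quadratic Error, Sigmoid + BinaryCrossEntropyLoss, and Softmax + SoftmaxCrossEntropyLoss share the same form of backpropagation formula, y_i - y. See the code for the three losses here: L2Loss, BinaryLoss, SoftmaxLoss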
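A small numerical check can make this concrete. The sketch below is not the code behind the links above; it is a minimal NumPy illustration with a hypothetical 2-2-1 sigmoid network and hand-picked weights. It compares the hidden-layer gradient when the hidden units are in their linear range against the case where they are saturated, with a large output error in both cases.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def cross_entropy(a, y):
    # Binary cross-entropy summed over output units.
    return float(-np.sum(y * np.log(a) + (1.0 - y) * np.log(1.0 - a)))

def forward(x, W1, b1, W2, b2):
    z1 = W1 @ x + b1
    a1 = sigmoid(z1)
    z2 = W2 @ a1 + b2
    a2 = sigmoid(z2)
    return z1, a1, z2, a2

def gradients(x, y, W1, b1, W2, b2):
    z1, a1, z2, a2 = forward(x, W1, b1, W2, b2)
    delta2 = a2 - y                                  # output layer: sigma' has cancelled
    delta1 = (W2.T @ delta2) * sigmoid_prime(z1)     # hidden layer: sigma' is still there
    gW2 = np.outer(delta2, a1)
    gW1 = np.outer(delta1, x)
    return gW2, gW1, delta1, delta2

# Hypothetical toy setup: the target is 0 but the network predicts close to 1.
x = np.array([1.0, 1.0])
y = np.array([0.0])
b1 = np.zeros(2)
W2 = np.array([[2.0, 2.0]])
b2 = np.zeros(1)

W1_mild = np.array([[0.5, -0.5], [0.3, 0.2]])   # hidden pre-activations near 0
W1_sat = np.array([[5.0, 5.0], [5.0, 5.0]])     # hidden pre-activations at 10 (saturated)

gW2_mild, gW1_mild, d1_mild, d2_mild = gradients(x, y, W1_mild, b1, W2, b2)
gW2_sat, gW1_sat, d1_sat, d2_sat = gradients(x, y, W1_sat, b1, W2, b2)

print("output error   (mild, saturated):", d2_mild, d2_sat)
print("|dC/dW2|       (mild, saturated):", np.linalg.norm(gW2_mild), np.linalg.norm(gW2_sat))
print("|dC/dW1|       (mild, saturated):", np.linalg.norm(gW1_mild), np.linalg.norm(gW1_sat))
```

In both configurations the output error and the output-layer gradient stay large, but the gradient for the first-layer weights shrinks by several orders of magnitude once the hidden units saturate, which is the slowdown the question asks about.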


machine-learning

backpropagation

neural-network