Follow-up to "In PyTorch how are layer weights and biases initialized by default?"

The top-voted answer to this question says:

Most layers are initialized using Kaiming Uniform method. Example layers include Linear, Conv2d, RNN etc.

What I am actually wondering is: how do you find this out? For example, I want to know the default initialization of torch.nn.Conv2d and torch.nn.BatchNorm2d in PyTorch 1.9.0. For torch.nn.Linear, I found the answer here (in the second answer to the question above).
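
One way to answer the "where is this documented?" question directly is to read the `reset_parameters` source of the module class you care about, for your installed PyTorch version. A minimal sketch using the standard-library `inspect` module (assumes torch is installed):

```python
import inspect
import torch.nn as nn

# Standard modules define reset_parameters() (often on a shared base class
# such as _ConvNd or _BatchNorm); inspect.getsource resolves the inherited
# method and prints exactly how the default initialization is done.
print(inspect.getsource(nn.Conv2d.reset_parameters))
print(inspect.getsource(nn.BatchNorm2d.reset_parameters))
```

This sidesteps version drift in third-party answers: whatever the printed source does is, by definition, the default for the version you have installed.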

nn.Conv1d, nn.Conv2d, and nn.Conv3d inherit from the _ConvNd class. This class has a reset_parameters method that is implemented the same way as in nn.Linear:

def reset_parameters(self) -> None:
    # Setting a=sqrt(5) in kaiming_uniform is the same as initializing with
    # uniform(-1/sqrt(k), 1/sqrt(k)), where k = weight.size(1) * prod(*kernel_size)
    # For more details see: 
    # https://github.com/pytorch/pytorch/issues/15314#issuecomment-477448573
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)
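
To make the comment in that snippet concrete: for a ConvNd layer, fan_in is the per-group input channel count times the kernel area, so the resulting uniform bound can be computed by hand. A small sketch of that arithmetic (pure Python, no torch needed; the function name is mine, not a PyTorch API):

```python
import math

def conv_uniform_bound(in_channels, kernel_size, groups=1):
    """Bound of the default uniform init for a ConvNd layer's bias.

    With a=sqrt(5), kaiming_uniform_ also reduces to uniform(-bound, bound)
    for the weight, where bound = 1/sqrt(k) and
    k = (in_channels / groups) * prod(kernel_size).
    """
    fan_in = (in_channels // groups) * math.prod(kernel_size)
    return 1.0 / math.sqrt(fan_in)

# e.g. Conv2d(in_channels=16, out_channels=32, kernel_size=3):
# fan_in = 16 * 3 * 3 = 144, so bound = 1/12
print(conv_uniform_bound(16, (3, 3)))  # 0.08333...
```

The equivalence holds because kaiming_uniform_ uses bound = gain * sqrt(3 / fan_in) with gain = sqrt(2 / (1 + a^2)); plugging in a = sqrt(5) gives exactly 1/sqrt(fan_in).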

As for nn.BatchNorm2d, it has both reset_parameters and reset_running_stats methods:

def reset_parameters(self) -> None:
    self.reset_running_stats()
    if self.affine:
        init.ones_(self.weight)
        init.zeros_(self.bias)

def reset_running_stats(self) -> None:
    if self.track_running_stats:
        # running_mean/running_var/num_batches... are registered at runtime depending
        # if self.track_running_stats is on
        self.running_mean.zero_()  # type: ignore[operator]
        self.running_var.fill_(1)  # type: ignore[operator]
        self.num_batches_tracked.zero_()  # type: ignore[operator]
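
Putting the two methods together, a freshly constructed BatchNorm2d (with the default affine=True and track_running_stats=True) should have its weight all ones, bias all zeros, running_mean zeros, and running_var ones. A quick runtime check (assumes torch is installed; the channel count 8 is an arbitrary example):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(8)  # defaults: affine=True, track_running_stats=True

assert torch.all(bn.weight == 1)        # init.ones_(self.weight)
assert torch.all(bn.bias == 0)          # init.zeros_(self.bias)
assert torch.all(bn.running_mean == 0)  # self.running_mean.zero_()
assert torch.all(bn.running_var == 1)   # self.running_var.fill_(1)
assert bn.num_batches_tracked.item() == 0
print("BatchNorm2d default initialization verified")
```

This kind of empirical check is a useful complement to reading the source: it confirms the behavior of the exact version installed in your environment.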