"In PyTorch how are layer weights and biases initialized by default?" 的跟进
Follow-up to "In PyTorch how are layer weights and biases initialized by default?"
The top-voted answer to this question says:

Most layers are initialized using Kaiming Uniform method. Example layers include Linear, Conv2d, RNN etc.

What I'm actually wondering is: how does one find this out? For example, I'd like to know the default initialization of torch.nn.Conv2d and torch.nn.BatchNorm2d in PyTorch 1.9.0. For torch.nn.Linear I already found the answer here (in the second answer to the question above).
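One general way to find this out for any module is to read its reset_parameters method, which PyTorch calls from the module's __init__. A quick sketch of mine (assuming a standard PyTorch install):

import inspect
import torch.nn as nn

# Print the initialization code that runs when each module is constructed.
print(inspect.getsource(nn.Conv2d.reset_parameters))       # inherited from a base class
print(inspect.getsource(nn.BatchNorm2d.reset_parameters))  # inherited from a base class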
nn.Conv1d, nn.Conv2d, and nn.Conv3d all inherit from the _ConvNd class. That class has a reset_parameters method, implemented the same way as in nn.Linear:
def reset_parameters(self) -> None:
    # Setting a=sqrt(5) in kaiming_uniform is the same as initializing with
    # uniform(-1/sqrt(k), 1/sqrt(k)), where k = weight.size(1) * prod(*kernel_size)
    # For more details see:
    # https://github.com/pytorch/pytorch/issues/15314#issuecomment-477448573
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)
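To sanity-check the comment in that snippet, here is a small sketch of my own (not PyTorch source) comparing the documented bound 1/sqrt(k) with the actual value range of a freshly constructed Conv2d:

import math
import torch.nn as nn

conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)
k = 16 * 3 * 3            # fan_in = weight.size(1) * prod(kernel_size)
bound = 1 / math.sqrt(k)  # expected limit of the uniform range, ~0.0833

print(bound)
print(conv.weight.min().item(), conv.weight.max().item())  # within (-bound, bound)
print(conv.bias.min().item(), conv.bias.max().item())      # bias uses the same bound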
As for nn.BatchNorm2d, it has reset_parameters and reset_running_stats methods:
def reset_parameters(self) -> None:
    self.reset_running_stats()
    if self.affine:
        init.ones_(self.weight)
        init.zeros_(self.bias)

def reset_running_stats(self) -> None:
    if self.track_running_stats:
        # running_mean/running_var/num_batches... are registered at runtime depending
        # if self.track_running_stats is on
        self.running_mean.zero_()  # type: ignore[operator]
        self.running_var.fill_(1)  # type: ignore[operator]
        self.num_batches_tracked.zero_()  # type: ignore[operator]
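So with the default arguments (affine=True, track_running_stats=True), a new BatchNorm2d starts with weight = 1, bias = 0, running_mean = 0, and running_var = 1. A quick check (my own snippet, not from the PyTorch source):

import torch.nn as nn

bn = nn.BatchNorm2d(num_features=4)
print(bn.weight)               # Parameter of ones (affine=True by default)
print(bn.bias)                 # Parameter of zeros
print(bn.running_mean)         # tensor of zeros (track_running_stats=True by default)
print(bn.running_var)          # tensor of ones
print(bn.num_batches_tracked)  # tensor(0)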