Why does tfa.layers.GroupNormalization(groups=1) produce different output than LayerNormalization?
The group normalization documentation in TensorFlow Addons states that if the number of groups is set to 1, the group norm layer should become layer normalization.
However, when I try this by calling both layers on a test tensor, the results differ. It seems the group norm layer computes the mean and variance over the entire time axis as well as the channel axis, while layer norm computes them independently for each channel vector.
Is this a bug, or am I missing something? The current behavior of layer norm is actually what I need for my use case.
Here is a comparison of the two layers on a test tensor:
In [1]: import tensorflow as tf

In [2]: import tensorflow_addons as tfa

In [5]: x = tf.constant([[[1, 2], [3, 40]], [[1, -1], [2, 200]]], dtype=tf.float32)

In [6]: tf.keras.layers.LayerNormalization()(x)
Out[6]:
<tf.Tensor: shape=(2, 2, 2), dtype=float32, numpy=
array([[[-0.99800587,  0.99800587],
        [-0.99999857,  0.99999857]],

       [[ 0.9995002 , -0.9995002 ],
        [-1.        ,  1.        ]]], dtype=float32)>

In [7]: tfa.layers.GroupNormalization(groups=1)(x)
Out[7]:
<tf.Tensor: shape=(2, 2, 2), dtype=float32, numpy=
array([[[-0.6375344 , -0.57681686],
        [-0.5160993 ,  1.7304504 ]],

       [[-0.5734435 , -0.5966129 ],
        [-0.5618587 ,  1.7319152 ]]], dtype=float32)>
According to the documentation of tf.keras.layers.LayerNormalization in TF 2.4.1 (source):
Note that other implementations of layer normalization may choose to define gamma and beta over a separate set of axes from the axes being normalized across. For example, Group Normalization (Wu et al. 2018) with a group size of 1 corresponds to a Layer Normalization that normalizes across height, width, and channel and has gamma and beta span only the channel dimension. So, this Layer Normalization implementation will not match a Group Normalization layer with group size set to 1.
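In line with that note, LayerNormalization can be made to match GroupNormalization(groups=1) by normalizing over every non-batch axis via its axis argument. A hedged sketch (it assumes the two layers' default epsilon values coincide, and the outputs only agree while gamma and beta are still at their identity initialization, since the two layers define them over different axes):

import tensorflow as tf
import tensorflow_addons as tfa

x = tf.constant([[[1, 2], [3, 40]], [[1, -1], [2, 200]]], dtype=tf.float32)

# Normalize across all non-batch axes, which is what groups=1 does
ln = tf.keras.layers.LayerNormalization(axis=[1, 2])(x)
gn = tfa.layers.GroupNormalization(groups=1)(x)

print(tf.reduce_max(tf.abs(ln - gn)))  # expect ~0, up to float tolerance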