Why does tfa.layers.GroupNormalization(groups=1) produce different output than LayerNormalization?

According to the group normalization documentation in TensorFlow Addons, setting the number of groups to 1 should make the group norm layer equivalent to layer normalization.

However, when I try this by calling both layers on a test tensor, the results differ. It looks as if the group norm layer computes the mean and variance over the time axis and the channel axis together, whereas layer norm computes them independently for each time step's vector of channels.

Is this a bug, or am I missing something? The current behavior of layer norm is in fact exactly what I need for my use case.

Here is a minimal example demonstrating the difference:

In [1]: import tensorflow as tf

In [2]: import tensorflow_addons as tfa

In [5]: x = tf.constant([[[1, 2], [3, 40]], [[1, -1], [2, 200]]], dtype=tf.float32)
In [6]: tf.keras.layers.LayerNormalization()(x)                                                                                                               
Out[6]: 
<tf.Tensor: shape=(2, 2, 2), dtype=float32, numpy=
array([[[-0.99800587,  0.99800587],
        [-0.99999857,  0.99999857]],

       [[ 0.9995002 , -0.9995002 ],
        [-1.        ,  1.        ]]], dtype=float32)>

In [7]: tfa.layers.GroupNormalization(groups=1)(x)
Out[7]: 
<tf.Tensor: shape=(2, 2, 2), dtype=float32, numpy=
array([[[-0.6375344 , -0.57681686],
        [-0.5160993 ,  1.7304504 ]],

       [[-0.5734435 , -0.5966129 ],
        [-0.5618587 ,  1.7319152 ]]], dtype=float32)>
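
To check this, the two outputs above can be reproduced by hand. The sketch below is my own verification (assuming both layers use their default epsilon of 1e-3); it computes the statistics over the axes each layer appears to normalize over:

import tensorflow as tf

x = tf.constant([[[1, 2], [3, 40]], [[1, -1], [2, 200]]], dtype=tf.float32)

# LayerNormalization (default axis=-1): statistics are taken per time step,
# over the channel axis only.
mean = tf.reduce_mean(x, axis=-1, keepdims=True)
var = tf.math.reduce_variance(x, axis=-1, keepdims=True)
ln_manual = (x - mean) / tf.sqrt(var + 1e-3)  # reproduces Out[6]

# GroupNormalization(groups=1): statistics are taken per sample, over
# every non-batch axis (time and channels together).
mean = tf.reduce_mean(x, axis=[1, 2], keepdims=True)
var = tf.math.reduce_variance(x, axis=[1, 2], keepdims=True)
gn_manual = (x - mean) / tf.sqrt(var + 1e-3)  # reproduces Out[7]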

From the documentation of tf.keras.layers.LayerNormalization in TF 2.4.1 (source):

Note that other implementations of layer normalization may choose to define gamma and beta over a separate set of axes from the axes being normalized across. For example, Group Normalization (Wu et al. 2018) with a group size of 1 corresponds to a Layer Normalization that normalizes across height, width, and channel and has gamma and beta span only the channel dimension. So, this Layer Normalization implementation will not match a Group Normalization layer with group size set to 1.
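
In other words, the quoted passage says the mismatch is expected rather than a bug. If the GroupNormalization(groups=1) behavior is wanted from LayerNormalization, passing an explicit axis list should produce it; a minimal sketch (again assuming the default epsilon of 1e-3 for both layers, and comparing at initialization, where gamma is 1 and beta is 0):

import tensorflow as tf
import tensorflow_addons as tfa

x = tf.constant([[[1, 2], [3, 40]], [[1, -1], [2, 200]]], dtype=tf.float32)

# Normalize over every non-batch axis, like GroupNormalization(groups=1) does.
ln_all = tf.keras.layers.LayerNormalization(axis=[1, 2])(x)
gn = tfa.layers.GroupNormalization(groups=1)(x)

print(tf.reduce_max(tf.abs(ln_all - gn)))  # expected to be ~0 at initialization

After training, the two can still diverge, because LayerNormalization(axis=[1, 2]) learns gamma and beta over the time and channel axes, while GroupNormalization keeps them on the channel axis only, as the quote explains.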