Tying Autoencoder Weights in a Dense Keras Layer

I am trying to create a custom Dense layer in Keras to tie the weights in an autoencoder. I tried following an example of doing this in convolutional layers here, but some of the steps did not seem to apply to Dense layers (also, that code is more than two years old).

By tying the weights, I want the decoding layer to use the transposed weight matrix of the encoding layer. This approach is also taken in this article (page 5). Here is the relevant quote from the article:

Here, we choose both the encoding and decoding activation function to be sigmoid function and only consider the tied weights case, in which W′ = Wᵀ (where Wᵀ is the transpose of W) as most existing deep learning methods do.

In the quote above, W is the weight matrix in the encoding layer and W′ (equal to Wᵀ) is the weight matrix of the decoding layer.
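
As a concrete illustration of the shape bookkeeping (a minimal NumPy sketch, not from the article; biases are omitted): with a 4-dimensional input and a 2-dimensional code, the encoder uses W of shape (4, 2) and the decoder reuses Wᵀ of shape (2, 4).

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))   # encoder weights
x = rng.normal(size=(1, 4))   # one input sample

h = sigmoid(x @ W)            # encode: (1, 4) @ (4, 2) -> (1, 2)
x_hat = sigmoid(h @ W.T)      # decode with the tied weights: (1, 2) @ (2, 4) -> (1, 4)
print(x_hat.shape)            # (1, 4)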

I did not change much in the Dense layer itself. I added a tied_to parameter to the constructor, which lets you pass the layer whose weights you want to tie to. The only other change was to the build function, shown below:

def build(self, input_shape):
    assert len(input_shape) >= 2
    input_dim = input_shape[-1]

    if self.tied_to is not None:
        # reuse the tied layer's kernel (transposed) instead of creating a new weight
        self.kernel = K.transpose(self.tied_to.kernel)
        self._non_trainable_weights.append(self.kernel)
    else:
        self.kernel = self.add_weight(shape=(input_dim, self.units),
                                      initializer=self.kernel_initializer,
                                      name='kernel',
                                      regularizer=self.kernel_regularizer,
                                      constraint=self.kernel_constraint)
    if self.use_bias:
        self.bias = self.add_weight(shape=(self.units,),
                                    initializer=self.bias_initializer,
                                    name='bias',
                                    regularizer=self.bias_regularizer,
                                    constraint=self.bias_constraint)
    else:
        self.bias = None
    self.input_spec = InputSpec(min_ndim=2, axes={-1: input_dim})
    self.built = True

Below is the __init__ method; the only change here is the addition of the tied_to parameter.

def __init__(self, units,
             activation=None,
             use_bias=True,
             kernel_initializer='glorot_uniform',
             bias_initializer='zeros',
             kernel_regularizer=None,
             bias_regularizer=None,
             activity_regularizer=None,
             kernel_constraint=None,
             bias_constraint=None,
             tied_to=None,
             **kwargs):
    if 'input_shape' not in kwargs and 'input_dim' in kwargs:
        kwargs['input_shape'] = (kwargs.pop('input_dim'),)
    super(Dense, self).__init__(**kwargs)
    self.units = units
    self.activation = activations.get(activation)
    self.use_bias = use_bias
    self.kernel_initializer = initializers.get(kernel_initializer)
    self.bias_initializer = initializers.get(bias_initializer)
    self.kernel_regularizer = regularizers.get(kernel_regularizer)
    self.bias_regularizer = regularizers.get(bias_regularizer)
    self.activity_regularizer = regularizers.get(activity_regularizer)
    self.kernel_constraint = constraints.get(kernel_constraint)
    self.bias_constraint = constraints.get(bias_constraint)
    self.input_spec = InputSpec(min_ndim=2)
    self.supports_masking = True
    self.tied_to = tied_to

The call function was not edited, but it is included below for reference.

def call(self, inputs):
    output = K.dot(inputs, self.kernel)
    if self.use_bias:
        output = K.bias_add(output, self.bias, data_format='channels_last')
    if self.activation is not None:
        output = self.activation(output)
    return output

In the build function above, I added a conditional that checks whether the tied_to parameter is set, and if so, sets the layer's kernel to the transpose of the tied_to layer's kernel.

Below is the code used to instantiate the model. It uses Keras's Sequential API, and DenseTied is my custom layer.

# encoder
#
encoded1 = Dense(2, activation="sigmoid", input_shape=(4,))

decoded1 = DenseTied(4, activation="sigmoid", tied_to=encoded1)

# autoencoder
#
autoencoder = Sequential()
autoencoder.add(encoded1)
autoencoder.add(decoded1)
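
The compile and fit calls are not shown here; presumably something along these lines was used, with the inputs doubling as the targets (a sketch; x_train stands for hypothetical (n, 4) training data):

autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
autoencoder.fit(x_train, x_train, epochs=50)  # x_train: hypothetical (n, 4) input data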

After training the model, here are the model summary and the weights.

autoencoder.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_7 (Dense)              (None, 2)                 10        
_________________________________________________________________
dense_tied_7 (DenseTied)     (None, 4)                 12        
=================================================================
Total params: 22
Trainable params: 14
Non-trainable params: 8
________________________________________________________________

autoencoder.layers[0].get_weights()[0]
array([[-2.122982  ,  0.43029135],
       [-2.1772149 ,  0.16689162],
       [-1.0465667 ,  0.9828905 ],
       [-0.6830663 ,  0.0512633 ]], dtype=float32)


autoencoder.layers[-1].get_weights()[1]
array([[-0.6521988 , -0.7131109 ,  0.14814234,  0.26533198],
       [ 0.04387903, -0.22077179,  0.517225  , -0.21583867]],
      dtype=float32)

As you can see, the weights reported by autoencoder.get_weights() do not appear to be tied.
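
For reference, the mismatch can be checked numerically against the arrays printed above (a sketch; this check is not in the original code):

import numpy as np

w_enc = autoencoder.layers[0].get_weights()[0]   # encoder kernel, shape (4, 2)
w_dec = autoencoder.layers[-1].get_weights()[1]  # decoder kernel, shape (2, 4)
print(np.allclose(w_enc.T, w_dec))               # False for the values shown above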

So, after showing my approach, my question is: is this a valid way to tie weights in a Dense Keras layer? I was able to run the code, and it is currently training. The loss function also seems to be decreasing reasonably. My fear is that this will only set the weights equal when the model is built, but not actually tie them. My hope is that the backend transpose function is tying them through references under the hood, but I am sure that I am missing something.

So after showing my approach, my question is, is this a valid way to tie weights in a Dense Keras layer?

Yes, it is valid.

My fear is that this will only set them equal when the model is built, but not actually tie them. My hope is that the backend transpose function is tying them through references under the hood, but I am sure that I am missing something.

It does actually tie them in the computational graph; you can verify this by printing model.summary(), which shows only one copy of these trainable weights. Also, after training your model, you can check the weights of the corresponding layers with model.get_weights(). When the model is built there are no actual weights yet, just placeholders for them.

import random

import numpy as np

from keras import activations, constraints, initializers, regularizers
from keras import backend as K
from keras.layers import Dense, InputSpec, Layer
from keras.models import Sequential

random.seed(1)

class DenseTied(Layer):
    def __init__(self, units,
                 activation=None,
                 use_bias=True,
                 kernel_initializer='glorot_uniform',
                 bias_initializer='zeros',
                 kernel_regularizer=None,
                 bias_regularizer=None,
                 activity_regularizer=None,
                 kernel_constraint=None,
                 bias_constraint=None,
                 tied_to=None,
                 **kwargs):
        self.tied_to = tied_to
        if 'input_shape' not in kwargs and 'input_dim' in kwargs:
            kwargs['input_shape'] = (kwargs.pop('input_dim'),)
        super().__init__(**kwargs)
        self.units = units
        self.activation = activations.get(activation)
        self.use_bias = use_bias
        self.kernel_initializer = initializers.get(kernel_initializer)
        self.bias_initializer = initializers.get(bias_initializer)
        self.kernel_regularizer = regularizers.get(kernel_regularizer)
        self.bias_regularizer = regularizers.get(bias_regularizer)
        self.activity_regularizer = regularizers.get(activity_regularizer)
        self.kernel_constraint = constraints.get(kernel_constraint)
        self.bias_constraint = constraints.get(bias_constraint)
        self.input_spec = InputSpec(min_ndim=2)
        self.supports_masking = True

    def build(self, input_shape):
        assert len(input_shape) >= 2
        input_dim = input_shape[-1]

        if self.tied_to is not None:
            self.kernel = K.transpose(self.tied_to.kernel)
            self._non_trainable_weights.append(self.kernel)
        else:
            self.kernel = self.add_weight(shape=(input_dim, self.units),
                                          initializer=self.kernel_initializer,
                                          name='kernel',
                                          regularizer=self.kernel_regularizer,
                                          constraint=self.kernel_constraint)
        if self.use_bias:
            self.bias = self.add_weight(shape=(self.units,),
                                        initializer=self.bias_initializer,
                                        name='bias',
                                        regularizer=self.bias_regularizer,
                                        constraint=self.bias_constraint)
        else:
            self.bias = None

        self.built = True

    def compute_output_shape(self, input_shape):
        assert input_shape and len(input_shape) >= 2
        assert input_shape[-1] == self.units  # note: this only holds for the square (4 -> 4) example below
        output_shape = list(input_shape)
        output_shape[-1] = self.units
        return tuple(output_shape)

    def call(self, inputs):
        output = K.dot(inputs, self.kernel)
        if self.use_bias:
            output = K.bias_add(output, self.bias, data_format='channels_last')
        if self.activation is not None:
            output = self.activation(output)
        return output


# encoder
#
encoded1 = Dense(4, activation="sigmoid", input_shape=(4,), use_bias=True)
decoded1 = DenseTied(4, activation="sigmoid", tied_to=encoded1, use_bias=False)

# autoencoder
#
autoencoder = Sequential()
autoencoder.add(encoded1)
autoencoder.add(decoded1)

autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

print(autoencoder.summary())

autoencoder.fit(x=np.random.rand(100, 4), y=np.random.randint(0, 2, size=(100, 4)))  # random binary targets

print(autoencoder.layers[0].get_weights()[0])
print(autoencoder.layers[1].get_weights()[0])
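
As a quick follow-up check (my addition, not part of the original answer), the two kernels can be compared directly; in graph-mode Keras the second should be the transpose of the first:

w_enc = autoencoder.layers[0].get_weights()[0]  # encoder kernel, shape (4, 4)
w_dec = autoencoder.layers[1].get_weights()[0]  # tied decoder kernel, shape (4, 4)
print(np.allclose(w_enc.T, w_dec))              # expected True when the weights are truly tied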

Thanks, Mikhail Berlinkov. One important remark: this code runs under Keras, but not in eager mode in TF 2.0. It runs, but it trains poorly.

The crux is how the object stores the transposed weights: self.kernel = K.transpose(self.tied_to.kernel)

In non-eager mode, this creates the graph in the correct way. In eager mode it fails, probably because the value of the transposed variable is stored at build time (i.e., at the first call) and then reused in subsequent calls.

The solution, however, is to store the variable unaltered at build time and to move the transpose operation into the call method.
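
A minimal sketch of that fix, reconstructed from the description above (only the two affected methods of DenseTied change; everything else stays as in the answer):

def build(self, input_shape):
    assert len(input_shape) >= 2
    input_dim = input_shape[-1]

    if self.tied_to is not None:
        # keep a reference to the tied layer's kernel variable itself;
        # transposing here would freeze the values at build time in eager mode
        self.kernel = self.tied_to.kernel
    else:
        self.kernel = self.add_weight(shape=(input_dim, self.units),
                                      initializer=self.kernel_initializer,
                                      name='kernel',
                                      regularizer=self.kernel_regularizer,
                                      constraint=self.kernel_constraint)
    if self.use_bias:
        self.bias = self.add_weight(shape=(self.units,),
                                    initializer=self.bias_initializer,
                                    name='bias',
                                    regularizer=self.bias_regularizer,
                                    constraint=self.bias_constraint)
    else:
        self.bias = None
    self.built = True

def call(self, inputs):
    # transpose at call time, so every forward pass reads the current weights
    kernel = K.transpose(self.kernel) if self.tied_to is not None else self.kernel
    output = K.dot(inputs, kernel)
    if self.use_bias:
        output = K.bias_add(output, self.bias, data_format='channels_last')
    if self.activation is not None:
        output = self.activation(output)
    return output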

It took me several days to figure this out, and I am glad if it helps anyone.