Tensorflow RNN weight matrices initialization

I'm using bidirectional_rnn with GRUCell, but this is a general question about RNNs in Tensorflow.

I couldn't find how the weight matrices (input to hidden, hidden to hidden) are initialized. Are they initialized randomly? To zeros? Are they initialized differently for each LSTM I create?

EDIT: Another motivation for this question is pre-training some LSTMs and using their weights in a subsequent model. I currently don't know how to do that without saving all the states and restoring the entire model.

Thanks.

The RNN models will create their variables with get_variable, and you can control the initialization by wrapping the code that creates those variables in a variable_scope and passing a default initializer to it. Unless the RNN specifies one explicitly (looking at the code, it doesn't), uniform_unit_scaling_initializer is used.

You should also be able to share model weights by declaring a second model and passing reuse=True to its variable_scope. As long as the namespaces match, the new model will get the same variables as the first model.
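A minimal sketch of both points (TF 1.x API; the scope name 'rnn_model' and the placeholder shapes are illustrative, not taken from the answer): the default initializer set on the scope is picked up by the cell's get_variable calls, and re-entering the scope with reuse=True shares the weights.

import tensorflow as tf

# Illustrative placeholders: (batch, time, features) shapes are made up for the sketch.
inputs_a = tf.placeholder(tf.float32, [None, 20, 50])
inputs_b = tf.placeholder(tf.float32, [None, 20, 50])

cell = tf.contrib.rnn.GRUCell(num_units=128)

# Variables created inside this scope fall back to the scope's default initializer.
with tf.variable_scope('rnn_model', initializer=tf.uniform_unit_scaling_initializer()):
    outputs_a, state_a = tf.nn.dynamic_rnn(cell, inputs_a, dtype=tf.float32)

# Same scope name + reuse=True: the second model picks up the variables
# created above instead of making new ones.
with tf.variable_scope('rnn_model', reuse=True):
    outputs_b, state_b = tf.nn.dynamic_rnn(cell, inputs_b, dtype=tf.float32)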

How to initialize weight matrices for an RNN?

I believe people are using random normal initialization for the weight matrices in RNNs. Check out the example in the TensorFlow GitHub Repo (in the context of convolutional neural networks). As the notebook is a bit long: they have a simple LSTM model where they use tf.truncated_normal to initialize the weights and tf.zeros to initialize the biases (although I have tried using tf.ones to initialize the biases before, which also seems to work). I believe the standard deviation is a hyperparameter you could tune yourself. Sometimes weight initialization is important to the gradient flow. Although, as far as I know, LSTM itself is designed to handle the gradient vanishing problem (and gradient clipping helps with the gradient exploding problem), so perhaps you don't need to be super careful with the setup of std_dev in an LSTM? I've read papers recommending Xavier initialization (TF API doc for Xavier initializer). I don't know whether people use it in RNNs, but I imagine you could even try it in an RNN if you want to see whether it helps.
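As a rough sketch of that notebook-style pattern (the sizes, names, and stddev below are illustrative, not quoted from the notebook):

import tensorflow as tf

vocabulary_size, num_nodes = 27, 64  # illustrative sizes

# Input-to-hidden and hidden-to-hidden weights for a single gate, plus its bias:
# truncated normal for the weights, zeros for the bias (tf.ones also reportedly works).
ix = tf.Variable(tf.truncated_normal([vocabulary_size, num_nodes], stddev=0.1))
im = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], stddev=0.1))
ib = tf.Variable(tf.zeros([1, num_nodes]))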

Now to follow up on @Allen's answer and your follow-up question left in the comments.

How to control the initialization with variable scope?

Using the simple LSTM model in the TensorFlow GitHub python notebook that I linked to as an example. Specifically, if I want to refactor the LSTM part of the code above using variable scope control, I may code something as follows...

import tensorflow as tf
def initialize_LSTMcell(vocabulary_size, num_nodes, initializer):
    '''initialize LSTMcell weights and biases, set variables to reuse mode'''
    gates = ['input_gate', 'forget_gate', 'memory_cell', 'output_gate']
    with tf.variable_scope('LSTMcell') as scope:
        for gate in gates:
            with tf.variable_scope(gate) as gate_scope:
                wx = tf.get_variable("wx", [vocabulary_size, num_nodes], initializer=initializer)
                wt = tf.get_variable("wt", [num_nodes, num_nodes], initializer=initializer)
                bi = tf.get_variable("bi", [1, num_nodes], initializer=tf.constant_initializer(0.0))
                gate_scope.reuse_variables() # this line can probably be omitted, because setting the 'LSTMcell' scope to reuse below turns on the reuse mode for all its child scope variables
        scope.reuse_variables()

def get_scope_variables(scope_name, variable_names):
    '''a helper function to fetch variable based on scope_name and variable_name'''
    vars = {}
    with tf.variable_scope(scope_name, reuse=True):
        for var_name in variable_names:
            var = tf.get_variable(var_name)
            vars[var_name] = var
    return vars

def LSTMcell(i, o, state):
    '''a function for performing LSTMcell computation'''
    gates = ['input_gate', 'forget_gate', 'memory_cell', 'output_gate']
    var_names = ['wx', 'wt', 'bi']
    gate_comp = {}
    with tf.variable_scope('LSTMcell', reuse=True):
        for gate in gates:
            vars = get_scope_variables(gate, var_names)
            gate_comp[gate] = tf.matmul(i, vars['wx']) + tf.matmul(o, vars['wt']) + vars['bi']
    state = tf.sigmoid(gate_comp['forget_gate']) * state + tf.sigmoid(gate_comp['input_gate']) * tf.tanh(gate_comp['memory_cell'])
    output = tf.sigmoid(gate_comp['output_gate']) * tf.tanh(state)
    return output, state

The usage of the refactored code would be something like the following...

initialize_LSTMcell(vocabulary_size, num_nodes, tf.truncated_normal_initializer(mean=-0.1, stddev=.01))
#...Doing some computation...
LSTMcell(input_tensor, output_tensor, state)

Although the refactored code may look less straightforward, using scope variable control ensures scope encapsulation and allows flexible variable control (at least in my opinion).

On pre-training some LSTMs and using their weights in a subsequent model, and how to do that without saving all the states and restoring the whole model.

Assuming you have a pre-trained model frozen and loaded in, if you want to use its frozen 'wx', 'wt' and 'bi', you can simply find their parent scope names and variable names, and then fetch the variables using a structure similar to the one in the get_scope_variables func.

with tf.variable_scope(scope_name, reuse=True):
    var = tf.get_variable(var_name)

Here is a link to understanding variable scope and sharing variables. I hope this is helpful.
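If those pre-trained weights live in a checkpoint file, a hedged sketch of pulling in only them (rather than restoring the whole model) is a Saver restricted to the variables of that scope; the checkpoint path below is just a placeholder, and this assumes the weights were created under the 'LSTMcell' scope as above.

import tensorflow as tf

# Collect only the variables created under the 'LSTMcell' scope.
lstm_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='LSTMcell')

# A Saver built over that subset restores just those weights and leaves the
# rest of the new model's variables untouched.
saver = tf.train.Saver(var_list=lstm_vars)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.restore(sess, 'path/to/pretrained.ckpt')  # placeholder checkpoint path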

A simple way of initializing all kernel weights with a certain initializer is leaving the initializer inside tf.variable_scope(). For example:

with tf.variable_scope('rnn', initializer=tf.variance_scaling_initializer()):
    basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
    outputs, state = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32)