How do I share weights across different RNN cells that feed in different inputs in Tensorflow?

I'm curious whether there is a good way to share weights between different RNN cells while still feeding each cell a different input.

The graph I am trying to build looks like this:

where the three orange LSTM cells operate in parallel and I would like to share the weights between them.

I have managed to implement something similar to what I want using a placeholder (code below). However, using the placeholder breaks the optimizer's gradient calculation: the intermediate result is pulled out of the graph with sess.run and fed back in as a constant, so nothing upstream of the point where I use the placeholder gets trained. Is there a better way to do this in Tensorflow?

I am using Tensorflow 1.2 and python 3.5 in an Anaconda environment on Windows 7.

Code:

def ann_model(cls,data, act=tf.nn.relu):
    with tf.name_scope('ANN'):
        with tf.name_scope('ann_weights'):
            ann_weights = tf.Variable(tf.random_normal([1,
                                                        cls.n_ann_nodes]))
        with tf.name_scope('ann_bias'):
            ann_biases = tf.Variable(tf.random_normal([1]))
        out = act(tf.matmul(data,ann_weights) + ann_biases)
    return out

def rnn_lower_model(cls,data):
    with tf.name_scope('RNN_Model'):
        data_tens = tf.split(data, cls.sequence_length,1)
        for i in range(len(data_tens)):
            data_tens[i] = tf.reshape(data_tens[i],[cls.batch_size,
                                                     cls.n_rnn_inputs])

        rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(cls.n_rnn_nodes_lower)

        outputs, states = tf.contrib.rnn.static_rnn(rnn_cell,
                                                    data_tens,
                                                    dtype=tf.float32)

        with tf.name_scope('RNN_out_weights'):
            out_weights = tf.Variable(
                    tf.random_normal([cls.n_rnn_nodes_lower,1]))
        with tf.name_scope('RNN_out_biases'):
            out_biases = tf.Variable(tf.random_normal([1]))

        #Encode the output of the RNN into one estimate per entry in 
        #the input sequence
        predict_list = []
        for i in range(cls.sequence_length):
            predict_list.append(tf.matmul(outputs[i],
                                          out_weights) 
                                          + out_biases)
    return predict_list

def create_graph(cls,sess):
    #Initializes the graph
    with tf.name_scope('input'):
        cls.x = tf.placeholder('float',[cls.batch_size,
                                       cls.sequence_length,
                                       cls.n_inputs])
    with tf.name_scope('labels'):
        cls.y = tf.placeholder('float',[cls.batch_size,1])
    with tf.name_scope('community_id'):
        cls.c = tf.placeholder('float',[cls.batch_size,1])

    #Define Placeholder to provide variable input into the 
    #RNNs with shared weights    
    cls.input_place = tf.placeholder('float',[cls.batch_size,
                                              cls.sequence_length,
                                              cls.n_rnn_inputs])

    #global step used in optimizer
    global_step = tf.Variable(0,trainable = False)

    #Create ANN
    ann_output = cls.ann_model(cls.c)
    #Combine output of ANN with other input data x
    ann_out_seq = tf.reshape(tf.concat([ann_output for _ in 
                                            range(cls.sequence_length)],1),
                            [cls.batch_size,
                             cls.sequence_length,
                             cls.n_ann_nodes])
    cls.rnn_input = tf.concat([ann_out_seq,cls.x],2)

    #Create 'unrolled' RNN by creating sequence_length many RNN Cells that
    #share the same weights.
    with tf.variable_scope('Lower_RNNs'):
        #Create RNNs
        daily_prediction, daily_prediction1 = [cls.rnn_lower_model(cls.input_place)] * 2

When training, a minibatch is computed in two steps:

RNNinput = sess.run(cls.rnn_input,feed_dict = {
                                            cls.x:batch_x,
                                            cls.y:batch_y,
                                            cls.c:batch_c})
_ = sess.run(cls.optimizer, feed_dict={cls.input_place:RNNinput,
                                       cls.y:batch_y,
                                       cls.x:batch_x,
                                       cls.c:batch_c})

Thanks for your help. Any ideas would be appreciated.

I ended up rethinking my architecture a little and came up with a more workable solution.

Instead of duplicating the middle layer of LSTM cells to create three different cells with the same weights, I chose to run the same cell three times. The result of each run is stored in a 'buffer', such as a tf.Variable, and that whole variable is then used as the input to the final LSTM layer. I drew a diagram here

Implementing it this way allows for valid outputs after 3 time steps, and it doesn't break tensorflow's backpropagation algorithm (i.e. the nodes in the ANN can still be trained).

The only tricky thing was making sure the buffer was in the correct order for the final RNN.
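
A rough sketch of what this looks like (not my exact code; here the 'buffer' is just an ordered Python list that gets handed to the final LSTM as a length-3 sequence rather than an actual tf.Variable, and all of the sizes are made up):

import tensorflow as tf

batch_size, seq_len, n_inputs, n_lower, n_upper = 4, 5, 3, 8, 8  # illustrative

# Three input segments that should all pass through the same lower LSTM.
segments = [tf.placeholder(tf.float32, [batch_size, seq_len, n_inputs])
            for _ in range(3)]

lower_cell = tf.nn.rnn_cell.BasicLSTMCell(n_lower)

buffer = []
with tf.variable_scope('lower_lstm') as scope:
    for i, seg in enumerate(segments):
        if i > 0:
            scope.reuse_variables()   # every run reuses the same LSTM weights
        outputs, _ = tf.contrib.rnn.static_rnn(lower_cell,
                                               tf.unstack(seg, axis=1),
                                               dtype=tf.float32)
        buffer.append(outputs[-1])    # one output per run, kept in order

# The buffered outputs form a length-3 sequence that feeds the final LSTM.
upper_cell = tf.nn.rnn_cell.BasicLSTMCell(n_upper)
with tf.variable_scope('upper_lstm'):
    final_outputs, _ = tf.contrib.rnn.static_rnn(upper_cell, buffer,
                                                 dtype=tf.float32)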

You have 3 different inputs: input_1, input_2, input_3. Feed them into one LSTM model whose parameters are shared, then concatenate the outputs of the 3 LSTMs and pass the result to the final LSTM layer. The code should look something like this:

 # Create input placeholder for the network
 input_1 = tf.placeholder(...)
 input_2 = tf.placeholder(...)
 input_3 = tf.placeholder(...)

 # create a shared rnn layer 
 def shared_rnn(...):
    ...
    rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(...)

 # generate the outputs for each input
 with tf.variable_scope('lower_lstm') as scope:
    out_input_1 = shared_rnn(...)
    scope.reuse_variables() # the variables will be reused.
    out_input_2 = shared_rnn(...)
    scope.reuse_variables()
    out_input_3 = shared_rnn(...)

 # verify whether the variables are reused
 for v in tf.global_variables():
    print(v.name)

 # concat the three outputs
 output = tf.concat...  

 # Pass it to the final_lstm layer and out the logits
 logits = final_layer(output, ...)

 train_op = ...

 # train
 sess.run(train_op, feed_dict={input_1: in1, input_2: in2, input_3: in3, labels: ...})
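
In case the elided parts are unclear, here is one concrete way shared_rnn and the reuse block could look (TF 1.x; the sizes are made up, and the concat / final layer / training steps stay as sketched above):

import tensorflow as tf

batch_size, seq_len, n_inputs, n_units = 4, 5, 3, 8   # illustrative only

input_1 = tf.placeholder(tf.float32, [batch_size, seq_len, n_inputs])
input_2 = tf.placeholder(tf.float32, [batch_size, seq_len, n_inputs])
input_3 = tf.placeholder(tf.float32, [batch_size, seq_len, n_inputs])

def shared_rnn(inputs):
    # static_rnn expects a list of seq_len tensors of shape [batch, n_inputs]
    steps = tf.unstack(inputs, axis=1)
    rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(n_units)
    outputs, state = tf.contrib.rnn.static_rnn(rnn_cell, steps,
                                               dtype=tf.float32)
    return outputs[-1]               # last output of the sequence

with tf.variable_scope('lower_lstm') as scope:
    out_input_1 = shared_rnn(input_1)
    scope.reuse_variables()          # later calls pick up the same weights
    out_input_2 = shared_rnn(input_2)
    out_input_3 = shared_rnn(input_3)

# Only one kernel/bias pair for the lower LSTM should be printed here.
for v in tf.global_variables():
    print(v.name)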