Inconsistency between GRU and RNN implementation

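I am trying to implement some custom GRU cells using TensorFlow. I need to stack those cells, and I wanted to inherit from tensorflow.keras.layers.GRU. However, when looking at the source code, I noticed that you can only pass a units argument to the __init__ of GRU, while RNN takes a list of RNN cells as its argument and uses it to stack those cells by calling StackedRNNCells. Meanwhile, GRU only creates a single GRUCell.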

For the paper I am trying to implement, I actually need to stack GRUCell objects. Why are the implementations of RNN and GRU different?

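While searching for the documentation of these classes to add links, I noticed something that may be tripping you up: there are (currently, just before the official TF 2.0 release) two GRUCell implementations in TensorFlow! There is a tf.nn.rnn_cell.GRUCell and a tf.keras.layers.GRUCell. It looks like the one from tf.nn.rnn_cell is deprecated, and the Keras one is the one you should use.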

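From what I can tell, GRUCell has the same __call__() method signature as tf.keras.layers.LSTMCell and tf.keras.layers.SimpleRNNCell, and they all inherit from Layer. The RNN documentation gives some requirements on what the __call__() method of the objects you pass to its cell argument must do, but my guess is that all three of these meet those requirements. You should be able to use the same RNN framework and pass it a list of GRUCell objects instead of LSTMCell or SimpleRNNCell objects.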

I can't test this right now, so I'm not sure whether you pass a list of GRUCell objects or just a list of GRU objects into RNN, but I think one of those should work.
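To make the suggestion above concrete, here is a minimal sketch of stacking cells with the Keras API. It assumes TensorFlow 2.x with tf.keras available. The sizes (batch_size, time_steps, features, layer_units) are made-up values for illustration only and are not taken from the question. It passes a list of GRUCell objects, not GRU layers, because the cell argument of RNN expects cell instances.

```python
import numpy as np
import tensorflow as tf

# Hypothetical sizes, chosen only for illustration.
batch_size, time_steps, features = 4, 10, 8
layer_units = [32, 16]  # one entry per stacked GRU layer

# Create one independent GRUCell per layer. When RNN receives a list
# of cells, it wraps them in StackedRNNCells internally.
cells = [tf.keras.layers.GRUCell(units) for units in layer_units]
stacked_gru = tf.keras.layers.RNN(cells, return_sequences=True)

# Random input of shape (batch, time, features).
x = np.random.rand(batch_size, time_steps, features).astype("float32")
y = stacked_gru(x)

# The last axis of the output matches the units of the top cell.
output_shape = tuple(y.shape)
print(output_shape)
```

If you need custom behaviour, a subclass of GRUCell (or any layer that follows the cell contract described in the RNN documentation) should be usable in the same list, though I have not verified that for your particular cell.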

# Written against the TensorFlow 1.x API (tf.placeholder, tf.contrib).
import tensorflow as tf

# int_to_vocab, rnn_size, num_layers, keep_prob and embed_dim are
# assumed to be defined earlier; they are not part of this snippet.
train_graph = tf.Graph()
with train_graph.as_default():

    # Initialize input placeholders
    input_text = tf.placeholder(tf.int32, [None, None], name='input')
    targets = tf.placeholder(tf.int32, [None, None], name='targets')
    lr = tf.placeholder(tf.float32, name='learning_rate')

    # Calculate text attributes
    vocab_size = len(int_to_vocab)
    input_text_shape = tf.shape(input_text)

    # Build the RNN cell: a separate LSTM cell with dropout for each layer.
    # ([drop_cell] * num_layers would put the same cell object in every layer.)
    def make_cell():
        lstm = tf.contrib.rnn.BasicLSTMCell(num_units=rnn_size)
        return tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)

    cell = tf.contrib.rnn.MultiRNNCell([make_cell() for _ in range(num_layers)])

    # Set the initial state
    initial_state = cell.zero_state(input_text_shape[0], tf.float32)
    initial_state = tf.identity(initial_state, name='initial_state')

    # Create word embedding as input to RNN
    embed = tf.contrib.layers.embed_sequence(input_text, vocab_size, embed_dim)

    # Build RNN
    outputs, final_state = tf.nn.dynamic_rnn(cell, embed, dtype=tf.float32)
    final_state = tf.identity(final_state, name='final_state')

    # Take RNN output and make logits
    logits = tf.contrib.layers.fully_connected(outputs, vocab_size, activation_fn=None)

    # Calculate the probability of generating each word
    probs = tf.nn.softmax(logits, name='probs')

    # Define loss function
    cost = tf.contrib.seq2seq.sequence_loss(
        logits,
        targets,
        tf.ones([input_text_shape[0], input_text_shape[1]])
    )

    # Optimizer, driven by the learning-rate placeholder defined above
    optimizer = tf.train.AdamOptimizer(lr)

    # Gradient clipping to avoid exploding gradients
    gradients = optimizer.compute_gradients(cost)
    capped_gradients = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gradients if grad is not None]
    train_op = optimizer.apply_gradients(capped_gradients)