My First LSTM RNN Loss Is Not Reducing As Expected

I have been working through the RNN example documentation and trying to roll my own simple sequence-to-sequence RNN on the tiny Shakespeare corpus, with the output shifted by one character. I am loading the data with sherjilozair's excellent utils.py (https://github.com/sherjilozair/char-rnn-tensorflow/blob/master/utils.py), but my training run looks like this...

loading preprocessed files
('epoch ', 0, 'loss ', 930.27938270568848)
('epoch ', 1, 'loss ', 912.94828796386719)
('epoch ', 2, 'loss ', 902.99976110458374)
('epoch ', 3, 'loss ', 902.90720677375793)
('epoch ', 4, 'loss ', 902.87029957771301)
('epoch ', 5, 'loss ', 902.84992623329163)
('epoch ', 6, 'loss ', 902.83739829063416)
('epoch ', 7, 'loss ', 902.82908940315247)
('epoch ', 8, 'loss ', 902.82331037521362)
('epoch ', 9, 'loss ', 902.81916546821594)
('epoch ', 10, 'loss ', 902.81605243682861)
('epoch ', 11, 'loss ', 902.81366014480591)
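For scale: the printed number is a sum of per-batch mean cross-entropies, so assuming the standard tinyshakespeare corpus (roughly 1.1M characters, vocabulary of about 65) and the default batch_size=100, seq_length=50 below, a back-of-the-envelope uniform-guess baseline lands almost exactly on the epoch-0 value:

# hedged sanity check, separate from the training script; assumes the
# standard tinyshakespeare size (~1115394 chars) and a vocab_size of ~65
import math
num_batches = 1115394 // (100 * 50)   # batch_size * seq_length -> about 223 batches
print(num_batches * math.log(65))     # about 930.9, close to the epoch-0 loss above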

I was expecting a much steeper drop, and even after 1000 epochs it is still roughly the same. I think something is wrong with my code, but I can't see what. I have pasted it below; if anyone could take a quick look and see whether anything stands out, I would be very grateful, thanks.

#
# ray's second predictor
#
# take basic example and convert to rnn
#

import sys
import argparse
import pdb
import tensorflow as tf

from utils import TextLoader

def main(_):

    # number of hidden units
    lstm_size = 24

    # embedding of dimensionality 15 should be ok for characters, 300 for words
    embedding_dimension_size = 15

    # load data and get vocab size
    data_loader = TextLoader(FLAGS.data_dir, FLAGS.batch_size, FLAGS.seq_length)
    FLAGS.vocab_size = data_loader.vocab_size

    # placeholder for batches of characters
    input_characters = tf.placeholder(tf.int32, [FLAGS.batch_size, FLAGS.seq_length])
    target_characters = tf.placeholder(tf.int32, [FLAGS.batch_size, FLAGS.seq_length])

    # create cell
    lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size, state_is_tuple=True)

    # initialize with zeros
    initial_state = state = lstm.zero_state(FLAGS.batch_size, tf.float32)

    # use embedding to convert ints to float array
    embedding = tf.get_variable("embedding", [FLAGS.vocab_size, embedding_dimension_size])
    inputs = tf.nn.embedding_lookup(embedding, input_characters)

    # flatten back to 2-d because rnn cells only deal with 2d
    inputs = tf.contrib.layers.flatten(inputs)

    # get output and (final) state
    outputs, final_state = lstm(inputs, state)

    # create softmax layer to classify outputs into characters
    softmax_w = tf.get_variable("softmax_w", [lstm_size, FLAGS.vocab_size])
    softmax_b = tf.get_variable("softmax_b", [FLAGS.vocab_size])
    logits = tf.nn.softmax(tf.matmul(outputs, softmax_w) + softmax_b)
    probs = tf.nn.softmax(logits)

    # expected labels will be 1-hot representation of last character of target_characters
    last_characters = target_characters[:,-1]
    last_one_hot = tf.one_hot(last_characters, FLAGS.vocab_size)

    # calculate loss
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=last_one_hot, logits=logits)

    # total loss for the batch: the mean cross-entropy across its examples
    batch_loss = tf.reduce_mean(cross_entropy)

    # train using the Adagrad optimizer
    train_step = tf.train.AdagradOptimizer(0.3).minimize(batch_loss)

    # start session
    sess = tf.InteractiveSession()

    # initialize variables
    sess.run(tf.global_variables_initializer())

    # train!
    num_epochs = 1000
    # loop through epochs
    for e in range(num_epochs):
        # loop through batches
        numpy_state = sess.run(initial_state)
        total_loss = 0.0
        data_loader.reset_batch_pointer()
        for i in range(data_loader.num_batches):
            this_batch = data_loader.next_batch()
            # Initialize the LSTM state from the previous iteration.
            numpy_state, current_loss, _ = sess.run(
                [final_state, batch_loss, train_step],
                feed_dict={initial_state: numpy_state,
                           input_characters: this_batch[0],
                           target_characters: this_batch[1]})
            total_loss += current_loss
        # output total loss
        print("epoch ", e, "loss ", total_loss)

    # break into debug
    pdb.set_trace()

    # calculate accuracy using training set

if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  parser.add_argument('--data_dir', type=str, default='data/tinyshakespeare',
                      help='Directory for storing input data')
  parser.add_argument('--batch_size', type=int, default=100,
                      help='minibatch size')
  parser.add_argument('--seq_length', type=int, default=50,
                      help='RNN sequence length')
  FLAGS, unparsed = parser.parse_known_args()
  tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)

Update, July 20.

Thanks for the replies. I updated it to use the dynamic RNN call, which now looks like this...

outputs, final_state = tf.nn.dynamic_rnn(initial_state=initial_state, cell=lstm, inputs=inputs, dtype=tf.float32)
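(With dynamic_rnn the embedded inputs stay 3-D, so the earlier flatten step is dropped; a quick sketch of the shapes, reusing the names from the code above:)

# inputs keep their 3-D shape: [batch_size, seq_length, embedding_dimension_size]
inputs = tf.nn.embedding_lookup(embedding, input_characters)

# outputs: [batch_size, seq_length, lstm_size], one hidden vector per timestep;
# final_state is the LSTM state after the last timestep
outputs, final_state = tf.nn.dynamic_rnn(
    cell=lstm, inputs=inputs, initial_state=initial_state, dtype=tf.float32)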

This raised some interesting questions... The batching seems to pick 50-character chunks of the dataset one at a time, then step forward 50 characters to get the next sequence in the batch. If that is then used for training, and you compute the loss from the predicted final character versus the final character + 1, then there are 49 predicted characters in each sequence whose loss is never tested. That seems a little odd.
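I think something like the following would score every timestep instead; a rough, untested sketch reusing the names above (it assumes the 3-D outputs from dynamic_rnn):

# project every timestep, not just the last one
flat_outputs = tf.reshape(outputs, [-1, lstm_size])            # [batch*steps, lstm_size]
flat_logits = tf.matmul(flat_outputs, softmax_w) + softmax_b   # raw scores, no softmax
logits = tf.reshape(flat_logits,
                    [FLAGS.batch_size, FLAGS.seq_length, FLAGS.vocab_size])

# every position in target_characters becomes a label, so none of the
# 49 intermediate predictions is wasted; the sparse op takes integer
# labels directly, no one-hot needed
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=target_characters, logits=logits)                   # [batch, steps]
batch_loss = tf.reduce_mean(cross_entropy)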

Also, when testing the output, I feed it a single character rather than 50, get the prediction, and feed that single character back in. Should I be adding to that single character at each step? So the first seed is 1 character, then I append the predicted character so the next call is a 2-character sequence, and so on, up to the maximum of my training sequence length? Or does that not matter if I pass back the updated state? That is, does the updated state also represent all the preceding characters?
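My working assumption (happy to be corrected) is that the state does carry all the preceding characters, so the usual char-rnn loop feeds exactly one character per step; a rough sketch, assuming a separate sampling graph built with batch_size=1 and seq_length=1:

import numpy as np

state = sess.run(initial_state)
char_id = seed_id            # hypothetical: integer id of the seed character
generated = [char_id]
for _ in range(500):
    # the returned state summarizes everything seen so far, so each
    # step only needs the single most recent character
    p, state = sess.run([probs, final_state],
                        feed_dict={input_characters: [[char_id]],
                                   initial_state: state})
    char_id = int(np.argmax(p[0][-1]))   # greedy pick; sampling from p also works
    generated.append(char_id)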

On another point, I found what I think was the main reason the loss was not reducing... I was mistakenly calling softmax twice...

logits = tf.nn.softmax(tf.matmul(final_output, softmax_w) + softmax_b)
probs = tf.nn.softmax(logits)

Your lstm() function is only a single cell, not a sequence of cells. For a sequence you create a sequence of lstms and then pass that sequence as the input. Concatenating the embedded inputs and passing them through a single cell will not work; use the dynamic_rnn method over the sequence instead.

And softmax is applied twice, in logits and then again inside cross_entropy, which needs to be fixed.
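Concretely: keep logits as raw scores, since softmax_cross_entropy_with_logits applies softmax internally exactly once, and compute probs separately for sampling only. A minimal sketch of the corrected lines:

# raw, unnormalized scores -- no tf.nn.softmax here
logits = tf.matmul(final_output, softmax_w) + softmax_b

# for sampling/inspection only; the loss must not see this
probs = tf.nn.softmax(logits)

# the loss op applies softmax internally, exactly once
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
    labels=last_one_hot, logits=logits)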