TensorFlow dynamic RNN not training

Question

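Problem statement

I am trying to train a dynamic RNN in TensorFlow v1.0.1 on Linux RedHat 7.3 (the problem also occurs on Windows 7). No matter what I try, I get exactly the same training and validation error at every epoch, i.e. my weights are not updating.

I appreciate any help you can offer.

Example

I have tried to reduce this to a minimal example that shows my problem, but the minimal example is still quite large. My network structure is based primarily on this gist.

Network definition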

import functools
import numpy as np
import tensorflow as tf

def lazy_property(function):
    attribute = '_' + function.__name__

    @property
    @functools.wraps(function)
    def wrapper(self):
        if not hasattr(self, attribute):
            setattr(self, attribute, function(self))
        return getattr(self, attribute)
    return wrapper

class MyNetwork:
    """
    Class defining an RNN for labeling a time series.
    """

    def __init__(self, data, target, num_hidden=64):
        self.data = data
        self.target = target
        self._num_hidden = num_hidden
        self._num_steps = int(self.target.get_shape()[1])
        self._num_classes = int(self.target.get_shape()[2])
        self._weight_and_bias()  # create weight and bias tensors
        self.prediction
        self.error
        self.optimize

    @lazy_property
    def prediction(self):
        """Defines the recurrent neural network prediction scheme."""

        # Dynamic LSTM.
        network = tf.contrib.rnn.BasicLSTMCell(self._num_hidden)
        # Use the placeholder stored on the instance rather than a global.
        output, _ = tf.nn.dynamic_rnn(network, self.data, dtype=tf.float32)

        # Flatten and apply same weights to all time steps.
        output = tf.reshape(output, [-1, self._num_hidden])
        prediction = tf.nn.softmax(tf.matmul(output, self.weight) + self.bias)
        prediction = tf.reshape(prediction,
                                [-1, self._num_steps, self._num_classes])
        return prediction

    @lazy_property
    def cost(self):
        """Defines the cost function for the network."""

        cross_entropy = -tf.reduce_sum(self.target * tf.log(self.prediction),
                                       axis=[1, 2])
        cross_entropy = tf.reduce_mean(cross_entropy)
        return cross_entropy

    @lazy_property
    def optimize(self):
        """Defines the optimization scheme."""

        learning_rate = 0.003
        optimizer = tf.train.RMSPropOptimizer(learning_rate)
        return optimizer.minimize(self.cost)

    @lazy_property
    def error(self):
        """Defines a measure of prediction error."""

        mistakes = tf.not_equal(tf.argmax(self.target, 2),
                                tf.argmax(self.prediction, 2))
        return tf.reduce_mean(tf.cast(mistakes, tf.float32))

    def _weight_and_bias(self):
        """Returns appropriately sized weight and bias tensors for the output layer."""

        self.weight = tf.Variable(tf.truncated_normal(
                                         [self._num_hidden, self._num_classes],
                                         mean=0.0,
                                         stddev=0.01,
                                         dtype=tf.float32))
        self.bias = tf.Variable(tf.constant(0.1, shape=[self._num_classes]))

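Training

Here is my training procedure. The all_data class just holds my data and labels, and uses a batch generator class to spit out batches for training when I call all_data.train.next() and all_data.train_labels.next(). You can reproduce this with any batch generation scheme you like, and I can add the code if you think it is relevant; I felt it was too long to include.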

tf.reset_default_graph()
data = tf.placeholder(tf.float32,
                      [None, all_data.num_steps, all_data.num_features])
target = tf.placeholder(tf.float32,
                        [None, all_data.num_steps, all_data.num_outputs])
model = MyNetwork(data, target, NUM_HIDDEN)
print('Training the model...')
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print('Initialized.')
    for epoch in range(3):
        print('Epoch {} |'.format(epoch), end='', flush=True)
        for step in range(all_data.train_size // BATCH_SIZE):

            # Generate the next training batch and train.
            d = all_data.train.next()
            t = all_data.train_labels.next()
            sess.run(model.optimize,
                     feed_dict={data: d, target: t})

            # Update the user periodically.
            if step % summary_frequency == 0:
                print('.', end='', flush=True)

        # Show training and validation error at the end of each epoch.
        print('|', flush=True)
        train_error = sess.run(model.error,
                               feed_dict={data: d, target: t})
        valid_error = sess.run(model.error,
                               feed_dict={
                                   data: all_data.valid,
                                   target: all_data.valid_labels
                                   })
        print('Training error: {}%'.format(100 * train_error))
        print('Validation error: {}%'.format(100 * valid_error))

    # Check testing error after everything.
    test_error = sess.run(model.error,
                          feed_dict={
                              data: all_data.test,
                              target: all_data.test_labels
                              })
    print('Testing error after {} epochs: {}%'.format(epoch + 1, 100 * test_error))
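
The batch generator is not shown in the question. As a rough sketch only, a generator exposing a next() method like the one called in the loop above could look like the following. The class name BatchGenerator, its constructor arguments, and the wrap-around behaviour are assumptions made for illustration, not the asker's actual code.

```python
import numpy as np


class BatchGenerator:
    """Hypothetical stand-in for the batch generator used in the question."""

    def __init__(self, array, batch_size):
        self._array = np.asarray(array)
        self._batch_size = batch_size
        self._cursor = 0  # index of the first sample in the next batch

    def next(self):
        """Return the next batch, wrapping around at the end of the data."""
        indices = (self._cursor + np.arange(self._batch_size)) % len(self._array)
        self._cursor = (self._cursor + self._batch_size) % len(self._array)
        return self._array[indices]


# Data and labels use separate generators with the same batch size, so the
# two streams stay aligned as long as next() is called once on each per step.
samples = np.arange(10).reshape(5, 2)   # 5 samples, 2 features
labels = np.arange(5)
train = BatchGenerator(samples, batch_size=2)
train_labels = BatchGenerator(labels, batch_size=2)

first_batch = train.next()
first_labels = train_labels.next()
```

Because nothing here shuffles, calling next() on the data generator and the label generator in lockstep keeps each sample paired with its own label.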

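As a simple example, I generated random data and labels, where the data has shape [num_samples, num_steps, num_features] and each sample has a single label associated with the whole sequence: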

data = np.random.rand(5000, 1000, 2)
labels = np.random.randint(low=0, high=2, size=[5000])

I then convert my labels to one-hot vectors and tile them so that the resulting labels tensor has the same size as the data tensor.
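
The conversion code is not included in the question. One plausible way to do the one-hot encoding and tiling in NumPy is sketched below, with the array sizes shrunk for readability; the variable names one_hot and tiled are illustrative.

```python
import numpy as np

num_samples, num_steps, num_classes = 6, 4, 2

# One integer label per sample, as in the question (sizes shrunk for clarity).
labels = np.random.randint(low=0, high=num_classes, size=[num_samples])

# One-hot encode: row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype=np.float32)[labels]   # [num_samples, num_classes]

# Repeat each sample's one-hot label at every time step.
tiled = np.tile(one_hot[:, np.newaxis, :], (1, num_steps, 1))
# tiled has shape [num_samples, num_steps, num_classes]
```

With two features and two classes, as in the random example, the tiled labels end up with the same shape as the data.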

Results

No matter what I do, I get results like this:

Training the model...
Initialized.
Epoch  0 |.......................................................|
Training error: 56.25%
Validation error: 53.39999794960022%
Epoch  1 |.......................................................|
Training error: 56.25%
Validation error: 53.39999794960022%
Epoch  2 |.......................................................|
Training error: 56.25%
Validation error: 53.39999794960022%
Testing error after 3 epochs: 49.000000953674316%

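I get exactly the same error at every epoch. Even if my weights were taking a random walk, the error should change. For the example shown here I used random data with random labels, so I do not expect much improvement, but I do expect some variation, and instead I get exactly the same results at every epoch. I see the same behavior when I run this on my actual dataset.

Insight

I hesitate to include this in case it turns out to be a red herring, but I believe my optimizer is computing gradients of `None` for the cost function. When I tried a different optimizer and attempted to clip the gradients, I went ahead and used `tf.Print` to output the gradients as well. The network crashed with an error saying that `tf.Print` could not handle None-type values.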
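
One way to check this suspicion without crashing is to look at the gradient list before applying it. In TensorFlow 1.x, optimizer.compute_gradients(cost) returns a list of (gradient, variable) pairs, and the gradient entry is None for any variable the cost does not depend on. The helper below is a hypothetical sketch (the name variables_without_gradient and the stand-in objects are made up); it only inspects such a list, so it can be tried without building a graph.

```python
def variables_without_gradient(grads_and_vars):
    """Return the names of variables whose gradient is None.

    `grads_and_vars` is a list of (gradient, variable) pairs, the format
    returned by `optimizer.compute_gradients(cost)` in TensorFlow 1.x.
    """
    return [var.name for grad, var in grads_and_vars if grad is None]


# Intended use inside the question's `optimize` method (not executed here):
#
#     optimizer = tf.train.RMSPropOptimizer(learning_rate)
#     grads_and_vars = optimizer.compute_gradients(self.cost)
#     print(variables_without_gradient(grads_and_vars))
#     return optimizer.apply_gradients(grads_and_vars)


class FakeVariable:
    """Stand-in with a `name` attribute; the names below are invented."""

    def __init__(self, name):
        self.name = name


example = [(0.5, FakeVariable('a:0')), (None, FakeVariable('b:0'))]
missing = variables_without_gradient(example)
```

Any variable reported by the helper has no path to the cost in the graph, which would be a more specific lead than the tf.Print crash.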

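Attempted fixes

I have tried the following, and the problem persists in every case:

Using different optimizers, e.g. AdamOptimizer, both with and without modifications to the gradients (clipping).
Adjusting the batch size.
Using more and fewer hidden nodes.
Running for more epochs.
Initializing my weights with different values assigned to stddev.
Initializing my biases to zero (using tf.zeros) and to different constants.
Using weights and biases that are defined in the prediction method and are not member variables of the class, with the _weight_and_bias method defined as in this gist.
Determining the logits in the prediction function instead of the softmax predictions, i.e. predictions = tf.matmul(output, self.weight) + self.bias, and then using tf.nn.softmax_cross_entropy_with_logits. This requires some reshaping because the method expects its labels and logits to have shape [batch_size, num_classes], so the cost method becomes: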


@lazy_property
def cost(self):
    """Defines the cost function for the network."""
    # Flatten to [batch_size * num_steps, num_classes], the shape the op expects.
    targs = tf.reshape(self.target, [-1, self._num_classes])
    # In this variant self.prediction returns logits, not softmax output.
    logits = tf.reshape(self.prediction, [-1, self._num_classes])
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=targs, logits=logits)
    cross_entropy = tf.reduce_mean(cross_entropy)
    return cross_entropy

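Changing which size dimension I leave as None when I create my placeholders, as suggested elsewhere, which requires some rewriting in the network definition. Basically, setting size = [all_data.batch_size, -1, all_data.num_features] and size = [all_data.batch_size, -1, all_data.num_classes].
Using tf.contrib.rnn.DropoutWrapper in my network definition and passing a dropout value that is set to 0.5 in training and 1.0 in validation and testing.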
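
A side note on the reshaped cost from the list above: flattening targets and outputs to [-1, num_classes] keeps each (sample, step) row paired with its own target, but taking the mean over all rows is not numerically identical to the original cost, which summed over steps and classes before averaging over the batch. The NumPy sketch below uses made-up random values and suggests the two differ only by a constant factor of num_steps, so this change alone should rescale the gradients rather than alter their direction.

```python
import numpy as np

rng = np.random.RandomState(0)
batch_size, num_steps, num_classes = 3, 5, 2

# Softmax outputs and one-hot targets with shape [batch, steps, classes].
logits = rng.randn(batch_size, num_steps, num_classes)
exp = np.exp(logits - logits.max(axis=2, keepdims=True))
prediction = exp / exp.sum(axis=2, keepdims=True)
target = np.eye(num_classes)[rng.randint(0, num_classes,
                                         size=(batch_size, num_steps))]

# Original cost: sum over steps and classes, then mean over the batch.
original_cost = np.mean(-np.sum(target * np.log(prediction), axis=(1, 2)))

# Reshaped cost: one cross-entropy value per (sample, step) row, then the
# mean over all rows.
flat_target = target.reshape(-1, num_classes)
flat_prediction = prediction.reshape(-1, num_classes)
per_row = -np.sum(flat_target * np.log(flat_prediction), axis=1)
reshaped_cost = np.mean(per_row)

# The two costs differ by a factor of num_steps.
ratio = original_cost / reshaped_cost
```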

Answer 1

The problem went away when I used

output = tf.contrib.layers.flatten(output)
logits = tf.contrib.layers.fully_connected(output, some_size, activation_fn=None)

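instead of flattening my network output, defining weights, and performing tf.matmul(output, weight) + bias manually. I then used logits (instead of predictions as in the question) in my cost function: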

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=target,
                                                        logits=logits)

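If you want the network prediction, you still need to compute prediction = tf.nn.softmax(logits).

I do not know why this helped, but the network would not train even on randomly made-up data until I made these changes.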
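
The following does not settle why the change helped. It only illustrates one general difference between the two formulations: a loss computed from logits can use a log-softmax, whereas taking the log of a softmax output gives an infinite loss once a probability underflows to zero. The NumPy sketch below uses deliberately extreme, made-up logits to show the effect; whether anything like this occurred in the question's network is not known.

```python
import numpy as np

logits = np.array([[1000.0, 0.0]], dtype=np.float32)   # made-up extreme values
target = np.array([[0.0, 1.0]], dtype=np.float32)      # true class is index 1

# Naive route: softmax first, then log.
shifted = logits - logits.max(axis=1, keepdims=True)
softmax = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
with np.errstate(divide='ignore'):          # silence the log(0) warning
    naive_loss = -np.sum(target * np.log(softmax), axis=1)

# Logits route: log-softmax computed with the log-sum-exp trick.
log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
stable_loss = -np.sum(target * log_softmax, axis=1)
```

Here the naive loss is infinite while the log-softmax route stays finite, which is one reason a logits-based loss is usually the safer default.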
